Data modelling questions

AM ams.fwd at
Fri Feb 20 17:35:59 EST 2015

Hi All.

I am currently looking at using Riak as a data store for time series 
data. Currently we get about 1.5 TB of data in JSON format that I intend 
to persist in Riak. I am having some difficulty figuring out how to 
model it so that I can fulfill the use cases I have been handed.

The data is provided in several types of log formats that share some 
common fields:

- timestamp
- geo
- s/w build #
- location #
- ... a whole bunch of other key-value pairs.

For the most part I will need to provide aggregated views based on geo. 
There are some views based on s/w build # and location #. The 
aggregation will be on an hourly basis.
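To make the hourly aggregation concrete, here is a minimal sketch. It 
assumes the timestamp is in epoch seconds and uses a simple record count 
as a stand-in for whatever aggregate the views actually need. The field 
names `timestamp` and `geo` are illustrative, not taken from the real logs.

```python
from collections import Counter
from typing import Iterable, Mapping

SECONDS_PER_HOUR = 3600

def hour_bucket(ts: int) -> int:
    """Floor an epoch-seconds timestamp to the start of its hour."""
    return ts - ts % SECONDS_PER_HOUR

def hourly_counts(records: Iterable[Mapping], dimension: str) -> dict:
    """Count records per (hour, value of `dimension`), e.g. dimension='geo'.

    A count is only a placeholder aggregate; swap in sums, averages, etc.
    """
    counts = Counter()
    for rec in records:
        counts[(hour_bucket(rec["timestamp"]), rec[dimension])] += 1
    return dict(counts)
```

The same function serves the build # and location # views by passing a 
different `dimension`.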

The model that I came up with:

<log-format-type>[<hour>][<timestamp>-<msg-id>]: <json-body>

with indices on geo, s/w build # and location #.
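A rough sketch of how one record could map onto that model follows. The 
`:` separator, the field names `geo`, `build` and `location`, and the 
assumption that location # is an integer are all hypothetical. The only 
Riak-specific convention relied on is that secondary (2i) index names end 
in `_bin` for string values or `_int` for integer values. The resulting 
key, index pairs and JSON body could then be handed to whichever client 
is in use.

```python
import json
from typing import Mapping, Set, Tuple

SECONDS_PER_HOUR = 3600

def make_key(log_format_type: str, ts: int, msg_id: str) -> str:
    """Hypothetical key layout: <log-format-type>:<hour>:<timestamp>-<msg-id>."""
    hour = ts - ts % SECONDS_PER_HOUR  # floor epoch seconds to the hour
    return f"{log_format_type}:{hour}:{ts}-{msg_id}"

def make_indexes(rec: Mapping) -> Set[Tuple[str, object]]:
    """2i entries for geo, s/w build # and location # (field names assumed)."""
    return {
        ("geo_bin", rec["geo"]),
        ("build_bin", rec["build"]),
        ("location_int", int(rec["location"])),  # assumes location # is numeric
    }

def to_riak_object(log_format_type: str, msg_id: str, rec: Mapping):
    """Return (key, index pairs, JSON body) for one log record."""
    key = make_key(log_format_type, rec["timestamp"], msg_id)
    return key, make_indexes(rec), json.dumps(rec)
```

Because the hour sits near the front of the key, a range query on Riak's 
special `$key` index could in principle select one log type's hour; that 
only works if the key parts sort lexicographically, which fixed-width 
epoch seconds do.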

I /think/ this will satisfy most of what I want to do, but I was 
wondering if someone else has had to solve this sort of a problem and 
what their solution was?

I would also be interested in hearing about alternate structures or bad 
assumptions I am making here.


