High volume data series storage and queries
jeremiah.peschka at gmail.com
Tue Aug 9 11:14:13 EDT 2011
Excellent points, Alex.
When you compare Riak's storage overhead to something like an RDBMS where you have maybe 24 bytes for row overhead (as is the case for PostgreSQL), you'll find that there's a tremendous economy of space elsewhere.
Riak is going to excel in applications where reads and writes will be truly random. Yes, there are Map Reduce features, but randomly organized data is still randomly organized data.
If you look at RDBMSes, horizontally partitioning your system through RDBMS features (SQL Server's partitioned tables, PostgreSQL's partitioned views, for example), gives you the ability to take advantage of many known quantities in that world - range queries can take advantage of sequential scan speeds across rotational disks.
You can even avoid the overhead of a traditional RDBMS altogether and use the InnoDB APIs or something like HandlerSocket to write data very quickly.
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP
On Aug 9, 2011, at 7:43 AM, Alexander Sicular wrote:
> A couple of thoughts:
> -disk io
> -total keys versus memory
> -data on disk overhead
> As Jeremiah noted, disk io is crucial. Thankfully, Riak's distributed mesh gives you access to a number of spindles limited only by your budget. I think that is a critical bonus of a distributed system like Riak that is often not fully appreciated. Here Riak is a win for you.
> Bitcask needs all keys to fit in memory. We are talking something like:
> (key length + overhead) * number of keys * replicas < cluster max available ram.
> There is a tool on the wiki which should help figure this out. What that basically means for you is that you will have to batch your data by some sensor/time granularity metric. Let's say every minute. At 10hz that is a 600x reduction in total keys. Of course, this doesn't come for free. Your application middleware will have to accommodate. That means you could lose up to whatever your time granularity batch is. Ie. You could lose a minute of sensor data should your application fail. Here Riak is neutral to negative.
> Riak data structure is not friendly towards small values. Sensor data generally spit out integers or other small data tuples. If you search the list archives you will find a magnificent data overhead writeup. IIRC, it was something on the order of 450b. What that basically tells you is that you can't use bitcask for small values if disk space is a concern, as I imagine it to be in this case. Also, sensor data is generally write only, ie. never deleted or modified, so compaction should not be a concern when using bitcask. Here Riak is a strong negative.
> Data retrieval issues aside (which between Riak Search/secondary indexes/third party indexes should not be a major concern), I am of the opinion that Riak is not a good fit for high frequency sensor data applications.
> Sent from my rotary phone.
> On Aug 8, 2011 9:40 PM, "Paul O" <pcotec at gmail.com> wrote:
> > Quite a few interesting points, thanks!
> > On Mon, Aug 8, 2011 at 5:53 PM, Jeremiah Peschka <jeremiah.peschka at gmail.com
> >> wrote:
> >> Responses inline
> >> On Aug 8, 2011, at 1:25 PM, Paul O wrote:
> >> Will any existing data be imported? If this is totally greenfield, then
> >> you're free to do whatever zany things you want!
> > Almost totally greenfield, yes. Some data will need to be imported but it's
> > already in the format described.
> > Ah, so you need IOPS throughput, not storage capacity. On the hardware side
> >> make sure your storage subsystem can keep up - don't cheap out on disks just
> >> because you have a lot of nodes. A single rotational HDD can only handle
> >> about 180 IOPS on average. There's a lot you can do on the storage backend
> >> to make sure you're able to keep up there.
> > Indeed, storage capacity is also an issue but IOPS would be important, too.
> > I assume that sending batches to Riak (opaque blobs) would help a lot with
> > the quantity of writes, but it's still a very important point.
> > You may want to look into ways to force Riak to clean up the bitcask files.
> >> I don't entirely remember how it's going to handle cleaning up deleted
> >> records, but you might run into some tricky situations where compactions
> >> aren't occurring.
> > Hm, any references regarding that? It would be a major snag in the whole
> > schema Riak doesn't properly reclaim space for deleted records.
> > Riak is pretty constant time for Bitcask. The tricky part with the amount of
> >> data you're describing is that Bitcask requires (I think) that all keys fit
> >> into memory. As your data volume increases, you'll need to do a combination
> >> of scaling up and scaling out. Scale up RAM in the nodes and then add
> >> additional nodes to handle load. RAM will help with data volume, more nodes
> >> will help with write throughput.
> > Indeed, for high frequency sources that would create lots of bundles even
> > the MaxN to 1 reduction for key names might still generate loads of keys.
> > Any idea how much RAM Riak requires per record, or a reference that would
> > point me to it?
> > Since you're searching on time series, mostly, you could build time indexes
> >> in your RDBMS. The nice thing is that querying temporal data is well
> >> documented in the relational world, especially in the data warehousing
> >> world. In your case, I'd create a dates table and have a foreign key
> >> relating to my RDBMS index table to make it easy to search for dates.
> >> Querying your time table will be fast which reduces the need for scans in
> >> your index table.
> >> EXAMPLE:
> >> CREATE TABLE timeseries (
> >> time_key INT,
> >> date TIMESTAMP,
> >> datestring VARCHAR(30),
> >> year SMALLINT,
> >> month TINYINT,
> >> day TINYINT,
> >> day_of_week TINYINT
> >> -- etc
> >> );
> >> CREATE TABLE riak_index (
> >> id INT NOT NULL,
> >> time_key INT NOT NULL REFERENCES timeseries(time_key),
> >> riak_key VARCHAR(100) NOT NULL
> >> );
> >> SELECT ri.riak_key
> >> FROM timeseries ts
> >> JOIN riak_index ri ON ts.time_key = ri.time_key
> >> WHERE ts.date BETWEEN '20090702' AND '20100702';
> > My plan was to have the riak_index contain something like: (id, start_time,
> > end_time, source_id, record_count.)
> > Without going too much into RDBMS fun, this pattern can get your RDBMS
> >> running pretty quickly and then you can combine that with Riak's performance
> >> and have a really good idea of how quick any query will be.
> > That's roughly the plan, thanks again for your contributions to the
> > discussion!
> > Paul
> riak-users mailing list
> riak-users at lists.basho.com
More information about the riak-users