High volume data series storage and queries

Ciprian Dorin Craciun ciprian.craciun at gmail.com
Wed Aug 10 09:39:06 EDT 2011


On Wed, Aug 10, 2011 at 16:30, Jeremiah Peschka
<jeremiah.peschka at gmail.com> wrote:
> Ciprian's response this morning has some interesting information in it - I have suspicions about where the PostgreSQL and MySQL performance problems lie. Unfortunately, my suspicion also highlights why some people don't like using RDBMSes: when you get to higher-performance scenarios you frequently need specialized knowledge to tune the RDBMS to run well.

    Indeed, you need highly qualified DBAs to tune an RDBMS. And in
my case I've failed to do so successfully :). (I've asked here and
there for feedback, but nothing I have tried has "revived" the insert
speed.)


> Within an RDBMS, you'll get good performance by splitting out your writes into a partitioned table where you either group up writes by some constant value (modulus of sensor location, for example) or else you group writes by month. You can then spread your writes out across multiple storage devices, gaining some of the benefits of a system like Riak, and still getting the benefit of an RDBMSes sequential scans.
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
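
(For illustration only: a minimal sketch of the month-partitioned layout
described above, using the inheritance-plus-CHECK-constraint partitioning
PostgreSQL offered at the time. The table and column names are made up for
this example.)

-- Parent table: defines the schema, stays empty itself.
CREATE TABLE sensor_readings (
    sensor_id INT              NOT NULL,
    read_at   TIMESTAMP        NOT NULL,
    value     DOUBLE PRECISION NOT NULL
);

-- One child table per month; the CHECK constraint lets the planner skip
-- partitions that cannot satisfy a time-range query (constraint exclusion).
CREATE TABLE sensor_readings_2011_08 (
    CHECK (read_at >= '2011-08-01' AND read_at < '2011-09-01')
) INHERITS (sensor_readings);

CREATE INDEX sensor_readings_2011_08_idx
    ON sensor_readings_2011_08 (sensor_id, read_at);

-- A trigger on the parent (or the application itself) routes inserts to the
-- right child, and each child can live in its own tablespace on a separate disk.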

    But then the solution gets quite complicated, and you end up
constructing a specialized "data-store" on top of a cluster of
RDBMSes, when you could have just started from the beginning on top
of some lightweight storage engine.

    Ciprian.


> On Aug 9, 2011, at 6:38 PM, Paul O wrote:
>
>> Jeremiah, with all the feedback indicating that Riak is not such a good fit for the problem, even accepting batching compromises, I'll have to strongly reconsider RDBMS solutions, though they come with their own tradeoffs, as discussed here today. How about storing data using Innostore instead of Bitcask? Would this be a "best of both worlds" situation?
>>
>> Thanks for these and the previous valuable comments, again,
>>
>> Paul
>>
>> On Tue, Aug 9, 2011 at 11:14 AM, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
>> Excellent points, Alex.
>>
>> When you compare Riak's storage overhead to something like an RDBMS where you have maybe 24 bytes for row overhead (as is the case for PostgreSQL), you'll find that there's a tremendous economy of space elsewhere.
>>
>> Riak is going to excel in applications where reads and writes will be truly random. Yes, there are Map Reduce features, but randomly organized data is still randomly organized data.
>>
>> If you look at RDBMSes, horizontally partitioning your system through RDBMS features (SQL Server's partitioned tables or PostgreSQL's partitioned views, for example) gives you the ability to take advantage of many known quantities in that world - range queries can take advantage of sequential scan speeds across rotational disks.
>>
>> You can even avoid the overhead of a traditional RDBMS altogether and use the InnoDB APIs or something like HandlerSocket to write data very quickly.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>  Microsoft SQL Server MVP
>>
>> On Aug 9, 2011, at 7:43 AM, Alexander Sicular wrote:
>>
>> > A couple of thoughts:
>> >
>> > -disk io
>> > -total keys versus memory
>> > -data on disk overhead
>> >
>> > As Jeremiah noted, disk io is crucial. Thankfully, Riak's distributed mesh gives you access to a number of spindles limited only by your budget. I think that is a critical bonus of a distributed system like Riak that is often not fully appreciated. Here Riak is a win for you.
>> > Bitcask needs all keys to fit in memory. We are talking something like:
>> >
>> > (key length + overhead) * number of keys * replicas < cluster max available RAM.
>> >
>> > There is a tool on the wiki which should help figure this out. What that basically means for you is that you will have to batch your data by some sensor/time granularity metric. Let's say every minute. At 10 Hz that is a 600x reduction in total keys. Of course, this doesn't come for free: your application middleware will have to accommodate it. That means you could lose up to whatever your time-granularity batch is, i.e. you could lose a minute of sensor data should your application fail. Here Riak is neutral to negative.
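
(To put rough numbers on that formula, every figure below assumed purely for illustration: 100 sensors at 10 Hz produce 100 * 10 * 31,536,000 ≈ 31.5 billion keys per year; with 3 replicas and, say, a 36-byte key plus roughly 40 bytes of in-memory overhead per key, that is about 31.5e9 * 3 * 76 bytes ≈ 7 TB of RAM across the cluster, which is clearly out of reach. Batching to one object per sensor per minute divides the key count by 600, to roughly 52.6 million keys per year, or about 12 GB of RAM under the same assumptions.)
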
>> > Riak's data structure is not friendly towards small values. Sensors generally spit out integers or other small data tuples. If you search the list archives you will find a magnificent data-overhead writeup. IIRC, it was something on the order of 450 bytes per object. What that basically tells you is that you can't use Bitcask for small values if disk space is a concern, as I imagine it to be in this case. Also, sensor data is generally write-only, i.e. never deleted or modified, so compaction should not be a concern when using Bitcask. Here Riak is a strong negative.
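
(Again with assumed numbers: an 8-byte integer reading carrying ~450 bytes of per-object overhead means over 98% of the disk goes to overhead, whereas batching 600 readings into one ~4.8 KB value brings the overhead below 10%.)
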
>> > Data retrieval issues aside (which between Riak Search/secondary indexes/third party indexes should not be a major concern), I am of the opinion that Riak is not a good fit for high frequency sensor data applications.
>> >
>> > Cheers,
>> > Alexander
>> >
>> > Sent from my rotary phone.
>> >
>> > On Aug 8, 2011 9:40 PM, "Paul O" <pcotec at gmail.com> wrote:
>> > > Quite a few interesting points, thanks!
>> > >
>> > > On Mon, Aug 8, 2011 at 5:53 PM, Jeremiah Peschka <jeremiah.peschka at gmail.com
>> > >> wrote:
>> > >
>> > >> Responses inline
>> > >>
>> > >> On Aug 8, 2011, at 1:25 PM, Paul O wrote:
>> > >>
>> > >> Will any existing data be imported? If this is totally greenfield, then
>> > >> you're free to do whatever zany things you want!
>> > >
>> > >
>> > > Almost totally greenfield, yes. Some data will need to be imported but it's
>> > > already in the format described.
>> > >
>> > > Ah, so you need IOPS throughput, not storage capacity. On the hardware side
>> > >> make sure your storage subsystem can keep up - don't cheap out on disks just
>> > >> because you have a lot of nodes. A single rotational HDD can only handle
>> > >> about 180 IOPS on average. There's a lot you can do on the storage backend
>> > >> to make sure you're able to keep up there.
>> > >>
>> > >
>> > > Indeed, storage capacity is also an issue but IOPS would be important, too.
>> > > I assume that sending batches to Riak (opaque blobs) would help a lot with
>> > > the quantity of writes, but it's still a very important point.
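
(A back-of-the-envelope illustration, with all figures assumed: 1,000 sensors at 10 Hz written individually is 10,000 objects/s, or 30,000 writes/s with 3 replicas, i.e. on the order of 170 rotational disks at ~180 IOPS each before anything else touches them. Batched per sensor-minute, the same load is under 17 objects/s, about 50 replicated writes/s, which a handful of disks can absorb.)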
>> > >
>> > > You may want to look into ways to force Riak to clean up the bitcask files.
>> > >> I don't entirely remember how it's going to handle cleaning up deleted
>> > >> records, but you might run into some tricky situations where compactions
>> > >> aren't occurring.
>> > >>
>> > >
>> > > Hm, any references regarding that? It would be a major snag in the whole
>> > > scheme if Riak doesn't properly reclaim space for deleted records.
>> > >
>> > > Riak is pretty much constant time with Bitcask. The tricky part with the amount of
>> > >> data you're describing is that Bitcask requires (I think) that all keys fit
>> > >> into memory. As your data volume increases, you'll need to do a combination
>> > >> of scaling up and scaling out. Scale up RAM in the nodes and then add
>> > >> additional nodes to handle load. RAM will help with data volume, more nodes
>> > >> will help with write throughput.
>> > >>
>> > >
>> > > Indeed, for high-frequency sources that would create lots of bundles, even
>> > > the MaxN-to-1 reduction in key names might still generate loads of keys.
>> > > Any idea how much RAM Riak requires per record, or a reference that would
>> > > point me to it?
>> > >
>> > > Since you're searching on time series, mostly, you could build time indexes
>> > >> in your RDBMS. The nice thing is that querying temporal data is well
>> > >> documented in the relational world, especially in the data warehousing
>> > >> world. In your case, I'd create a dates table and have a foreign key
>> > >> relating to my RDBMS index table to make it easy to search for dates.
>> > >> Querying your time table will be fast which reduces the need for scans in
>> > >> your index table.
>> > >>
>> > >> EXAMPLE:
>> > >>
>> > >> CREATE TABLE timeseries (
>> > >>     time_key    INT,
>> > >>     date        TIMESTAMP,
>> > >>     datestring  VARCHAR(30),
>> > >>     year        SMALLINT,
>> > >>     month       TINYINT,
>> > >>     day         TINYINT,
>> > >>     day_of_week TINYINT
>> > >>     -- etc
>> > >> );
>> > >>
>> > >> CREATE TABLE riak_index (
>> > >>     id       INT NOT NULL,
>> > >>     time_key INT NOT NULL REFERENCES timeseries(time_key),
>> > >>     riak_key VARCHAR(100) NOT NULL
>> > >> );
>> > >>
>> > >> SELECT ri.riak_key
>> > >> FROM timeseries ts
>> > >> JOIN riak_index ri ON ts.time_key = ri.time_key
>> > >> WHERE ts.date BETWEEN '20090702' AND '20100702';
>> > >>
>> > >
>> > > My plan was to have the riak_index contain something like: (id, start_time,
>> > > end_time, source_id, record_count).
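
(A hypothetical sketch of such an index table and the lookup it would serve; the names and types are assumptions, and the riak_key column is borrowed from Jeremiah's earlier example rather than part of Paul's list.)

CREATE TABLE riak_index (
    id           BIGINT       NOT NULL PRIMARY KEY,
    source_id    INT          NOT NULL,
    start_time   TIMESTAMP    NOT NULL,
    end_time     TIMESTAMP    NOT NULL,
    record_count INT          NOT NULL,
    riak_key     VARCHAR(100) NOT NULL
);

CREATE INDEX riak_index_source_time ON riak_index (source_id, start_time);

-- Which Riak objects hold data for sensor 42 overlapping a one-hour window?
SELECT riak_key
FROM riak_index
WHERE source_id = 42
  AND start_time < '2011-08-10 12:00:00'
  AND end_time   > '2011-08-10 11:00:00';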
>> > >
>> > > Without going too much into RDBMS fun, this pattern can get your RDBMS
>> > >> running pretty quickly and then you can combine that with Riak's performance
>> > >> and have a really good idea of how quick any query will be.
>> > >
>> > >
>> > > That's roughly the plan, thanks again for your contributions to the
>> > > discussion!
>> > >
>> > > Paul
>>
>>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>




More information about the riak-users mailing list