High volume data series storage and queries

Paul O pcotec at gmail.com
Tue Aug 9 21:44:33 EDT 2011


>
>    From what I've seen from your estimation, the data amount you're
> going to store is huge. Not only that but also the bandwidth required
> is quite a lot. (Assuming you have a 200MBit connection and you send
> data over UDP (128 bytes in total = headers + payload), after a simple
> calculation it results that you'll only be able to handle 16384
> sensors. Thus maybe you should reduce the readings.)
>

I need to give the other stakeholders an idea of the strategy and the costs
involved, hence the effort to make things predictable, even under higher
loads. I will assume that the connection can be handled as long as the
overall equation makes sense.
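
For reference, the bandwidth arithmetic behind the quoted estimate can be
sketched roughly as below. This is only a back-of-the-envelope check, and it
rests on assumptions that are not stated in this message: that "200MBit"
means 200 * 10**6 bits per second, that each reading travels as one 128-byte
packet, and that lower-layer framing overhead is ignored. The function names
are made up for illustration. The per-sensor reading rate is not given here,
so it is left as a parameter.

```python
# Back-of-the-envelope check of the quoted bandwidth estimate.
# Assumptions (not stated in this message): "200MBit" is 200 * 10**6 bits/s,
# and every reading is a single 128-byte packet (headers + payload).

LINK_BITS_PER_SEC = 200 * 10**6
PACKET_BYTES = 128


def max_packets_per_sec(link_bits_per_sec: int, packet_bytes: int) -> float:
    """Upper bound on packets per second, ignoring lower-layer framing."""
    return link_bits_per_sec / (packet_bytes * 8)


def max_sensors(link_bits_per_sec: int, packet_bytes: int,
                readings_per_sec: float) -> int:
    """Sensors the link can carry if each one sends readings_per_sec packets."""
    pps = max_packets_per_sec(link_bits_per_sec, packet_bytes)
    return int(pps // readings_per_sec)


if __name__ == "__main__":
    pps = max_packets_per_sec(LINK_BITS_PER_SEC, PACKET_BYTES)
    print(f"packets/s: {pps:.1f}")
    # Per-sensor rate that the quoted figure of 16384 sensors would imply.
    print(f"implied readings/s per sensor: {pps / 16384:.2f}")
```

Under these assumptions the sensor ceiling scales inversely with the
per-sensor reading rate, which is why reducing the readings raises it.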


>    I wouldn't store the "data files" inside the embedded DB, but the
> actual raw readings.


While this is tempting, how would you see something like that working? One
huge DB per source? As I said previously, in many cases this would involve a
DB with around 1 billion records. In fact, there would be many such DBs, all
in use at the same time, and I can constrain the queries so that they never
have to consider the full amount of data. It seems to me that I'd be giving
up a good chance for optimization (premature and evil as it may be) by
storing all data points. And if you're suggesting one DB per batch, the
batches would be relatively small. Wouldn't that create its own set of
problems for the DB library (opening many files, closing them, etc.), when I
can reduce the final range query to a sequential traversal of at most
3 x MaxN records?
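
To make the last point concrete, here is a minimal sketch of the kind of
bounded range query I have in mind. Everything in it is hypothetical: the
record shape, the names (`range_query`, `MAX_N`, `MAX_BATCHES`), and the
assumption that batches are time-ordered, non-overlapping, non-empty, and
hold at most MaxN readings each. It is meant to illustrate the bound, not to
describe an actual implementation.

```python
from bisect import bisect_right
from typing import Iterator, List, Tuple

Reading = Tuple[int, float]   # (timestamp, value), hypothetical record shape
Batch = List[Reading]         # readings sorted by timestamp, len <= MAX_N

MAX_N = 4         # hypothetical batch capacity, kept tiny for the demo
MAX_BATCHES = 3   # a query may touch at most this many batches


def range_query(batches: List[Batch], start: int, end: int) -> Iterator[Reading]:
    """Yield readings with start <= timestamp <= end.

    Assumes `batches` is ordered by time, non-overlapping, with no empty
    batch. At most MAX_BATCHES * MAX_N records are traversed.
    """
    if not batches or start > end:
        return
    # A real system would keep this index around instead of rebuilding it.
    first_ts = [batch[0][0] for batch in batches]
    # Last batch whose first timestamp is <= start (or the first batch).
    i = max(bisect_right(first_ts, start) - 1, 0)
    touched = 0
    while i < len(batches) and batches[i][0][0] <= end:
        touched += 1
        if touched > MAX_BATCHES:
            raise ValueError("query range spans too many batches")
        # Sequential traversal of one batch.
        for ts, value in batches[i]:
            if start <= ts <= end:
                yield ts, value
        i += 1


if __name__ == "__main__":
    demo = [[(t, float(t)) for t in range(b, b + MAX_N)] for b in (0, 4, 8, 12)]
    print([ts for ts, _ in range_query(demo, 3, 9)])
```

With one embedded DB per batch, the same query would instead mean opening
and closing up to three DB files.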

Regards,

Paul

