Is Riak suitable for a small-record, write-intensive, billion-record application?

Dmitry Demeshchuk demeshchuk at gmail.com
Thu Oct 18 08:00:50 EDT 2012


That's quite a short requirements spec, but here are some thoughts and
facts:

- Riak can handle that amount of data fairly easily in terms of puts and gets.
The main advantage here is scalability: if you get overloaded by requests or
data volume, you just add a few extra machines to the cluster. If you are
using a cloud provider, this may take mere minutes, plus the time it takes
for the data to be redistributed.
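
Just to illustrate the hashing part: as far as I know, Riak hashes each key
onto a 160-bit ring (SHA-1) split into a fixed number of partitions, and
nodes claim partitions. Here's a rough Python sketch of that idea using your
example key; plain SHA-1 of the key string stands in for Riak's actual
hashing of the bucket/key pair, and 64 partitions is just the default
assumption:

```python
import hashlib

RING_SIZE = 2 ** 160     # the ring is 160 bits wide (SHA-1 output size)
NUM_PARTITIONS = 64      # Riak's default ring_creation_size, assumed here

def partition_for(key):
    # Hash the key onto the ring, then see which partition that position
    # falls into; partitions (not individual keys) move when nodes change.
    ring_position = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return ring_position // (RING_SIZE // NUM_PARTITIONS)

print(partition_for("cle01_tpls01_2105328884"))
```

The point is that the mapping is deterministic, so adding machines only
re-assigns whole partitions rather than re-hashing every key.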

- I'm not sure whether Riak alone can handle additional indexing of large
amounts of data very well; that probably depends on what and how you index.
Basho hasn't been working on search for all that long, so Riak is quite new
in this field. Still, riak_search is a live product and, as far as I know,
is being used in production by some companies (I don't know the exact names
and numbers, sorry). Some companies use external indexing; for instance,
Echo ( http://aboutecho.com/ ) uses Postgres for this purpose, and the
amount of data they store in Riak is ridiculously large.
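
In case it helps, the external-indexing pattern is roughly this: the full
records go to Riak, and a relational database keeps only the queryable
fields plus the Riak key. A toy sketch, where a dict stands in for the Riak
bucket, sqlite3 stands in for Postgres, and all the names are made up:

```python
import sqlite3

riak_bucket = {}  # stand-in for a Riak bucket: key -> full record

index_db = sqlite3.connect(":memory:")
index_db.execute("CREATE TABLE idx (field1 TEXT, riak_key TEXT)")

def put_indexed(key, record):
    # The full record goes to Riak; only the queryable field goes to SQL.
    riak_bucket[key] = record
    index_db.execute("INSERT INTO idx VALUES (?, ?)", (record["field1"], key))

def find_by_field1(value):
    # Query the SQL index for matching keys, then fetch records from Riak.
    rows = index_db.execute(
        "SELECT riak_key FROM idx WHERE field1 = ?", (value,))
    return [riak_bucket[k] for (k,) in rows]

put_indexed("cle01_tpls01_2105328884",
            {"field1": "tpls01", "field2": "some other data"})
print(find_by_field1("tpls01"))
```

The SQL side stays small because it only holds the indexed fields, while
the bulky records live in Riak.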

- It's not clear how critical consistency is for you. Riak is an eventually
consistent database, as you probably know. However, as far as I know, Sean
Cribbs, one of the Basho engineers, recently gave a talk about consistency
in Riak. I haven't seen it yet, but from the description I gathered that he
proposed some interesting solutions for handling consistency in Riak. Also,
Eric Brewer (the creator of the CAP theorem) has written an article about
NoSQL databases being underrated in terms of consistency, showing that even
some financial systems use NoSQL nowadays (again, sorry, I cannot find the
link; maybe someone else will help me out).
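
About the check-before-write workflow from your question: in Riak that's a
get followed by a put, and since the two aren't atomic, two concurrent
writers can both see "not found" and both write; that's exactly where the
eventual-consistency caveats bite. A minimal sketch of the logic, with a
dict standing in for the bucket (the function is mine, not a Riak API):

```python
riak_bucket = {}  # stand-in for a Riak bucket; a real client does get/put

def write_if_absent(key, record):
    """Return (created, stored_record): create only when the key is new."""
    existing = riak_bucket.get(key)    # the "get" step
    if existing is not None:
        return False, existing         # key taken: hand back the old record
    riak_bucket[key] = record          # the "put" step
    return True, record

created, rec = write_if_absent("cle01_tpls01_2105328884", {"f1": "a"})
print(created)        # True on the first write
created, rec = write_if_absent("cle01_tpls01_2105328884", {"f1": "b"})
print(created, rec)   # False, and the original record comes back
```

In a real deployment you'd decide what happens when two such writes race:
rely on vector clocks/siblings to detect the conflict, or route writes for
the same key through one place.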


On Thu, Oct 18, 2012 at 3:42 PM, Yassen Damyanov <yassen.tis at gmail.com> wrote:

> Hi everyone,
>
> I'm absolutely new (and ignorant) to NoSQL solutions and to Riak (my
> apologies; I do have extensive experience with SQL RDBMSes, though).
>
> We are considering a NoSQL DB deployment for a mission-critical
> application where we need to store several hundred MILLION data records,
> each record consisting of about 6 string fields, with a total record
> length of 160 bytes. There is a unique key in each record that seems
> suitable for hashing (a 20+ byte string, e.g. "cle01_tpls01_2105328884").
>
> The application should be able to write several hundred new records per
> second, but it must first check whether the unique key already exists.
> The write is performed only if the key is not there. If it is, the app
> needs to retrieve the whole record and return it to the client, and no
> write is done in that case.
>
> I need to know whether Riak would be suitable for such an application.
> Please advise, thanks!
>
> (Again, apologies for my ignorance. If we choose Riak, I promise to
> get educated ;)
>
> Yassen
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Best regards,
Dmitry Demeshchuk

