Is Riak suitable for a small-record, write-intensive, billion-record application?

Yassen Damyanov yassen.tis at gmail.com
Mon Oct 22 03:40:26 EDT 2012


Thanks again everyone (and apologies for junking your mailbox with my
last posts).

So far, I do plan to use Riak for the task. Here's my current vision
of the application design:

All of the nodes will be bound to a single "external" IP via CARP, so
at any given moment only one of them will receive external requests.

There is a front-end and a back-end part of the application, each of
them running on every node. The current CARP-active node will receive
the request, take a local lock, and check for the PK's presence in a
local in-memory hash table (I plan to use a 32-bit murmur hash*). The
check will return a negative result for 99% of the requests, so this
step will almost always be super-fast. The node's front-end then hands
the record to the back-end of any node in the cluster, providing
load balancing.
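
Roughly, the presence check could look like this (a minimal sketch in
Python, assuming the third-party "mmh3" package for MurmurHash3; the
class and method names are just illustrative, and a plain set stands in
where a denser structure would be needed to stay near the 4 GB estimate):

    import threading
    import mmh3

    class PkPresenceTable:
        """In-memory set of 32-bit murmur hashes of the PKs seen so far."""

        def __init__(self):
            self._lock = threading.Lock()
            self._seen = set()

        def check_and_add(self, pk):
            """Return True if the PK hash was already present, else record it."""
            h = mmh3.hash(pk) & 0xFFFFFFFF   # unsigned 32-bit MurmurHash3
            with self._lock:
                if h in self._seen:
                    return True
                self._seen.add(h)
                return False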

If the hash table returns a positive result, the front-end requests the
record from Riak and returns it. (There is a possibility that the
record is not there, in which case the front-end inserts it into the
database.) Then the front-end releases the lock.
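
For the read-or-insert path, something along these lines should do (a
sketch against Riak's HTTP interface using the "requests" package; the
host, port, bucket name and JSON encoding are my assumptions, not part
of the design):

    import json
    import requests

    RIAK_URL = "http://127.0.0.1:8098/riak/records"   # assumed node and bucket

    def read_or_insert(pk, record):
        """Return the stored record for pk, inserting `record` if absent."""
        url = "%s/%s" % (RIAK_URL, pk)
        resp = requests.get(url)
        if resp.status_code == 200:
            return json.loads(resp.text)          # record already in Riak
        # Not found: store the incoming record instead.
        requests.put(url, data=json.dumps(record),
                     headers={"Content-Type": "application/json"})
        return record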

Any given node can be stopped, or additional nodes can be added, with
almost no interruption. If the active node is taken down, CARP will
appoint a new active node, whose front-end will start accepting
requests in place of the departed one. New nodes will announce
themselves to the front-end apps via multicast.
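
The announcement itself could be as simple as a one-shot UDP multicast
datagram (a sketch; the group address, port and message format below
are made up for illustration):

    import socket

    MCAST_GROUP = "239.255.0.42"   # assumed administratively-scoped group
    MCAST_PORT = 9999              # assumed port

    def announce(node_name, backend_addr):
        """Announce this node's back-end address to the front-end apps."""
        msg = ("JOIN %s %s" % (node_name, backend_addr)).encode()
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        sock.sendto(msg, (MCAST_GROUP, MCAST_PORT))
        sock.close()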

Thus Riak will handle writes most of the time and reads only seldom,
and eventual consistency does not seem to cause issues thanks to the
centralized PK checking.

Does this sound reasonable? What Riak back-end would you recommend for
this scenario? Your comments and suggestions are very much
appreciated.

Yassen

[*] murmur 32-bit, if suitable in terms of hash value distribution.
At about a billion records times 4 bytes per hash, this means roughly
4 GB of RAM for the table ... not an awful lot.



