Storing large collections in Riak (or any distributed store)

Jeremiah Peschka jeremiah.peschka at gmail.com
Wed Feb 9 09:25:22 EST 2011


So, if I understand this correctly, you want to send out an email to a bunch
of users on a list. Each of these users can also have an arbitrary number of
attributes. In order to send the email, you'll need to retrieve both the
email address AND the user's attributes.

Listing keys is a slow operation in Riak: you have to talk to every node in
the cluster to find out which keys exist. At first I thought you might be
able to store all of a list's members in a single key/value pair, but that
would make list management somewhat painful (adding a new person to a list
requires loading the whole list from disk, modifying it in your application
code, and then saving it again). In short, I don't know the best way to model
your data. I suspect there is a sneaky way that would do a great job of it.
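
To make the load/modify/save cycle concrete, here is a minimal sketch. It
assumes the whole list lives under one key as a JSON document; a plain Python
dict stands in for the Riak bucket, and the key name and the add_member helper
are made up for illustration, not part of any Riak client.

```python
import json

# Hypothetical stand-in for a Riak bucket: a dict mapping key -> JSON string.
bucket = {}

def add_member(bucket, list_key, email, attributes):
    """Add one member to a mailing list stored as a single value."""
    raw = bucket.get(list_key)               # 1. fetch the whole list
    members = json.loads(raw) if raw else {}
    members[email] = attributes              # 2. modify it in application code
    bucket[list_key] = json.dumps(members)   # 3. write the whole list back
    return len(members)

add_member(bucket, "list:newsletter", "a@example.com", {"first_name": "Ann"})
count = add_member(bucket, "list:newsletter", "b@example.com", {"plan": "pro"})
```

Every add rewrites the entire value, so the cost grows with the size of the
list, and two writers doing this at the same time can step on each other.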

You could definitely use links to create a linked list. Link maintenance
might be a bit of a chore if you're adding and removing people from the list
frequently. You may want to look at using Redis as your in-memory system to
work with your data structures and persist them to Riak on a regular basis.
Redis has much stronger support for lists and sets, which it sounds like you
might need.
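
A rough sketch of that "work in memory, persist on a regular basis" idea
follows. It does not talk to Redis or Riak: Python sets stand in for Redis
sets, a dict stands in for the Riak bucket, and the ListCache class and its
method names are hypothetical. The comments note the Redis set commands (SADD,
SREM) that the in-memory operations correspond to.

```python
import json

class ListCache:
    """In-memory working copy of mailing lists (stand-in for Redis sets)."""

    def __init__(self):
        self.sets = {}      # list key -> set of email addresses
        self.dirty = set()  # list keys changed since the last flush

    def add(self, list_key, email):
        # Comparable to Redis SADD.
        self.sets.setdefault(list_key, set()).add(email)
        self.dirty.add(list_key)

    def remove(self, list_key, email):
        # Comparable to Redis SREM.
        self.sets.setdefault(list_key, set()).discard(email)
        self.dirty.add(list_key)

    def flush(self, store):
        """Persist every changed list to the durable store."""
        flushed = sorted(self.dirty)
        for list_key in flushed:
            store[list_key] = json.dumps(sorted(self.sets[list_key]))
        self.dirty.clear()
        return flushed

cache = ListCache()
riak_stand_in = {}  # hypothetical stand-in for a Riak bucket
cache.add("list:newsletter", "a@example.com")
cache.add("list:newsletter", "b@example.com")
cache.remove("list:newsletter", "a@example.com")
flushed = cache.flush(riak_stand_in)
```

In a real setup the flush would run on a timer or after a batch of changes,
and anything not yet flushed is lost if the in-memory side goes down.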

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA


On Tue, Feb 8, 2011 at 4:58 PM, Nathan Sobo <nsobo at pivotallabs.com> wrote:

> I've never used a data store like Riak, but I'm working with a client who
> wants to store a large number of large mailing lists. Each list is
> potentially a few million entries long, with each list entry consisting of
> an email address plus arbitrary key-value pairs. When a customer wants to
> send out a blast email, I need to send an email to every single entry on one
> of their mailing lists. How do I model this in Riak?
>
> One thought: I could store a giant list of keys. When I want to send out
> email, I retrieve this list, then iterate through it, retrieving the
> corresponding entries one at a time.
> Another thought: Each entry references the next entry in the list, like a
> linked list. This sounds like a bad idea though.
> Half-formed thought: Could I use the Riak links feature for this? It seems
> like millions of links is probably an abuse of that feature though.
>
> Is a key-value store actually inappropriate for this problem? Would I be
> better suited using a store that allows efficient range scanning (like
> BigTable), so that I could cluster the mailing list entries naturally into
> lists?
>
> Any guidance would be greatly appreciated!
>
> Thank you,
> Nathan Sobo