Storing large collections in Riak (or any distributed store)

Nathan Sobo
Tue Feb 8 19:58:15 EST 2011

I've never used a data store like Riak, but I'm working with a client who
wants to store a large number of large mailing lists. Each list is
potentially a few million entries long, with each list entry consisting of
an email address plus arbitrary key-value pairs. When a customer wants to
send out a blast email, I need to send an email to every single entry on one
of their mailing lists. How do I model this in Riak?

One thought: I could store a giant list of keys. When I want to send out
email, I retrieve this list, then iterate through it, retrieving the
corresponding entries one at a time.

Another thought: Each entry references the next entry in the list, like a
linked list. This sounds like a bad idea though.

Half-formed thought: Could I use the Riak links feature for this? It seems
like millions of links is probably an abuse of that feature though.
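
To make the first idea concrete, here is a minimal sketch in Python. It is
not Riak's API: the `store` dict and the `put`/`get` helpers are hypothetical
stand-ins for whatever key-value client is actually used, and the key layout
("<list>/index", "<list>/entry/<email>") is an assumption made for
illustration only.

```python
import json

# Hypothetical in-memory stand-in for a key-value bucket.
# A real client would replace put() and get(); nothing here is Riak's API.
store = {}

def put(key, value):
    """Serialize a value and store it under the given key."""
    store[key] = json.dumps(value)

def get(key):
    """Fetch and deserialize the value stored under the given key."""
    return json.loads(store[key])

def add_entry(list_id, email, attrs):
    """Store one entry under its own key and record that key in the list's index."""
    entry_key = f"{list_id}/entry/{email}"
    put(entry_key, {"email": email, "attrs": attrs})

    index_key = f"{list_id}/index"
    keys = get(index_key) if index_key in store else []
    if entry_key not in keys:  # re-adding an address updates it in place
        keys.append(entry_key)
    put(index_key, keys)

def entries(list_id):
    """Fetch the index object, then fetch each entry one at a time."""
    index_key = f"{list_id}/index"
    if index_key not in store:
        return
    for entry_key in get(index_key):
        yield get(entry_key)

# Usage: build a small list, then walk it as a blast email would.
add_entry("newsletter", "ann@example.com", {"name": "Ann"})
add_entry("newsletter", "bob@example.com", {"plan": "free"})
recipients = [entry["email"] for entry in entries("newsletter")]
```

Note that in this sketch every `add_entry` call rewrites the whole index
object, so with a few million keys each write would move a very large value.
That cost is the main thing to weigh against this approach.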

Is a key-value store actually inappropriate for this problem? Would I be
better served by a store that allows efficient range scanning (like
BigTable), so that I could cluster the mailing list entries naturally into

Any guidance would be greatly appreciated!

Thank you,
Nathan Sobo
