Secondary indexing

Jason J. W. Williams jasonjwwilliams at
Fri Apr 22 17:13:59 EDT 2011

We implemented a scheme for maintaining secondary indexes using
MapReduce and KeyFilters. It's written in Python but there's a
description of how the design works and a test script for validating
other implementations:

Nor sure if that will's reasonably performant and much
faster than regular Riak MR on the values.


On Fri, Apr 22, 2011 at 2:45 PM, Bryan O'Sullivan <bos at> wrote:
> In my application, I have a frequent need to store collections of data where
> I rarely access the items individually, but much more commonly in aggregate
> form. In principle, the mapreduce feature is well suited to this, but it
> must be fed a set of {bucket,key} pairs to consume, and maintaining these
> collections is painful.
> The naive approach would be to store each collection in its own bucket, and
> use listkeys to fetch the keys that I can then mapreduce over. However,
> listkeys has red flags all over it, presumably because it's implemented as a
> scan+filter of all buckets and keys.
> The slightly less naive approach would be to store the keys for a collection
> in a list. Unfortunately, for a collection of even modest size (10,000
> elements), the cost of retrieving and storing this index to make small
> changes (e.g. "I'm adding one element to the collection") rapidly becomes
> untenable, and that's before such knotty issues as "how do I delete items
> from this index?" arise. This is essentially the approach taken by the
> riak_link_index project.
> A more subtle approach would be to maintain a hierarchical index, with a
> "current" index that is modified whenever items are added, and promoted to
> "frozen" once it reaches a certain size. Frozen indices would be maintained
> in an index of their own, which itself would be frozen, and so on.
> Performance would be maintained by limiting the maximum size of each index
> object. This is a similar scheme to the traditional filesystem approach of
> direct, single indirect, and double indirect block indexing. It's also quite
> a lot of work to implement.
> Does Basho have any plans to address this? It seems likely to be a very
> frequently arising need.
> _______________________________________________
> riak-users mailing list
> riak-users at

More information about the riak-users mailing list