Experimental branch - 2i query improvements

Kresten Krab Thorup krab at trifork.com
Wed Apr 17 19:05:52 EDT 2013


Very interesting!

Regarding feature #1: I don't understand how an ets-based index adds up to 400k per posting, as your write-up suggests. Did you mean 400b? I thought ets was reasonably memory-efficient. Do you use very large keys?


On 16/04/2013, at 23.50, "Martin Sumner" <martin.sumner at adaptip.co.uk> wrote:

I've been working on an experimental branch to offer some improvements to the functionality and performance of 2i queries in Riak:
https://github.com/martinsumner/riak_kv

Explanation:
https://github.com/martinsumner/riak_kv/blob/master/docs/index_speedup.md

Four basic features are included:
1. The ability to pin particular 2i indexes into memory (without loss of consistency on restart of a node)
2. The ability to set partition-level static bloom filters for particular 2i indexes, to greatly reduce the disk overhead of exact-term queries with small result sets (e.g. queries by a secondary identifier such as email address)
3. The ability to return index terms, not just keys, as the results of a query - so that those terms can be overloaded with additional information, which the application can then filter without requiring a MapReduce (M/R) stage (note this is already available via Russell Brown's branch - https://github.com/basho/riak_kv/tree/pt34-index-values)
4. The ability to pass a regular expression to the query iterator - so that range query results are filtered on matches to that regular expression (for example, allowing non-trailing wildcards) before the keys and terms are returned
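
As a rough illustration of features 2 to 4, here is a minimal Python sketch. It is not code from the branch (Riak itself is Erlang): the `BloomFilter` and `Partition` classes, the bit and hash counts, and the `surname|forename` term layout are all hypothetical, chosen only to show the idea. An exact-term query consults the bloom filter first and skips the index read when the term is definitely absent; a range query returns (term, key) pairs and can apply a regular expression to the terms before returning them.

```python
import hashlib
import re


class BloomFilter:
    """Fixed-size Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(term))


class Partition:
    """Hypothetical stand-in for one partition's 2i index: (term, key) postings."""

    def __init__(self, postings):
        self.postings = sorted(postings)
        self.bloom = BloomFilter()
        for term, _key in self.postings:
            self.bloom.add(term)

    def exact_query(self, term):
        # Feature 2: skip the (simulated) disk read when the filter rules the term out.
        if not self.bloom.might_contain(term):
            return []
        return [(t, k) for t, k in self.postings if t == term]

    def range_query(self, start, end, pattern=None):
        # Features 3 and 4: return terms with keys, optionally filtered by a regex.
        regex = re.compile(pattern) if pattern else None
        return [(t, k) for t, k in self.postings
                if start <= t <= end and (regex is None or regex.search(t))]


partition = Partition([
    ("smith|john", "k1"),
    ("smithson|ann", "k2"),
    ("smyth|john", "k3"),
])
exact_hits = partition.exact_query("smith|john")
exact_misses = partition.exact_query("jones|mary")
johns = partition.range_query("smith", "smyth~", pattern=r"\|john$")
```

Here the range query covers every term from `smith` to `smyth~`, and the regular expression keeps only those ending in `|john`, which a plain range query could not express because the wildcard is at the front of the term.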

Testing so far is light, both functional and non-functional.  This is still very much an experiment.  We're hoping to do some full-scale volume testing on the branch in the next couple of weeks.

The branch has been developed to solve some problems we have with edge cases in our implementation for the NHS in England, where we have to support tracing across an 80M-record demographic database.  I'd be interested to hear whether people think it has value in other environments.

Regards

Martin
