Experimental branch - 2i query improvements

Martin Sumner martin.sumner at adaptip.co.uk
Wed Apr 17 19:55:14 EDT 2013


D'oh.  I meant 400 bytes.


On 18 April 2013 00:05, Kresten Krab Thorup <krab at trifork.com> wrote:

> Very interesting!
>
> Regarding feature #1: I don't understand how an ets-based index adds up to
> 400k per posting, as your write-up suggests. Did you mean 400b? I thought
> ets was reasonably memory efficient.  Do you use very large keys?
>
> On 16/04/2013, at 23.50, "Martin Sumner" <martin.sumner at adaptip.co.uk> wrote:
>
> I've been working on an experimental branch to offer some improvements to
> the functionality and performance of 2i queries in Riak:
> https://github.com/martinsumner/riak_kv
>
> Explanation:
> https://github.com/martinsumner/riak_kv/blob/master/docs/index_speedup.md
>
> There are four basic features that are included:
> 1. The ability to pin particular 2i indexes into memory (without loss of
> consistency on restart of a node)
> 2. The ability to set partition-level static bloom filters for particular
> 2i indexes to greatly reduce the disk overheads of exact-term queries with
> small result sets (e.g. for queries by a secondary identifier such as email
> address)
> 3. The ability to return index terms, not just keys, as results of a query -
> so that those terms can be overloaded with additional information which can
> then be filtered by the application without requiring an M/R stage (note
> this is already available via Russell Brown's branch -
> https://github.com/basho/riak_kv/tree/pt34-index-values)
> 4. The ability to pass a regular expression to the query iterator - so
> that range queries will be filtered based on matches to that regular
> expression (for example allowing for non-trailing wildcards) before
> returning the keys and terms
>
> Testing is slight at the moment, both functionally and non-functionally.
> This is still very much an experiment.  We're hoping to do some full-scale
> volume testing on the branch in the next couple of weeks.
>
> The branch has been developed to solve some problems we have with edge
> cases in our implementation for the NHS in England - where we have to
> support tracing across an 80M record demographic database.  I'd be
> interested if people thought it had value in other environments.
>
> Regards
>
> Martin
>
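
For readers skimming the thread, features 2 to 4 in the quoted list can be
sketched roughly as below. This is an illustration in Python, not code from
the branch (which is Erlang); the names BloomFilter, Partition, exact_query
and range_query are all hypothetical, and the filter sizing is arbitrary.
The idea shown: a per-partition Bloom filter lets an exact-term query skip
the index read when the term is definitely absent, and a range query can
apply a regular expression to each term before returning (term, key) pairs.

```python
import hashlib
import re


class BloomFilter:
    """Tiny fixed-size Bloom filter; illustrative only, not the branch's code."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, term):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(term)
        )


class Partition:
    """Hypothetical stand-in for one partition's index: sorted (term, key) pairs."""

    def __init__(self, postings):
        self.postings = sorted(postings)
        # Static filter: built once from the postings present at start-up.
        self.bloom = BloomFilter()
        for term, _key in self.postings:
            self.bloom.add(term)

    def exact_query(self, term):
        if not self.bloom.might_contain(term):
            return []  # definite miss, so the index read is skipped
        return [(t, k) for t, k in self.postings if t == term]

    def range_query(self, start, end, pattern=None):
        # Filter on the regex while iterating, returning terms as well as keys.
        regex = re.compile(pattern) if pattern else None
        return [
            (t, k)
            for t, k in self.postings
            if start <= t <= end and (regex is None or regex.search(t))
        ]


partition = Partition([
    ("smith_john", "key1"),
    ("smith_jane", "key2"),
    ("smythe_john", "key3"),
])
exact = partition.exact_query("smith_jane")
missing = partition.exact_query("jones_alan")
wildcard = partition.range_query("s", "t", pattern=r"_john$")
```

A Bloom filter never gives false negatives, so an exact-term query for a
term that exists always reaches the index; the saving is on partitions that
do not hold the term, which is most of them when the result set is small.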