Emulating composite index queries with secondary indexes and key filters

Olav Frengstad olav at fwt.no
Tue Aug 28 09:27:41 EDT 2012


Looking at riak_kv_mapred_json, it seems that key filters can only be
applied to entire buckets.

On the other hand, one can still use riak_kv_mapred_filters, even
though it's ugly to construct all the filters manually instead of matching
binary patterns:

{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
{ok, Filter} = riak_kv_mapred_filters:build_filter([[<<"ends_with">>,"1"]]),
%% Reduce phase: keep only the {Bucket, Key} inputs whose key passes the filter.
MapReduce = [
  { reduce
  , {qfun, fun(X, F) -> lists:filter(fun({_Bucket, Key}) -> F(Key) end, X) end}
  , riak_kv_mapred_filters:compose(Filter)
  , true}],
Index = {index, <<"test1">>, <<"field_int">>, <<"123">>},
riakc_pb_socket:mapred(Pid, Index, MapReduce).
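
To make the filtering step concrete, here is a minimal sketch in plain Erlang of what the reduce fun above amounts to for an ends_with filter. It needs no Riak node. The module and function names are hypothetical, and it assumes the reduce phase receives {Bucket, Key} pairs, as the fun above expects:

```erlang
-module(suffix_filter_sketch).
-export([ends_with/2, filter_inputs/2]).

%% Hypothetical helper: true when the binary Key ends with the binary Suffix.
ends_with(Key, Suffix) when is_binary(Key), is_binary(Suffix) ->
    KeySize = byte_size(Key),
    SuffixSize = byte_size(Suffix),
    %% andalso short-circuits, so binary:part/3 is never called with a
    %% negative position when the key is shorter than the suffix.
    KeySize >= SuffixSize andalso
        binary:part(Key, KeySize - SuffixSize, SuffixSize) =:= Suffix.

%% Keep only the {Bucket, Key} pairs whose key ends with Suffix.
%% Assumption: inputs arrive as {Bucket, Key} tuples.
filter_inputs(Inputs, Suffix) ->
    [{Bucket, Key} || {Bucket, Key} <- Inputs, ends_with(Key, Suffix)].
```

With keys shaped like "<id>:<origin>:<type>", calling suffix_filter_sketch:filter_inputs(Inputs, <<"web:error">>) would keep only the entries for that origin/type pair.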

Questions 2 and 3 still remain, though:
>>   2) Would this be an efficient solution, considering the 2i query
>>      will return 10k+ results? The reduce should cut that in half.
>>   3) What other options do I have for querying this? Obviously I can
>>      use Riak Search, but the term-based indexing puts me off. The other
>>      option is building this manually with empty objects used only for linking.


2012/8/28 Jeremiah Peschka <jeremiah.peschka at gmail.com>:
> As best as I can recall, you can't key filter on 2i. You can, however,
> perform range filtering. You could query where the 2i key is between
> 20110101T00:00:00Z|a|a and 20110201T00:00:00Z|zzz|zzz
>
> Please forgive any typos. I'm using a phone.
>
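
A minimal sketch of the range idea above, in plain Erlang with no Riak node involved. It assumes a composite binary index value of the form <timestamp>|<origin>|<type> and that the index compares values byte by byte, which is how Erlang compares binaries. The module and function names are hypothetical:

```erlang
-module(composite_range_sketch).
-export([composite/3, in_range/3]).

%% Hypothetical helper: join the three parts into one composite index value.
composite(Timestamp, Origin, Type) ->
    <<Timestamp/binary, "|", Origin/binary, "|", Type/binary>>.

%% True when Value sorts between Start and End (inclusive), using Erlang's
%% byte-by-byte binary ordering as a stand-in for the index ordering.
in_range(Value, Start, End) ->
    Start =< Value andalso Value =< End.
```

Because the timestamp is the leading component, any value whose timestamp lies strictly between the two bounds falls in range whatever its origin and type are, so those two fields would likely still have to be filtered afterwards.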
>
> On Tuesday, August 28, 2012, Olav Frengstad wrote:
>>
>> Hey,
>>
>> I'm looking to use Riak to store time series, so naturally I'm in the
>> process of validating all possible methods for this query. An object
>> has an id, origin, timestamp and type. The query in question is to
>> select all objects within a time range that originated from "origin"
>> and have a certain "type".
>>
>> The current plan is to store the timestamp as a secondary index and
>> then have composite keys responsible for matching the origin/type predicate.
>>
>> A key would look like this: "<id>:<origin>:<type>".
>>
>> To query, one would just pipe the 2i query into a key-filter MapReduce:
>>
>> curl -X POST -H "Content-Type: application/json" -d '{"inputs":{
>> "bucket" : "seriesx", "index" : "timestamp_int", "start" : 123,
>> "end" : 456, "key_filters" : [["ends_with", "<origin>:<type>"]]}}'
>>
>> Regarding this "imaginary" solution, I have a few questions:
>>   1) Is this possible, or do key filters only work on a bucket?
>>   2) Would this be an efficient solution, considering the 2i query
>>      will return 10k+ results? The reduce should cut that in half.
>>   3) What other options do I have for querying this? Obviously I can
>>      use Riak Search, but the term-based indexing puts me off. The other
>>      option is building this manually with empty objects used only for linking.
>>
>> --
>> Med Vennlig Hilsen
>> Olav Frengstad
>>
>> Systemutvikler // FWT
>> +47 920 42 090
>
>
>
> --
> ---
> Jeremiah Peschka
> Managing Director, Brent Ozar PLF, LLC



-- 
Med Vennlig Hilsen
Olav Frengstad

Systemutvikler // FWT
+47 920 42 090



