filter by secondary index, sort and limit - best practice?

Alexey Loshkarev elf2001 at gmail.com
Thu May 10 12:02:20 EDT 2012


I have a bucket with nginx access logs (parsed to JSON).
Each document has a secondary index (date).

I want to retrieve the last record for a given hour (Python):

riak.mapreduce.RiakMapReduce(client) \
    .index('logs.api', 'date_bin', '2012051017', '2012051018') \
    .map("""function(doc) { return [[doc.key, JSON.parse(doc.values[0].data)]]; }""") \
    .reduce_limit(100) \
    .run()

It takes about 8 seconds to complete.

Then I want to sort the results before returning them (as far as I can
see, the results of map() are not sorted).

riak.mapreduce.RiakMapReduce(client) \
    .index('logs.api', 'date_bin', '2012051017', '2012051018') \
    .map("""function(doc) { return [[doc.key, JSON.parse(doc.values[0].data)]]; }""") \
    .reduce('Riak.reduceSort') \
    .reduce_limit(100) \
    .run()

This query hangs forever (I got tired of waiting for it to finish).

But this index filter leaves only about 5000 records. Why is it so slow?
What did I miss?
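
For comparison, with only about 5000 records the sort and limit could also be done on the client after the map phase. Below is a minimal sketch of that idea, not a Riak API: it assumes the map phase returns [key, record] pairs and that each record has a sortable "date" field. The helper name latest_records, the field name, and the sample rows are all hypothetical.

```python
import heapq


def latest_records(pairs, limit=100, field="date"):
    """Return up to `limit` [key, record] pairs, newest first.

    `pairs` is assumed to look like the map() output: [[key, record], ...].
    `field` is a hypothetical sortable timestamp field inside each record.
    """
    # nlargest keeps only the top `limit` items instead of sorting everything.
    return heapq.nlargest(limit, pairs, key=lambda pair: pair[1][field])


# Hypothetical sample standing in for the real map() results.
sample = [
    ["k1", {"date": "20120510170301", "status": 200}],
    ["k2", {"date": "20120510175959", "status": 404}],
    ["k3", {"date": "20120510171530", "status": 200}],
]

newest_two = latest_records(sample, limit=2)
print([key for key, _ in newest_two])
```

This would move the sorting cost out of the reduce phase, at the price of transferring every matching record to the client first.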

-- 
----------------
Best regards
Alexey Loshkarev
mailto:elf2001 at gmail.com



