Secondary Index Map and reduce order and performance

Sajithkumar Kizhakkiniyil Sajithkumar.Kizhakkiniyil at apollogrp.edu
Wed Nov 30 16:28:37 EST 2011


Hello
My understanding of M/R may be wrong, but I am seeing a drastic performance difference when running a secondary index query over PB with the map and reduce phases in different orders.
If my understanding is correct, a reduce phase with riak_kv_mapreduce.reduce_identity is needed for a secondary index query. I added one map phase to get the value instead of the key.

But if I send the reduce before the map, as you can see in the map reduce payload JSON below, the values are returned much faster than the other way around. In my test it was 251 ms vs 700 ms. Can anyone explain this behavior?

Reduce before map (Faster)
-------
{"inputs":{"index":"PERFTEST_INDEX_NAME_bin","bucket":"_ITEST_SI_BUCKET","key":"PERFTEST_INDEX_VALUE"},"query":[{"reduce":{"arg":"{reduce_phase_only_1, true}","module":"riak_kv_mapreduce","language":"erlang","keep":false,"function":"reduce_identity"}},{"map":{"source":"function(value,keyData,arg){ return [value.values[0].data]; }","language":"javascript","keep":true}}]}

Map before reduce (Slower)
--------------
{"inputs":{"index":"PERFTEST_INDEX_NAME_bin","bucket":"_ITEST_SI_BUCKET","key":"PERFTEST_INDEX_VALUE"},"query":[{"map":{"source":"function(value,keyData,arg){ return [value.values[0].data]; }","language":"javascript","keep":true}},{"reduce":{"arg":"{reduce_phase_only_1, true}","module":"riak_kv_mapreduce","language":"erlang","keep":false,"function":"reduce_identity"}}]}




