Riak 1.4 - fastest way to count all records in a bucket (100+ million)

Christian Rosnes christian.rosnes at gmail.com
Wed Jul 31 03:54:19 EDT 2013


Hi,

I have a 4-node Riak 1.4 test cluster on Azure
(Large instances: 4 cores, 7 GB RAM each).

Is the script below the fastest way to do a full count
of all the records in a bucket in Riak 1.4, or is there
another approach I could try that would be faster?

Also, are there any parameters in the configuration files
that can influence the speed of this kind of "large"
Erlang MapReduce job?


Thanks.

Christian
@NorSoulx

--

riak01$ ./count.all.records.in.bucket.sh

Counting all records in bucket: entries (Wed Jul 31 05:32:36 UTC 2013)

[109 542 663]

real    116m7.132s
user    0m0.000s
sys     0m0.376s

Done: Wed Jul 31 07:28:43 UTC 2013
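For a sense of scale, the timing above can be turned into a throughput figure. This is a minimal bash sketch that only redoes arithmetic on the numbers reported in the output (109,542,663 keys, 116m7.132s real time). The per-node figure assumes the work is spread evenly over the 4 nodes, which is an assumption and was not measured.

```bash
#!/bin/bash
# Rough throughput of the count job, from the figures reported above:
# 109,542,663 keys counted in a wall-clock ("real") time of 116m7.132s.
throughput() {
  local keys=109542663
  local minutes=116 seconds=7.132
  local nodes=4
  # awk does the floating-point arithmetic that bash itself lacks.
  # The per-node figure assumes an even split over the 4 nodes
  # (an assumption, not something measured here).
  awk -v k="$keys" -v m="$minutes" -v s="$seconds" -v n="$nodes" 'BEGIN {
    elapsed = m * 60 + s
    rate = k / elapsed
    printf "elapsed: %.3f s\n", elapsed
    printf "cluster: %.0f keys/s\n", rate
    printf "per node: %.0f keys/s\n", rate / n
  }'
}

throughput
```

Running it prints an elapsed time of 6967.132 s, roughly 15723 keys/s for the whole cluster, and roughly 3931 keys/s per node under the even-split assumption.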

Script: count.all.records.in.bucket.sh
--------------------------------------
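#!/bin/bash
# Count every key in the "entries" bucket with one Erlang MapReduce job.
# Inputs: the built-in $bucket secondary index, which yields every key
# in the bucket. Query: a single reduce phase, reduce_count_inputs,
# which counts those inputs. The timeout is in milliseconds
# (90000000 ms = 25 hours).
echo "Counting all records in bucket: entries ($(date))"
time curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{"inputs":{
           "bucket":"entries",
           "index":"$bucket",
           "key":"entries"
       },
       "query":[{"reduce":{"language":"erlang",
                           "module":"riak_kv_mapreduce",
                           "function":"reduce_count_inputs",
                           "arg":{"do_prereduce":true}
                          }
               }],
       "timeout": 90000000}'
echo "Done: $(date)"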