Riak 1.4 - fastest way to count all records in bucket (100+ millions)

Christian Rosnes christian.rosnes at gmail.com
Wed Jul 31 03:54:19 EDT 2013


I have a 4-node Riak 1.4 test cluster on Azure
(Large instances: 4 cores, 7 GB RAM each).

I'm wondering whether the script below is the fastest way
to do a full count of all the records in a bucket in Riak 1.4.
Or is there another approach I could try that might be faster?

Also, are there any parameters in the configuration files
that can influence the speed of this type of "large"
Erlang MapReduce job?




riak01$ ./count.all.records.in.bucket.sh

Counting all records in bucket: entries (Wed Jul 31 05:32:36 UTC 2013)

[109 542 663]

real    116m7.132s
user    0m0.000s
sys     0m0.376s

Done: Wed Jul 31 07:28:43 UTC 2013

Script: count.all.records.in.bucket.sh
time curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{"inputs":{
       "timeout": 90000000}'
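
The request body above appears to have been cut off in the archive: the
"inputs" value and the query phases are missing. For readers who want a
complete request to experiment with, here is a minimal sketch, not the
original script. It assumes the bucket name "entries" (taken from the
output above) and a single reduce phase using the built-in Erlang function
riak_kv_mapreduce:reduce_count_inputs. The variable names and the
count_bucket helper are hypothetical.

```shell
#!/bin/sh
# Sketch only: the query phases below are an assumption, not the
# content of the original count.all.records.in.bucket.sh.
BUCKET="entries"        # bucket name as shown in the script output above
TIMEOUT_MS=90000000     # same timeout value as in the original request

# Build the MapReduce job: the whole bucket as input, no map phase,
# and one reduce phase that counts the inputs it receives.
PAYLOAD=$(cat <<EOF
{"inputs": "$BUCKET",
 "query": [{"reduce": {"language": "erlang",
                       "module": "riak_kv_mapreduce",
                       "function": "reduce_count_inputs"}}],
 "timeout": $TIMEOUT_MS}
EOF
)

# Hypothetical helper: posts the job to the local node's /mapred endpoint.
count_bucket() {
  curl -s -XPOST http://localhost:8098/mapred \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
}

# Print the payload for inspection; run "time count_bucket" against a live node.
echo "$PAYLOAD"
```

Using the bucket name as "inputs" still lists every key in the bucket, so
a job like this should be expected to scale with the number of keys.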
