Riak 2i http query much faster than python api?

Jeff Peck jeffp at tnrglobal.com
Wed Apr 10 16:03:45 EDT 2013


Thanks Evan. I tried doing it in Python like this (realizing that my previous approach uses MapReduce) and got better results. It finished in 3.5 minutes, which is still nowhere near the 15 seconds of the plain HTTP query:

import riak

bucket_name = "mybucket"

# Connect over Protocol Buffers (port 8087) instead of HTTP.
client = riak.RiakClient(port=8087, transport_class=riak.RiakPbcTransport)
bucket = client.bucket(bucket_name)

# Query the secondary index on the bucket, rather than via client.index(...).run().
results = bucket.get_index('status_bin', 'PERSISTED')

print(len(results))
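
For comparison, the HTTP query can also be timed from Python without the client library. This is only a minimal sketch, assuming Python 3 and assuming the 2i endpoint returns JSON of the form {"keys": [...]}. The helper names (index_url, count_keys, timed_index_count) are made up for illustration and are not part of the riak client. Note that if the response has that shape and no key contains a comma, counting commas with grep gives one less than the number of keys, whereas parsing the JSON counts the keys themselves.

```python
import json
import time
from urllib.parse import quote
from urllib.request import urlopen


def index_url(bucket, index, value, host="localhost", port=8098):
    """Build the exact-match 2i URL, same shape as in the curl example."""
    parts = [quote(p, safe="") for p in (bucket, index, value)]
    return "http://%s:%d/buckets/%s/index/%s/%s" % (host, port, *parts)


def count_keys(body):
    """Count keys in a response body assumed to look like {"keys": [...]}."""
    return len(json.loads(body)["keys"])


def timed_index_count(bucket, index, value, timeout=300):
    """Fetch the index over HTTP and return (key count, elapsed seconds)."""
    start = time.time()
    with urlopen(index_url(bucket, index, value), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return count_keys(body), time.time() - start


# Example call (needs a running Riak node on localhost:8098):
#   count, elapsed = timed_index_count("mybucket", "status_bin", "PERSISTED")
#   print("%d keys in %.1f seconds" % (count, elapsed))
```

Timing both paths from the same script would at least show whether the gap comes from the query itself or from how the client handles the response.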


On Apr 10, 2013, at 4:00 PM, Evan Vigil-McClanahan <emcclanahan at basho.com> wrote:

> get_index() is the right function there, I think.
> 
> On Wed, Apr 10, 2013 at 2:53 PM, Jeff Peck <jeffp at tnrglobal.com> wrote:
>> I can grab over 900,000 keys from an index using an HTTP query in about 15 seconds, whereas the same operation in Python times out after 5 minutes. Does this indicate that I am using the Python API incorrectly? Should I be relying on an HTTP request initially when I need to grab this many keys?
>> 
>> (Note: This is tied to the question that I asked earlier, but it is also a general question to help understand the proper usage of the Python API.)
>> 
>> Thanks! Examples are below.
>> 
>> - Jeff
>> 
>> ---
>> 
>> HTTP:
>> 
>> $ time curl -s http://localhost:8098/buckets/mybucket/index/status_bin/PERSISTED | grep -o , | wc -l
>> 926047
>> 
>> real    0m14.583s
>> user    0m2.500s
>> sys     0m0.270s
>> 
>> ---
>> 
>> Python:
>> 
>> import riak
>> 
>> bucket = "mybucket"
>> client = riak.RiakClient(port=8098)
>> results = client.index(bucket, 'status_bin', 'PERSISTED').run(timeout=5*60*1000) # 5 minute timeout
>> print len(results)
>> 
>> (times out after 5 minutes)
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




