Python Driver Write Times

Derek Sanderson zapphutz at gmail.com
Sun Nov 28 11:48:02 EST 2010


Hello,

I've recently started to explore using Riak (v0.13.0-2) from Python (v2.6.5)
as a datastore, and I've run into a performance issue whose true origin I'm
unsure of. I would like some input from users who have been working with
Riak and its Python drivers.

I have 2 tests set up, one for Riak and another for MongoDB, both using
their respectively provided Python drivers. I'm constructing chunks of JSON
data consisting of a Person, who has an Address, and a purchase history
which contains 1 to 20 line items with some data about the item name, cost,
quantity purchased, etc. A very simple mockup of a purchase history. Each
test does this for 1 million "people" (my initial goal was to see how
lookups fared when you reach 1m+ records).

When using MongoDB, the speed of inserts is incredibly fast. When using
Riak, however, there is a very noticeable lag after each insert. So much so
that when running side by side, the MongoDB test breaks into the 10,000s
before Riak hits its first 1k.

My main PC is a Windows 7 i7 quad core with 8 GB of RAM, on which I'm
running 64-bit Ubuntu 10.04 in a VM that has 2 GB of memory allotted. On
this VM, I have Riak and MongoDB running concurrently.

Here is a sample of how I'm using the Riak driver:

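    from riak import RiakClient  # Riak Python client

    riak_conn = RiakClient()
    bucket = riak_conn.bucket("peopledb")
    # MakePerson() builds the person data described above.
    for i in range(1, 1000001):  # keys p1 .. p1000000 (1 million people)
        try:
            new_obj = bucket.new("p" + str(i), MakePerson())
            new_obj.store(return_body=False)
        except Exception as e:
            print e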

I'm wondering if I'm doing something blatantly wrong. I didn't see any kind
of batch-store method on the bucket (something that would persist the
entire bucket at once instead of calling store on each object), and I wasn't
sure whether this is an issue with my particular setup (maybe the specifics
of my VM are somehow throttling its performance) or just a known limitation
that I wasn't aware of.

To shed some light on the disparity, I refactored my persistence into
separate methods, and used a wrapper to pull out the execution times. Here
is a very condensed list of run times. The method in question, for both
datastores, simply creates a new "Person" and stores it. Nothing else.
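
A minimal sketch of a timing wrapper of this kind, for anyone who wants to
reproduce the measurement. The names `timed` and `MakeExamplePerson` are
hypothetical, and the decorated function is a stand-in that touches no
datastore; the actual test code is not shown in this message.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returned or raised, so failures are timed too.
            elapsed_ms = (time.time() - start) * 1000.0
            print("%s took %.3f ms" % (func.__name__, elapsed_ms))
    return wrapper


@timed
def MakeExamplePerson():
    # Hypothetical stand-in: build the record only, no store call.
    return {"name": "Example Person"}


MakeExamplePerson()
```

Each decorated call prints one line in the same format as the run times
listed here.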

MakeRiakPerson took 40.139 ms
MakeRiakPerson took 40.472 ms
MakeRiakPerson took 40.651 ms
MakeRiakPerson took 51.630 ms
MakeRiakPerson took 36.733 ms

MakeMongoPerson took 1.810 ms
MakeMongoPerson took 3.619 ms
MakeMongoPerson took 1.036 ms
MakeMongoPerson took 1.275 ms
MakeMongoPerson took 3.656 ms
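
To put these numbers in perspective, here is a rough extrapolation to the
full 1 million records. It assumes the five samples per datastore are
representative of the whole run, which such a small sample may well not be;
treat the totals as an order-of-magnitude estimate only.

```python
# Per-call times in milliseconds, copied from the lists above.
riak_ms = [40.139, 40.472, 40.651, 51.630, 36.733]
mongo_ms = [1.810, 3.619, 1.036, 1.275, 3.656]


def mean(values):
    return sum(values) / float(len(values))


total_records = 1000000

riak_mean = mean(riak_ms)
mongo_mean = mean(mongo_ms)

# ms per insert * number of inserts -> total ms -> seconds -> hours/minutes
riak_hours = riak_mean * total_records / 1000.0 / 3600.0
mongo_minutes = mongo_mean * total_records / 1000.0 / 60.0

print("Riak:    %.3f ms/insert, about %.1f hours for 1M" % (riak_mean, riak_hours))
print("MongoDB: %.3f ms/insert, about %.1f minutes for 1M" % (mongo_mean, mongo_minutes))
print("Ratio:   %.1fx" % (riak_mean / mongo_mean))
```

That works out to roughly 42 ms versus 2.3 ms per insert, or on the order of
11.6 hours versus 38 minutes for the full data set if these rates held.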

Thank you in advance for any help that can be offered here. I'm incredibly
new to Riak as a whole, as well as very inexperienced when it comes to
working in a *nix environment, so I imagine there are countless ways I could
have shot myself in the foot without realizing it.