Python Driver Write Times

Greg Steffensen greg.steffensen at gmail.com
Sun Nov 28 12:10:41 EST 2010


The difference in write times is due to two factors:

1) Durability.  MongoDB stores writes in RAM and flushes them to disk
periodically (by default, every 60 seconds, according to this page:
http://www.mongodb.org/display/DOCS/Durability+and+Repair).  This means that
its writes can seem very, very fast, but if the machine goes down, you could
lose up to 60 seconds of data.  Riak writes don't return until the data has
actually been persisted to disk.  Cassandra takes the same approach as
MongoDB, with the same trade-off.

2) Parallelism.  This test isn't taking advantage of Riak's distributed
nature.  Riak really shines when it's run on a cluster of machines: you can
raise your write throughput almost arbitrarily, as long as you're
willing to add enough machines to the cluster.
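
To make the trade-off in point 1 concrete, here is a minimal sketch that
uses plain files rather than either database. It is not how MongoDB or Riak
store data internally; it only shows the cost of forcing each write to disk
compared with leaving the flush to the OS. The names timed_writes and
compare are made up for this illustration.

```python
import os
import tempfile
import time


def timed_writes(path, records, sync_each):
    """Write records to path and return the elapsed time in seconds.

    sync_each=True forces every record to disk before continuing
    (the "don't return until persisted" approach).
    sync_each=False leaves flushing to Python and the OS
    (the "flush periodically" approach).
    """
    start = time.time()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if sync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to disk
    return time.time() - start


def compare(n=200):
    """Time n buffered writes against n synced writes to temp files."""
    records = [("person-%d\n" % i).encode("ascii") for i in range(n)]
    tmpdir = tempfile.mkdtemp()
    results = {}
    for label, sync_each in (("buffered", False), ("synced", True)):
        path = os.path.join(tmpdir, label + ".log")
        results[label] = timed_writes(path, records, sync_each)
        results[label + "_bytes"] = os.path.getsize(path)
    return results


if __name__ == "__main__":
    r = compare()
    print("buffered: %.4f s, synced: %.4f s" % (r["buffered"], r["synced"]))
```

Both files end up with the same contents; the only difference is when the
data is guaranteed to be on disk. How large the timing gap is depends on
the disk and filesystem, and a VM may handle fsync differently from bare
metal.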

I doubt that you'll be able to get single-node Riak to write as fast as
Mongo, but I'd guess that the numbers will get a little closer if you do
several writes simultaneously in both by multi-threading with Python's
threading module.  Also, be sure that you're using Riak's protocol buffers
interface instead of the REST (HTTP) one, which adds a lot of overhead. I
believe the Python client supports both.
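
A rough sketch of the multi-threaded approach, using only the standard
library. At about 40 ms per write, a single sequential loop tops out near
25 writes per second, so a million writes would take around 11 hours;
overlapping the waits is where threads can help. The function store_all is
a hypothetical helper, and fake_store is a stand-in that just sleeps; it
makes no real Riak or MongoDB call.

```python
import threading
import time

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def store_all(keys, store, num_threads=8):
    """Call store(key) for every key, using num_threads worker threads.

    `store` is whatever performs one write. Returns a list of
    (key, exception) pairs for the writes that failed.
    """
    work = queue.Queue()
    for key in keys:
        work.put(key)
    failures = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                key = work.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                store(key)
            except Exception as e:
                with lock:
                    failures.append((key, e))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


def fake_store(key):
    # Stand-in for one network round trip; not a real database call.
    time.sleep(0.005)


if __name__ == "__main__":
    keys = ["p%d" % i for i in range(1, 201)]
    for n in (1, 8):
        start = time.time()
        store_all(keys, fake_store, num_threads=n)
        print("%d thread(s): %.2f s" % (n, time.time() - start))
```

In the real test, store would wrap the bucket.new(...).store() call from the
original loop. Whether a single client object can safely be shared between
threads is an assumption worth checking; creating one client per worker is
the cautious choice.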

Greg



On Sun, Nov 28, 2010 at 11:48 AM, Derek Sanderson <zapphutz at gmail.com> wrote:

> Hello,
>
> I've recently started to explore using Riak (v0.13.0-2) from Python
> (v2.6.5) as a datastore, and I've run into a performance issue that I'm
> unsure of the true origin of, and would like some input from users who have
> been working with Riak and its Python drivers.
>
> I have 2 tests set up, one for Riak and another for MongoDB, both using
> their respectively provided Python drivers. I'm constructing chunks of JSON
> data consisting of a Person, who has an Address, and a purchase history
> which contains 1 to 20 line items with some data about the item name, cost,
> # purchased, etc. A very simple mockup of a purchase history. The test does
> this for 1 million "people" (my initial goal was to see how lookups fared
> when you reach 1m+ records).
>
> When using MongoDB, the speed of inserts is incredibly fast. When using
> Riak, however, there is a very noticeable lag after each insert. So much so
> that when running side by side, the MongoDB test breaks into the 10,000s
> before Riak hits its first 1k.
>
> My main PC is a Windows 7 i7 quad core, with 8 GB of RAM, on which I'm
> running Ubuntu64 v10.04 on a VM, which has 2GB of memory allotted. On this
> VM, I have Riak and MongoDB running concurrently.
>
> Here is a sample of how I'm using the Riak driver:
>
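>     # Assumes RiakClient is imported from the riak package and that
>     # MakePerson() (defined elsewhere) returns the JSON-serializable data.
>     riak_conn = RiakClient()
>     bucket = riak_conn.bucket("peopledb")
>     for i in range(1, 1000000):
>         try:
>             new_obj = bucket.new("p" + str(i), MakePerson())
>             new_obj.store(return_body=False)
>         except Exception as e:
>             print e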
>
> I'm wondering if there is something blatantly wrong I'm doing. I didn't see
> any kind of batch-store method on the bucket (instead of calling store on
> each object, simply persist the entirety of the bucket itself), and I wasn't
> sure if this was an issue with my particular setup (maybe the specifics of
> my VM are somehow throttling its performance), or maybe just a known
> limitation that I wasn't aware of.
>
> To shed some light on the disparity, I refactored my persistence into
> separate methods, and used a wrapper to pull out the execution times. Here
> is a very condensed list of run times. The method in question, for both
> datastores, simply creates a new "Person" and stores it. Nothing else.
>
> MakeRiakPerson took 40.139 ms
> MakeRiakPerson took 40.472 ms
> MakeRiakPerson took 40.651 ms
> MakeRiakPerson took 51.630 ms
> MakeRiakPerson took 36.733 ms
>
> MakeMongoPerson took 1.810 ms
> MakeMongoPerson took 3.619 ms
> MakeMongoPerson took 1.036 ms
> MakeMongoPerson took 1.275 ms
> MakeMongoPerson took 3.656 ms
>
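
For anyone reproducing these numbers: a timing wrapper of the kind described
above could look like the sketch below. The names timed and mean are made up
for illustration (the original wrapper was not posted), and the sample
figures are the ten run times quoted above.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.time() - start) * 1000.0
            print("%s took %.3f ms" % (func.__name__, elapsed_ms))
    return wrapper


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


# Run times copied from the message above, in milliseconds.
riak_ms = [40.139, 40.472, 40.651, 51.630, 36.733]
mongo_ms = [1.810, 3.619, 1.036, 1.275, 3.656]

if __name__ == "__main__":
    print("Riak mean:  %.3f ms" % mean(riak_ms))
    print("Mongo mean: %.3f ms" % mean(mongo_ms))
    print("ratio:      %.1fx" % (mean(riak_ms) / mean(mongo_ms)))
```

Five samples each is a very small set, so the averages only give a rough
sense of the gap.
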
> Thank you in advance for any help that can be offered here. I'm incredibly
> new to Riak as a whole, as well as very inexperienced when it comes to
> working in a *nix environment, so I imagine there are countless ways I could
> have shot myself in the foot without realizing it.
>