Using riak as a `Comment` Store - Slow results

Alex Thompson godfoder at acis.ufl.edu
Tue Oct 30 14:03:30 EDT 2012


Herman,

First: Note that Node.js, while asynchronous, is single-threaded. Make sure you're not overwhelming your server (100% CPU, swapping memory) or running into the internal HTTP connection pool limit (Node's default is 5 simultaneous connections per host; I bump mine to 200). If you are, look at cluster2 for running multiple processes on a single server, and at proxies like haproxy for multi-server clustering.
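
A minimal sketch of the connection pool change, assuming the Riak client sends its requests through Node's built-in `http` module (whether node_riak uses the global agent is an assumption; check how it creates its requests). The name `riakAgent` is made up for this example:

```javascript
const http = require('http');

// Raise the per-host socket cap on the shared agent.
// Older Node versions default to 5; newer ones default to Infinity.
http.globalAgent.maxSockets = 200;

// Alternatively, create a dedicated agent and pass it in the request
// options ({ agent: riakAgent }) of whatever issues the HTTP calls.
const riakAgent = new http.Agent({ maxSockets: 200 });
```

If the client library builds its own agent, the cap has to be set on that agent instead.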

Second: It seems to me you're begging for concurrency issues doing it this way. It would be much better to have, say, a bucket per parent URL (I don't think Riak has any bucket count limits, right?). The on-disk storage uses bucket+key as the storage key anyway, so this gives you one level of key filtering (which Riak isn't amazing at) for free.
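
As a rough sketch of the bucket-per-parent-URL layout, assuming the same `/riak/<bucket>/<key>` HTTP path style used in the original message. The function name and `commentId` are hypothetical; any unique id per comment would do:

```javascript
// Build the HTTP path for one comment stored under its own key,
// in a bucket named after the (URL-encoded) parent URL.
function commentPath(parentUrl, commentId) {
  const bucket = encodeURIComponent(parentUrl);
  const key = encodeURIComponent(commentId);
  return `/riak/${bucket}/${key}`;
}
```

Each new comment then becomes a small independent PUT, so two commenters never rewrite the same value. Reading all comments for a page needs a key listing or an index, which has its own cost.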

Other options: secondary indexes (2i) on the parent URL (pretty doable), or riak-search (probably overkill).
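
A minimal sketch of the 2i variant over Riak's HTTP interface. The index name `parent_url_bin` and both function names are assumptions for illustration (the `_bin` suffix marks a string index), and as far as I know 2i needs a backend that supports it, such as LevelDB:

```javascript
// Header to send with the PUT of each comment so that Riak
// indexes the object by its parent URL.
function indexHeaders(parentUrl) {
  return { 'x-riak-index-parent_url_bin': parentUrl };
}

// Path for an exact-match index query; the response lists matching keys.
function indexQueryPath(bucket, parentUrl) {
  return `/buckets/${encodeURIComponent(bucket)}` +
    `/index/parent_url_bin/${encodeURIComponent(parentUrl)}`;
}
```

The query only returns keys, so the comments themselves still have to be fetched afterwards.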

Also, make sure you're actually distributing your requests across multiple Riak nodes; the client may or may not handle this for you. I run my Riak nodes behind haproxy to distribute the requests (although I'm using riakjs, so client behavior will differ).
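
If the client does not spread requests itself and a proxy is not an option, a simple round-robin picker is one way to do it in the application. This is only a sketch; the host names are placeholders and `makeRoundRobin` is a made-up helper (8098 is Riak's default HTTP port):

```javascript
// Hypothetical node list; replace with the addresses of your own cluster.
const nodes = ['riak1:8098', 'riak2:8098', 'riak3:8098', 'riak4:8098'];

// Returns a function that hands out hosts in rotation.
function makeRoundRobin(hosts) {
  let next = 0;
  return function pick() {
    const host = hosts[next];
    next = (next + 1) % hosts.length;
    return host;
  };
}

const pickNode = makeRoundRobin(nodes);
```

Calling `pickNode()` before each request yields riak1, riak2, riak3, riak4 and then wraps around. Unlike haproxy, this does no health checking.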

If you want more information on the Node.js specifics, I can probably give you some pointers from my codebase. Other users are probably much more knowledgeable about effectively implementing 2i and riak-search (I've only played with them).

- Alex

----- Original Message -----
From: "Herman Junge" <hermanalonsojunge at gmail.com>
To: riak-users at lists.basho.com
Sent: Tuesday, October 30, 2012 11:27:51 AM
Subject: Using riak as a `Comment` Store - Slow results



Hi list. 

I am doing some research on using Riak as a solution to store comments. Unfortunately, my results were far from favorable. I will describe the architecture I used, the schema chosen, the steps taken and the results, hoping to get feedback from Basho or any experienced user on what to do to improve these times, or whether to discard Riak as a store for comments.

1. The problem 

Store comments. Given a _parent_url_ (which could be a blog post, an image, anything with a URL), group its comments.


2. Architecture 


2.1. Riak Database 

I used the Joyent cloud and set up 4 SmartOS machines with 1024 MB RAM each. They have Riak preinstalled.


2.2. Client Application

The application is built in node.js and uses the ExpressJS framework ( https://github.com/visionmedia/express ) to respond to HTTP requests (specifically PUT and GET). The Riak library is node_riak ( https://github.com/mranney/node_riak ), which has been `tested in combat` by its creators at Voxer.


The client application runs on another machine in the Joyent cloud, an Ubuntu 12.04 with 1024 MB RAM.


3. Schemas Chosen

I went with a very simple schema: since the comments are grouped by _parent_url_, I'm using the parent_url as the key, its value being an array of the comments in JSON.

An example of a key is: <server_url>/riak/parent_url/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2Ffile.html

An example of a value is:

{ "comments" :
  [
    { "date" : "2012-10-30T14:50:11.898Z"
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    , "author" : "John Doe"
    }
  , { "date" : "2012-10-30T14:50:11.898Z"
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    , "author" : "John Doe"
    }
  , { "date" : "2012-10-30T14:50:11.898Z"
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    , "author" : "John Doe"
    }
  ]
}
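
The same schema can be sketched in code. This is only an illustration: the helper names are made up, and the server URL is whatever node answers HTTP requests:

```javascript
// Full URL of the object holding all comments for one parent URL.
// The bucket is called parent_url; the key is the URL-encoded parent URL.
function keyUrl(serverUrl, parentUrl) {
  return `${serverUrl}/riak/parent_url/${encodeURIComponent(parentUrl)}`;
}

// One entry of the "comments" array, with the date as an ISO 8601 string.
function makeComment(text, author, date) {
  return { date: date.toISOString(), text: text, author: author };
}

// Value stored for a parent URL that has no comments yet.
function emptyValue() {
  return { comments: [] };
}
```

Note that `encodeURIComponent` is what turns `http://path/to/my/site/file.html` into the encoded form shown in the key example.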

4. Steps Taken 

4.1. Client API : 

My client API accepts two requests:

* PUT /comment 
* GET /comments/:parent_url?offset=<offset>&limit=<limit> 

4.1.1 PUT /comment 

Stores a comment under the parent_url given inside the request. I use node_riak's `client.modify()` method, which `GET`s the parent_url key to fetch its current value, applies the mutation (supplied by the library user; in this case it just pushes the JSON value of the comment onto the array), and then `PUT`s the new value back to the parent_url key.
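
To make the sequence concrete, here is a minimal sketch of that read-modify-write cycle against an in-memory stand-in. It is not node_riak's actual API; `read`, `write` and the `Map` are made up for illustration. It also shows what can happen when two requests interleave:

```javascript
const store = new Map(); // stand-in for the Riak bucket

// Like a GET over HTTP: every caller gets its own parsed copy.
function read(key) {
  return JSON.parse(store.get(key) || '{"comments":[]}');
}

// Like a PUT: replaces the whole stored value.
function write(key, value) {
  store.set(key, JSON.stringify(value));
}

const key = 'http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2Ffile.html';

// Two requests both read before either one writes back.
const seenByA = read(key);
const seenByB = read(key);
seenByA.comments.push({ author: 'A', text: 'first' });
write(key, seenByA);
seenByB.comments.push({ author: 'B', text: 'second' });
write(key, seenByB); // replaces A's write; A's comment is gone

const stored = read(key).comments;
```

In this simplified model only B's comment survives. In Riak itself, whether the second write replaces the first or produces siblings depends on bucket settings such as `allow_mult` and on sending back the vector clock from the GET.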

4.1.2 GET /comments/:parent_url?offset=<offset>&limit=<limit> 

Gets the comments for the given parent_url, selected by <offset> and <limit>.

Internally, I just issue a `GET` to Riak; the controller in my client application does the offset and limit extraction.
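
A minimal sketch of that extraction step, assuming `limit` means a maximum number of comments rather than an end position (that reading is an assumption, as is the function name):

```javascript
// Slice the stored comments array using the query-string parameters,
// which arrive as strings (or undefined when absent).
function paginate(comments, offset, limit) {
  const start = Math.max(0, Number.parseInt(offset, 10) || 0);
  const parsed = Number.parseInt(limit, 10);
  // Without a usable limit, return everything from the offset on.
  const count = Number.isNaN(parsed) ? comments.length : Math.max(0, parsed);
  return comments.slice(start, start + count);
}
```

Note that the whole array still travels from Riak to the client application on every request; only the HTTP response to the caller gets smaller.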

4.2. The Stress Test

I provisioned a new Joyent machine (an Ubuntu 12.04 with 1024 MB RAM) just to run `ab` stress tests.

I ran six tests:

API method   Requests   Concurrency
PUT (*1)     10,000     5
PUT (*1)     10,000     50
PUT (*1)     10,000     500
GET (*2)     10,000     5
GET (*2)     10,000     50
GET (*2)     10,000     500

(*1) PUT /comment 
(*2) GET /comments/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2F1111.html?offset=25&limit=20


5. Results 

The following tables show the results I got on each test: 

PUT, 10,000 requests, concurrency 5

Percentile   Time (ms)
50%          116
65%          142
70%          161
85%          177
90%          274
95%          486
98%          751
99%          1165
100%         1065

PUT, 10,000 requests, concurrency 50

Percentile   Time (ms)
50%          1879
65%          1990
70%          2068
85%          2124
90%          2364
95%          2734
98%          4062
99%          4591
100%         11258

PUT, 10,000 requests, concurrency 500

Percentile   Time (ms)
50%          20876
65%          21491
70%          21919
85%          22202
90%          23136
95%          23914
98%          25036
99%          25835
100%         29611

GET, 10,000 requests (concurrency 5, going by the test order in 4.2)

Percentile   Time (ms)
50%          68
65%          75
70%          80
85%          83
90%          94
95%          107
98%          145
99%          475
100%         535

GET, 10,000 requests (concurrency 50, going by the test order in 4.2)

Percentile   Time (ms)
50%          631
65%          673
70%          701
85%          719
90%          783
95%          913
98%          1054
99%          1099
100%         1265

GET, 10,000 requests (concurrency 500, going by the test order in 4.2)

Percentile   Time (ms)
50%          6363
65%          6636
70%          6820
85%          6934
90%          7218
95%          7442
98%          7691
99%          7836
100%         8435
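
A rough way to read the PUT tables: with a fixed number of requests in flight, throughput is approximately concurrency divided by response time (Little's law). The sketch below uses the median from each PUT table as a stand-in for the mean response time, which is only an approximation; the function name is made up for illustration:

```javascript
// Estimated requests per second for a run with a fixed concurrency
// and a typical response time given in milliseconds.
function estimateThroughput(concurrency, medianMs) {
  return concurrency / (medianMs / 1000);
}

// [concurrency, median in ms] taken from the three PUT tables.
const putRuns = [[5, 116], [50, 1879], [500, 20876]];

const putThroughput = putRuns.map(
  ([concurrency, medianMs]) => Math.round(estimateThroughput(concurrency, medianMs))
);
```

This gives roughly 43, 27 and 24 requests per second for concurrency 5, 50 and 500. If the estimate holds, throughput does not grow with concurrency, which would mean the extra requests are mostly waiting in a queue somewhere between `ab` and Riak.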


6. Conclusion 

At first sight, I'm getting very unfavorable results (compared with an unconfigured, single-table MySQL under the very same requests). So I'm requesting feedback from you:

a) Is it a good idea to use Riak as a comment store?

b) Are these times expected? (In other words, where am I making a big mistake?)

Regards, 

Herman Junge 
@hermanjunge 

_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com