Bulk loading data and "Could not contact Riak Server" error

gtuhl riak at uhls.com
Mon Aug 1 17:13:53 EDT 2011


I am currently load testing Riak using riak_0.14.2-1_amd64.deb with
fs.file-max set to 503840 for all users.

I have a reasonably large set of data (hundreds of millions of documents,
many terabytes in size) that is currently stored in a combination of
PostgreSQL+Redis and Disco/DDFS.  The first handles key/value access and the
second handles map/reduce; together they satisfy the full set of user
requirements.

I am trying to consolidate these data sources, so I am trying out a variety
of data stores that could potentially satisfy both usage types.

With Riak, my main challenge is getting this data loaded.  Using the PHP
library I am able to push 100-200 documents/sec.  Is there a recommended
approach to bulk loading data?  At that pace it would take a couple of
months to load everything.  That is not necessarily a deal breaker, but I
wanted to sniff around for better options.
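
For a rough sense of scale, here is a back-of-the-envelope sketch of the
load time at the rates above.  The corpus sizes are assumptions picked to
bracket "hundreds of millions"; only the 100-200 documents/sec figures are
measured.

```python
SECONDS_PER_DAY = 86_400

def load_days(total_docs: int, docs_per_sec: float) -> float:
    """Days needed to load total_docs at a sustained docs_per_sec."""
    return total_docs / docs_per_sec / SECONDS_PER_DAY

# Assumed corpus sizes; the real count is only known as "hundreds of millions".
for total in (200_000_000, 500_000_000):
    for rate in (100, 200):
        days = load_days(total, rate)
        print(f"{total:>11,} docs at {rate} docs/sec: {days:5.1f} days")
```

At the upper assumed size and the lower rate this works out to roughly two
months for a single loader.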

Related to this, I did attempt to break up my records and load them with a
bunch of concurrently running loaders.  This actually seems to work fairly
well, with not much of a penalty in documents/sec on any single loader
process.  But once I reach 4-5 loaders running concurrently, I consistently
get the "Could not contact Riak Server" error and all of my loader processes
die simultaneously.  If I wait a few seconds, the Riak server does begin to
respond again.
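
Since the server comes back after a few seconds, one option is to have each
loader wait and retry rather than exit.  The following is a minimal sketch
of that idea, not the PHP library's behaviour: `store` is a hypothetical
callable that writes one document and raises ConnectionError when the
server cannot be reached, and the retry count and delays are made-up
defaults.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def store_with_retry(store, key, doc, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call store(key, doc); on ConnectionError back off and retry."""
    for attempt in range(retries):
        try:
            store(key, doc)
            return attempt  # how many retries this record needed
        except ConnectionError:
            if attempt == retries - 1:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ... by default

def load_all(store, records, workers=4, **retry_opts):
    """Load one batch of (key, doc) pairs with a bounded pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(store_with_retry, store, key, doc, **retry_opts)
            for key, doc in records
        ]
        # Total number of retries across the batch; re-raises any final failure.
        return sum(f.result() for f in futures)
```

Keeping `workers` at or below the point where the errors start (3-4 here)
and feeding the pool one batch at a time keeps the number of in-flight
requests bounded.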

Any ideas for approaching this differently?  Is attempting to run many
loaders concurrently a bad idea with Riak?

I am running a single server right now while I test, with the bucket's
n_val set to 1.
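
For reference, this is roughly how that bucket property can be set over
Riak's HTTP interface.  It is a sketch: the bucket name "documents" is
hypothetical, and the address assumes a local node on the default HTTP port
8098 with the /riak prefix.  The request is only built here, not sent.

```python
import json
import urllib.request

def n_val_request(bucket, n_val, base_url="http://127.0.0.1:8098/riak"):
    """Build a PUT request that sets the bucket's n_val property."""
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{bucket}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = n_val_request("documents", 1)  # hypothetical bucket name
# urllib.request.urlopen(req)  # uncomment to send to a running node
```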

--
View this message in context: http://riak-users.197444.n3.nabble.com/Bulk-loading-data-and-Could-not-contact-Riak-Server-error-tp3217091p3217091.html
Sent from the Riak Users mailing list archive at Nabble.com.



