Bulk loading data and "Could not contact Riak Server" error
riak at uhls.com
Mon Aug 1 17:13:53 EDT 2011
I am currently load testing Riak using riak_0.14.2-1_amd64.deb with
fs.file-max set to 503840 for all users.
I have a reasonably large set of data (hundreds of millions of documents,
many terabytes in size) that is currently stored in a combination of
PostgreSQL+Redis and Disco/DDFS. The first for key/value and the second for
map/reduce to satisfy the full set of user requirements.
I am trying to consolidate these data sources, so I am trying out a variety
of data stores that could potentially satisfy both usage types.
With Riak, my main challenge is getting this data loaded. Using the PHP
library I am able to push 100-200 documents/sec. Is there a recommended
approach to bulk loading data? At that pace it would take a couple months
to load everything. That is not necessarily a deal breaker, but I wanted to
sniff around for better options.
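For reference, the loading pattern I am using amounts to a small worker pool
fanning puts out to the server. Here is a rough sketch in Python (not the PHP
client I am actually using); `put_fn` stands in for whatever store call the
real client makes, and is injected so the loader logic itself can be exercised
without a running Riak node:

```python
import concurrent.futures

def load_documents(docs, put_fn, workers=4):
    """Push (key, value) pairs through put_fn using a small worker pool.

    put_fn is whatever actually writes to Riak -- e.g. a wrapper around a
    client library's store call. It is passed in as a parameter here so the
    pooling logic can be tested with a stub instead of a live server.
    """
    loaded = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(put_fn, key, value) for key, value in docs]
        for f in concurrent.futures.as_completed(futures):
            f.result()  # re-raise any failure from the worker thread
            loaded += 1
    return loaded
```

In my actual setup each "worker" is a separate loader process rather than a
thread, but the shape of the problem is the same.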
Related to this, I did attempt to break up my records and load them with a
bunch of concurrently running loaders. This actually seems to work fairly
well with not much of a penalty in terms of documents/sec on any single
loader process. But once I reach 4-5 loaders running concurrently, I
consistently get the "Could not contact Riak Server" error and all of my
loader processes die simultaneously. If I wait a few seconds, the Riak
server does begin to respond again.
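Since the server does recover after a few seconds, one workaround I am
considering is wrapping each store call in a retry with exponential backoff
instead of letting the loader die on the first failure. A minimal sketch
(again with a hypothetical `put_fn` standing in for the real client call):

```python
import time

def put_with_backoff(put_fn, key, value, retries=5, base_delay=0.5):
    """Retry a single store call, backing off exponentially between attempts.

    If the server temporarily stops accepting connections under load (the
    "Could not contact Riak Server" case), a short pause and retry lets the
    loader ride out the stall instead of crashing. The last failure is
    re-raised so a genuinely dead server still surfaces as an error.
    """
    for attempt in range(retries):
        try:
            return put_fn(key, value)
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

This does not fix whatever resource the server is exhausting, but it would at
least keep the loader fleet alive through the brief outages I am seeing.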
Any idea for approaching this differently? Is attempting to run many
loaders concurrently a bad idea with Riak?
I am running a single server right now while I test with bucket nval set to
Sent from the Riak Users mailing list archive at Nabble.com.