Riak cluster-f#$%

Callixte Cauchois ccauchois at virtuoz.com
Mon Oct 1 15:23:30 EDT 2012


Hi there,

I am currently evaluating Riak to see how it can fit into our platform.
To do so, I have set up a cluster of 4 nodes on SmartOS, all of them on the
same physical box. I then built a simple application in node.js that gets
log events from our production system through a RabbitMQ queue and stores
them in my cluster. I let Riak generate the ids, but I have added two
secondary indices to make it easier to retrieve all the log events
that belong to a single session.
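
For illustration, here is a minimal sketch of what such a write could look like over Riak's HTTP interface. The bucket name, the two index names (session_bin and timestamp_int) and the event fields are assumptions made for the example, not the ones actually used here. It only builds the request description; sending it with an HTTP client is left out.

```javascript
// Sketch only: bucket name, index names and event fields are hypothetical.

// Describe a store request that lets Riak generate the key (POST without a key)
// and tags the object with two secondary indices via x-riak-index-* headers.
function buildStoreRequest(bucket, event) {
  return {
    method: 'POST',
    path: '/buckets/' + encodeURIComponent(bucket) + '/keys',
    headers: {
      'Content-Type': 'application/json',
      // _bin marks a binary (string) index, _int an integer index.
      'x-riak-index-session_bin': String(event.sessionId),
      'x-riak-index-timestamp_int': String(event.timestamp)
    },
    body: JSON.stringify(event)
  };
}

// Path of an exact-match 2i query returning the keys of one session.
function buildSessionQueryPath(bucket, sessionId) {
  return '/buckets/' + encodeURIComponent(bucket) +
    '/index/session_bin/' + encodeURIComponent(sessionId);
}
```

Secondary index queries like the second one are only served when the nodes run a backend that supports 2i, such as eLevelDB.
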
Everything was going fine: events arrive at around 130 messages per second
and are easily ingested by Riak. When I stop the application and then
restart it, there is a bit of an issue, as the events are read from the
queue at 1500 messages per second and the insertion times go up, so I need
some retries to actually store everything.
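
The retry logic is not shown in this message. One common way to do it, sketched below under that caveat, is to retry a failed store with an exponentially growing delay so that a burst from the queue does not hammer the cluster. The function names, the storeFn parameter and the default limits are all hypothetical.

```javascript
// Sketch only: names and default limits are assumptions, not the actual application code.

// Delay before the next attempt: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Call storeFn(event) until it succeeds or maxAttempts is reached.
// storeFn is any function returning a promise, e.g. an HTTP write to Riak.
async function storeWithRetry(storeFn, event, maxAttempts = 5, baseMs = 50, maxMs = 2000) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await storeFn(event);
    } catch (err) {
      lastError = err;
      // Wait before retrying, except after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs, maxMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Limiting how many messages are consumed from the queue at once would be a complementary way to keep insertion times down during such a catch-up.
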
I wanted to tweak the LevelDB params to increase the throughput. To do so,
I first upgraded from 1.1.6 to 1.2.0. I chose what I thought was the safest
way: node by node, I had each one leave the cluster, upgraded it, then had
it join again. During the whole process I kept inserting.
It went quite well. But when I ran some queries using 2i, I got errors, and
I realized that on two of my four nodes I had forgotten to set eLevelDB
back as the default engine. As soon as I ran this query, everything
went haywire: a lot of inserts failed, and some nodes were not reachable
through the ping URL.
I changed the default engine and restarted those nodes, but nothing
changed. I tried to make them leave the cluster; after two days, they are
still leaving. riak-admin transfers reports that a lot of transfers need to
occur, but the system is stuck: the numbers there do not change.

I guess I have done several things wrong. It is test data, so it doesn't
really matter if I lose data or if I have to restart from scratch, but I
want to understand what went wrong and how I could have fixed it, or
whether I can even recover from here now.

Thank you.
C.