Riak cluster-f#$%

Callixte Cauchois ccauchois at virtuoz.com
Mon Oct 1 16:30:00 EDT 2012


Thank you, but can you explain a bit more?
I mean, I understand why it is a bad thing with regard to reliability and
in case of hardware issues. But does it also have an impact on the
behaviour when the hardware is performing correctly and the load on the
machines is the same?

On Mon, Oct 1, 2012 at 1:25 PM, Alexander Sicular <siculars at gmail.com> wrote:

> Inline.
>
> -Alexander Sicular
>
> @siculars
>
> On Oct 1, 2012, at 3:23 PM, Callixte Cauchois wrote:
>
> > Hi there,
> >
> > so, I am currently evaluating Riak to see how it can fit in our
> platform. To do so I have set up a cluster of 4 nodes on SmartOS, all of
> them on the same physical box.
>
> Mistake. Just stop here. Everything else doesn't matter. Do not put all
> your virtual machines (riak nodes) on one physical machine. Put em on
> different physical machines. Fix the config files and try again.
>
> > I then built a simple application in node.js that gets log events from
> our production system through a RabbitMQ queue and stores them in my
> cluster. I let Riak generate the ids, but I have added two secondary
> indices to be able to retrieve more easily all the log events that belong
> to a single session.
> > Everything was going fine: events arrive at around 130 messages per
> second and are easily ingested by Riak. When I stop it and then restart
> it, there is a bit of an issue, as the events are then read from the
> queue at 1500 messages per second and the insertion times go up, so I
> need some retries to actually store everything.
> > I wanted to tweak the LevelDB params to increase the throughput. To do
> so, I first upgraded from 1.1.6 to 1.2.0. I chose what I thought was the
> safest way: node by node, I had each node leave the cluster, upgraded it,
> then had it join again. During the whole process I kept inserting.
> > It went quite well. But when I ran some queries using 2i, I got
> errors and realized that for two of my four nodes, I had forgotten to put
> back eLevelDB as the default engine. As soon as I ran this query,
> everything went haywire: a lot of inserts failed and some nodes were not
> reachable using the ping URL.
> > I changed the default engine and restarted those nodes; nothing changed.
> I tried to make them leave the cluster; after two days, they are still
> leaving. riak-admin transfers reports that a lot of transfers need to
> occur, but the system is stuck: the numbers there do not change.
> >
> > I guess I have done several things wrong. It is test data, so it doesn't
> really matter if I lose data or if I have to restart from scratch, but I
> want to understand what went wrong and how I could have fixed it, or
> whether I can even recover from here now.
> >
> > Thank you.
> > C.
>
>
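
The quoted message mentions needing "some retries" once events are read from the queue at 1500 messages per second and insertion times go up. As a rough illustration only, a retry loop with exponential backoff in node.js could be sketched as below. This is not the poster's code: `storeEvent` is a hypothetical placeholder for whatever Riak client call the application actually makes, and the option names and default values (`maxRetries`, `baseMs`, `maxMs`) are assumptions chosen for the example.

```javascript
// Delay before retry number `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Promise-based sleep helper.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls storeEvent(event) until it succeeds or the retries are used up.
// storeEvent is a hypothetical async function standing in for the real
// Riak write; it is assumed to reject (throw) on timeouts or errors.
async function storeWithRetry(storeEvent, event, opts = {}) {
  const { maxRetries = 5, baseMs = 50, maxMs = 2000 } = opts;
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await storeEvent(event);
    } catch (err) {
      lastError = err;
      // Wait longer after each failure, but not after the final one.
      if (attempt < maxRetries) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Backing off between attempts, rather than retrying immediately, gives an overloaded cluster some room to catch up. Whether that is enough at 1500 messages per second depends on the setup; limiting how many messages are taken from the queue at once may matter more.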