Put failure: too many siblings

Russell Brown russell.brown at me.com
Fri Jun 17 02:45:06 EDT 2016


What version of riak_kv is behind this riak_cs install, please? Is it really 2.1.3 as stated below? This looks and sounds like sibling explosion, which is fixed in Riak 2.0 and above. Are you sure that you are using the DVV-enabled setting for the riak_cs bucket properties? Can you post your bucket properties, please?
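
For anyone following along, the bucket properties asked for here can be read over Riak's HTTP API. The following is only a minimal sketch: it assumes the HTTP listener is on its default port 8098 and that the bucket lives in the default bucket type. The helper names and BUCKET_NAME are placeholders for illustration, not part of Riak or Riak CS.

```python
import json
import urllib.request
from urllib.parse import quote


def fetch_bucket_props(host, bucket, port=8098):
    """Read bucket properties via GET /buckets/<bucket>/props.

    Assumption: Riak's HTTP API is reachable at host:port and the bucket
    is in the default bucket type.
    """
    url = f"http://{host}:{port}/buckets/{quote(bucket, safe='')}/props"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["props"]


def sibling_settings(props):
    """Pick out the properties that govern sibling creation and resolution."""
    keys = ("allow_mult", "dvv_enabled", "last_write_wins")
    return {key: props.get(key) for key in keys}


def uses_dvv(props):
    """True only if the bucket explicitly reports dvv_enabled = true."""
    return props.get("dvv_enabled") is True


# Hypothetical usage (BUCKET_NAME is a placeholder):
#   props = fetch_bucket_props("127.0.0.1", "BUCKET_NAME")
#   print(sibling_settings(props), uses_dvv(props))
```

The output of sibling_settings for the affected bucket would be the relevant part to post back to the list.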

On 16 Jun 2016, at 23:54, Vladyslav Zakhozhai <v.zakhozhai at smartweb.com.ua> wrote:

> Hello.
> 
> I am seeing something very interesting and confusing.
> 
> From my previous letter you can see that the sibling count on manifest objects is about 100 (actually it is in the range 100-300). Unfortunately, my problem is that almost all PUT requests are failing with 500 Internal Server Error.
> 
> Today I tried setting the max_siblings Riak option to 500. There were successful PUT requests, but not for long. Now I see the "max siblings" error in the Riak logs again, but the actual count is 500+ (earlier it was 100-300, as I mentioned).
> 
> The period of time between setting max_siblings=500 and the errors appearing in the log was about 30 minutes. I also want to draw your attention to the fact that I have forbidden the PUT method on haproxy, the frontend for Riak CS.
> 
> 
> 
> On Mon, Jun 6, 2016 at 1:17 AM Vladyslav Zakhozhai <v.zakhozhai at smartweb.com.ua> wrote:
> Hi, Luke.
> 
> Thank you for your answer. I did not completely understand what you meant about transfer-limit. How does it relate to my problem? The transfer limit is a limit on concurrent data transfers from different nodes. Am I right? Do you mean that Riak can hand off one partition from several nodes concurrently?
> 
> I now have transfer-limit set to 1 on all Riak nodes.
> 
> But I am not sure that my cluster will ever converge. All nodes are running low on memory and are periodically killed by the OOM killer. I am trying to add new nodes to the cluster, but because of the OOM killer problem this process is very, very slow.
> 
> In the official docs I've read:
> 
> "Sibling explosion occurs when an object rapidly collects siblings that are not reconciled. This can lead to a variety of problems, including degraded performance, especially if many objects in a cluster suffer from siblings explosion. At the extreme, having an enormous object in a node can cause reads of that object to crash the entire node. Other issues include undue latency and out-of-memory errors."
> 
> I have noticed that the new nodes in the cluster do not experience such problems (I mean running out of RAM).
> 
> Regarding siblings, maybe you are right and this is a manifest object. I can recognize the key name but not the bucket name. But more than 100 siblings on many keys really confuses me. Each time I try to PUT an object to Riak via the Riak CS S3 interface, I get sibling errors.
> 
> On Fri, Jun 3, 2016 at 6:43 PM Luke Bakken <lbakken at basho.com> wrote:
> Hi Vladyslav,
> 
> If you recognize the full name of the object raising the sibling
> warning, it is most likely a manifest object. Sometimes, during hinted
> handoff, you can see these messages. They should resolve after handoff
> completes.
> 
> Please see the documentation for the transfer-limit command as well:
> 
> http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
> 
> --
> Luke Bakken
> Engineer
> lbakken at basho.com
> 
> 
> On Fri, Jun 3, 2016 at 2:55 AM, Vladyslav Zakhozhai
> <v.zakhozhai at smartweb.com.ua> wrote:
> > Hi.
> >
> > I am having trouble with PUTs to a Riak CS cluster. During this process I
> > periodically see the following message in the Riak error.log:
> >
> > 2016-06-03 11:15:55.201 [error]
> > <0.15536.142>@riak_kv_vnode:encode_and_put:2253 Put failure: too many
> > siblings for object OBJECT_NAME (101)
> >
> > and also
> >
> > 2016-06-03 12:41:50.678 [error]
> > <0.20448.515>@riak_api_pb_server:handle_info:331 Unrecognized message
> > {7345880,{error,{too_many_siblings,101}}}
> >
> > Here OBJECT_NAME is the name of the object in Riak which has too many
> > siblings.
> >
> > I am definitely sure that these objects are static. Nobody deletes them,
> > nobody rewrites them. I have no idea why more than 100 siblings of such an
> > object occur.
> >
> > The issue has the following effects:
> >
> > - A great number of keys are loaded into RAM. I am almost out of RAM (does
> >   each sibling have its own key, or a duplicate of the key?).
> > - Nodes are slow, and adding new nodes is too slow.
> > - The presence of "too many siblings" affects ownership handoffs.
> >
> > So I have several questions:
> >
> > - Can hinted or ownership handoffs affect the sibling count (I mean, can
> >   siblings be created during ownership or hinted handoffs)?
> > - Is there any workaround for this issue? Do I need to remove siblings
> >   manually, or are they removed during merges, read repairs and so on?
> >
> >
> > My configuration:
> >
> > riak from basho's packages - 2.1.3-1
> > riak cs from basho's packages - 2.1.0-1
> > 24 riak/riak-cs nodes
> > 32 GB RAM per node
> > AAE is disabled
> >
> >
> > I appreciate your help.
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
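
The "too many siblings" errors quoted in this thread can be tallied per object to see which keys are affected and how far the counts have grown. This is a minimal sketch, assuming each entry occupies a single line in error.log (the wrapping in the quoted text comes from the mail client) and that the message format matches the riak_kv_vnode line quoted here. The function name and the log path in the usage comment are hypothetical.

```python
import re

# Matches the riak_kv_vnode message quoted in the thread, e.g.
# "... Put failure: too many siblings for object OBJECT_NAME (101)"
SIBLING_ERROR = re.compile(
    r"Put failure: too many siblings for object (?P<key>.+?) \((?P<count>\d+)\)\s*$"
)


def max_siblings_per_object(lines):
    """Return {object name: highest sibling count reported} for the log lines."""
    worst = {}
    for line in lines:
        match = SIBLING_ERROR.search(line)
        if match:
            key = match.group("key")
            count = int(match.group("count"))
            worst[key] = max(count, worst.get(key, 0))
    return worst


# Hypothetical usage (the path is an assumption; adjust to your install):
#   with open("/var/log/riak/error.log") as log:
#       for key, count in sorted(max_siblings_per_object(log).items()):
#           print(key, count)
```

Running this before and after a change such as raising max_siblings would show whether the counts keep growing for the same keys.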




