[error] Supervisor riak_pipe_vnode_worker_sup had child undefined started with ...

Ivaylo Panitchkov ipanitchkov at hibernum.com
Wed Jun 13 15:03:57 EDT 2012


Hi Mark,

I resized the cluster from 4x1GB RAM to 4x4GB RAM. I also increased 
{map_js_vm_count, 8} to {map_js_vm_count, 48} and {reduce_js_vm_count, 6} 
to {reduce_js_vm_count, 36} in app.config, but I still hit the same 
problem from time to time.

The function I use to do the link-walking call is below:

/**
  * Fetches all user artifacts using link walking and passes them to the
  * callback as an array.
  *
  * @param id - unique user id
  * @param cb - callback invoked with an array of user artifacts
  */
var fetchUserArtifacts = function(id, cb) {
    riakwalk(config.CONCRETESBUCKET, id, [
        [config.ARTIFACTSBUCKET, 'artifacts', '_']
    ], function(user_artifacts) {
        if (user_artifacts.length) {
            // the first element in the result holds all the links,
            // so we skip it and hand back the second element
            cb(user_artifacts[1]);
        } else {
            cb([]);
        }
    });
};
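
A hypothetical call, just to show the shape (the id is a placeholder):

    // hypothetical usage; 'some-user-id' stands in for a real user id
    fetchUserArtifacts('some-user-id', function(artifacts) {
        console.log('fetched ' + artifacts.length + ' artifacts');
    });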

As you can see, it's a simple query fetching 50-150 small objects. The 
cluster is almost idle, so it should be able to serve the request. I had 
a similar problem a while ago and decided to fetch objects one by one 
instead of using link-walking, and that patch did the trick. The 
performance degraded slightly, but at least it worked all the time. For 
the case mentioned here I just created a non link-walking version that 
fetches 150 objects in about a second, which is acceptable. I will 
investigate further when I have time :-)
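
In case it helps, the fallback looks roughly like this. It's only a 
sketch: riakget is a hypothetical wrapper analogous to riakwalk above 
that fetches one object and invokes a callback, and artifact_keys is a 
made-up field standing in for wherever the user object keeps the keys 
of its artifacts:

    // minimal sketch of the non link-walking fallback (assumptions above)
    fetchUserArtifactsNoWalk = function(id, cb) {
        riakget(config.CONCRETESBUCKET, id, function(user) {
            var keys = user.artifact_keys || [];  // hypothetical field
            if (!keys.length) return cb([]);
            var results = [];
            var pending = keys.length;
            keys.forEach(function(key) {
                riakget(config.ARTIFACTSBUCKET, key, function(artifact) {
                    results.push(artifact);
                    if (--pending === 0) cb(results);  // all fetches done
                });
            });
        });
    };

Issuing the gets concurrently like this, rather than strictly one after 
another, is presumably what keeps 150 objects to about a second.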

Ivaylo



On 12-06-10 05:20 PM, Mark Phillips wrote:
> Hi Ivaylo,
>
> Take a look at this thread:
>
> http://riak.markmail.org/search/?q=exit%20with%20reason%20fitting_died%20in%20context%20child_terminated#query:exit%20with%20reason%20fitting_died%20in%20context%20child_terminated+page:1+mid:n4gfl43hcvzthjl7+state:results
>
> I think this is what you're seeing. You should read the entire message 
> I linked to, but the important thing is that the reason you're seeing 
> the "fitting_died in context child_terminated" logs is due to a 
> timeout with a Riak Pipe-based M/R process. To paraphrase Bryan Fink, 
> those messages are normal and intended to help debug issues. Are you 
> still seeing them?
>
> I would be interested to know what type of MapReduce load you're 
> putting on your cluster. "4 machines x 1GB RAM" isn't a very powerful 
> cluster, and MapReduce jobs (especially those written in JavaScript) 
> can tax Riak nodes significantly. Any details you can share?
>
> Mark
>
>
>
> On Wed, Jun 6, 2012 at 4:38 PM, Ivaylo Panitchkov 
> <ipanitchkov at hibernum.com> wrote:
>
>
>     Hello everyone,
>
>     We started getting the following errors on all servers in the
>     cluster (4 machines x 1GB RAM, riak_1.0.2-1_amd64.deb):
>
>     20:12:36.753 [error] Supervisor riak_pipe_vnode_worker_sup had
>     child undefined started with
>     {riak_pipe_vnode_worker,start_link,undefined} at <0.8855.0> exit
>     with reason fitting_died in context child_terminated
>     20:12:36.754 [error] Supervisor riak_pipe_vnode_worker_sup had
>     child undefined started with
>     {riak_pipe_vnode_worker,start_link,undefined} at <0.8856.0> exit
>     with reason fitting_died in context child_terminated
>     20:12:36.965 [error] Supervisor riak_pipe_vnode_worker_sup had
>     child undefined started with
>     {riak_pipe_vnode_worker,start_link,undefined} at <0.8860.0> exit
>     with reason fitting_died in context child_terminated
>     20:12:36.967 [error] Supervisor riak_pipe_vnode_worker_sup had
>     child undefined started with
>     {riak_pipe_vnode_worker,start_link,undefined} at <0.8861.0> exit
>     with reason fitting_died in context child_terminated
>
>
>     If we restart the riak service on all machines one by one the
>     error message disappears for a while.
>     Any ideas to solve the issue will be much appreciated.
>
>     Thanks in advance,
>     Ivaylo
>
>     REMARK: Replaced the IP addresses for security's sake
>
>     *root at riak01:~# riak-admin member_status*
>     Attempting to restart script through sudo -u riak
>     ================================= Membership ==================================
>     Status     Ring    Pending    Node
>     -------------------------------------------------------------------------------
>     valid      25.0%      --      'riak at IP1'
>     valid      25.0%      --      'riak at IP2'
>     valid      25.0%      --      'riak at IP3'
>     valid      25.0%      --      'riak at IP4'
>     -------------------------------------------------------------------------------
>     Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
>     *root at riak01:~# riak-admin ring_status*
>     Attempting to restart script through sudo -u riak
>     ================================== Claimant ===================================
>     Claimant:  'riak at IP1'
>     Status:     up
>     Ring Ready: true
>
>     ============================== Ownership Handoff ==============================
>     No pending changes.
>
>     ============================== Unreachable Nodes ==============================
>     All nodes are up and reachable
>
>     *root at riak01:~# riak-admin ringready*
>     Attempting to restart script through sudo -u riak
>     TRUE All nodes agree on the ring
>     ['riak at IP1','riak at IP2','riak at IP3','riak at IP4']
>
>     *root at riak01:~# riak-admin transfers*
>     Attempting to restart script through sudo -u riak
>     No transfers active
>

-- 
Ivaylo Panitchkov
Software developer
Hibernum Creations Inc.

This email is confidential and may also be legally privileged. If you have received this email in error, please notify us immediately by reply email and then delete this message from your system. Please do not copy it or use it for any purpose or disclose its content.
