Warning "Can not start proc_lib:init_p"

Evan Vigil-McClanahan emcclanahan at basho.com
Mon Apr 8 10:11:06 EDT 2013


You could try to read it by doing a HEAD request, which, while still
cluster-impacting, wouldn't pass the whole object over the wire. This
isn't necessary if you already have the latest vclock from a previous
attempt or write.  Then you could overwrite the object with that
vclock (again, more impact), which would have the same effect as
deleting the object.  If that fails, there are other options we can
explore.
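
A minimal sketch of that sequence, reusing the URL from your
transcript below; the vclock value is a placeholder you'd copy from
the HEAD response, and host/port are whatever fits your cluster:

    # HEAD fetches only headers (no 2GB body); note the X-Riak-Vclock value
    curl -sI "http://172.22.3.22:8091/riak/m/Oa|1" | grep X-Riak-Vclock

    # overwrite with that vclock to collapse the siblings, then delete
    curl -X PUT -H "X-Riak-Vclock: <vclock copied from above>" \
         -H "Content-Type: application/json" -d "{}" \
         "http://172.22.3.22:8091/riak/m/Oa|1"
    curl -X DELETE "http://172.22.3.22:8091/riak/m/Oa|1"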

The worrying question here is how this object got so many siblings in
the first place, which may be something that you want to look into and
address.

On Mon, Apr 8, 2013 at 3:36 AM, Ingo Rockel
<ingo.rockel at bluelionmobile.com> wrote:
> Hi,
>
> I've finally been able to identify the big object (was a tough one), but
> unfortunately, riak fails to delete it:
>
> irockel at bighead:~$ curl -v -X DELETE "http://172.22.3.22:8091/riak/m/Oa|1"
> * About to connect() to 172.22.3.22 port 8091 (#0)
> *   Trying 172.22.3.22... connected
>> DELETE /riak/m/Oa|1 HTTP/1.1
>> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1
>> zlib/1.2.3.4 libidn/1.23 librtmp/2.3
>> Host: 172.22.3.22:8091
>> Accept: */*
>>
> < HTTP/1.1 503 Service Unavailable
> < Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue)
> < Date: Mon, 08 Apr 2013 08:30:08 GMT
> < Content-Type: text/plain
> < Content-Length: 18
> <
>
> Any suggestions on how I could get rid of the object? It seems to be really
> big and has a lot of siblings (>100) which sum up to 2GB.
>
> Ingo
>
>
> Am 04.04.2013 17:51, schrieb Evan Vigil-McClanahan:
>>
>> Possible, but would need more information to make a guess.  I'd keep a
>> close eye on that node.
>>
>> On Thu, Apr 4, 2013 at 10:34 AM, Ingo Rockel
>> <ingo.rockel at bluelionmobile.com> wrote:
>>>
>>> thanks, but it was a very obvious c&p error :) and we already have
>>> ERL_MAX_ETS_TABLES set to 8192, as it is in the default vm.args.
>>>
>>> The only other messages were about a lot of handoff going on.
>>>
>>> Maybe the node was getting some data concerning the 2GB object?
>>>
>>> Ingo
>>>
>>> Am 04.04.2013 17:25, schrieb Evan Vigil-McClanahan:
>>>
>>>> Major error on my part here!
>>>>
>>>>> your vm.args:
>>>>> -env ERL_MAX_ETS_TABLES 819
>>>>
>>>> This should be
>>>>
>>>> -env ERL_MAX_ETS_TABLES 8192
>>>>
>>>> Sorry for the sloppy cut and paste.  Please do not do the former
>>>> thing, or it will be very bad.
>>>>
>>>>> This is a good idea for all systems but is especially important for
>>>>> people with large rings.
>>>>>
>>>>> Were there any other messages?  Riak constantly spawns new processes,
>>>>> but they don't tend to build up unless the backend is misbehaving (or
>>>>> a few other less likely conditions), and a backup of spawned processes
>>>>> is the only thing I can think of that would make +P help with OOM
>>>>> issues.
>>>>>
>>>>> On Thu, Apr 4, 2013 at 9:21 AM, Ingo Rockel
>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>
>>>>>> A grep for "too many processes" didn't reveal anything. The process
>>>>>> got killed by the oom-killer.
>>>>>>
>>>>>> Am 04.04.2013 16:12, schrieb Evan Vigil-McClanahan:
>>>>>>
>>>>>>> That's odd.  It was getting killed by the OOM killer, or crashing
>>>>>>> because it couldn't allocate more memory?  That's suggestive of
>>>>>>> something else that's wrong, since the +P doesn't do any memory
>>>>>>> limiting.  Are you getting 'too many processes' emulator errors on
>>>>>>> that node?
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 8:47 AM, Ingo Rockel
>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>
>>>>>>>> the crashing node seems to be caused by the raised +P param; after
>>>>>>>> the last crash I commented the param out and now the node runs just
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> Am 04.04.2013 15:43, schrieb Ingo Rockel:
>>>>>>>>
>>>>>>>>> Hi Evan,
>>>>>>>>>
>>>>>>>>> we added monitoring of the object sizes and there was one object
>>>>>>>>> on one of the three nodes mentioned which was > 2GB!!
>>>>>>>>>
>>>>>>>>> We just changed the application code to get the id of this object
>>>>>>>>> so we are able to delete it. But it does happen only about once a
>>>>>>>>> day.
>>>>>>>>>
>>>>>>>>> Right now we have another node constantly crashing with OOM about
>>>>>>>>> 12 minutes after start (always the same time frame); could this be
>>>>>>>>> related to the big object issue? It is not one of the three nodes.
>>>>>>>>> The node logs that a lot of handoff receiving is going on.
>>>>>>>>>
>>>>>>>>> Again, thanks for the help!
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>>         Ingo
>>>>>>>>>
>>>>>>>>> Am 04.04.2013 15:30, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>
>>>>>>>>>> If it's always the same three nodes it could well be the same
>>>>>>>>>> very large object being updated each day.  Is there anything else
>>>>>>>>>> that looks suspicious in your logs?  Another sign of large
>>>>>>>>>> objects is large_heap (or long_gc) messages from riak_sysmon.
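>>>>>>>>>>
>>>>>>>>>> A quick way to check for those, as a sketch (assuming the default
>>>>>>>>>> log location of a package install; adjust the path to your setup):
>>>>>>>>>>
>>>>>>>>>>     grep -E 'large_heap|long_gc' /var/log/riak/console.log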
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 4, 2013 at 3:58 AM, Ingo Rockel
>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Evan,
>>>>>>>>>>>
>>>>>>>>>>> thanks for all the info! I adjusted the leveldb config as
>>>>>>>>>>> suggested, except the cache, which I reduced to 16MB; keeping
>>>>>>>>>>> this above the default helped a lot, at least during load
>>>>>>>>>>> testing. And I added +P 130072 to the vm.args. This will be
>>>>>>>>>>> applied to the riak nodes in the next hours.
>>>>>>>>>>>
>>>>>>>>>>> We have monitoring using Zabbix, but haven't included the object
>>>>>>>>>>> sizes so far; they will be added today.
>>>>>>>>>>>
>>>>>>>>>>> We double-checked the Linux-Performance-Doc to be sure
>>>>>>>>>>> everything is applied to the nodes, especially as the problems
>>>>>>>>>>> are always caused by the same three nodes. But everything looks
>>>>>>>>>>> fine.
>>>>>>>>>>>
>>>>>>>>>>> Ingo
>>>>>>>>>>>
>>>>>>>>>>> Am 03.04.2013 18:42, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>>
>>>>>>>>>>>> Another engineer mentions that you posted your eleveldb section
>>>>>>>>>>>> and I
>>>>>>>>>>>> totally missed it:
>>>>>>>>>>>>
>>>>>>>>>>>> The eleveldb section:
>>>>>>>>>>>>
>>>>>>>>>>>>       %% eLevelDB Config
>>>>>>>>>>>>       {eleveldb, [
>>>>>>>>>>>>                   {data_root, "/var/lib/riak/leveldb"},
>>>>>>>>>>>>                   {cache_size, 33554432},
>>>>>>>>>>>>                   {write_buffer_size_min, 67108864}, %% 64 MB in bytes
>>>>>>>>>>>>                   {write_buffer_size_max, 134217728}, %% 128 MB in bytes
>>>>>>>>>>>>                   {max_open_files, 4000}
>>>>>>>>>>>>                  ]},
>>>>>>>>>>>>
>>>>>>>>>>>> This is likely going to make you unhappy as time goes on; since
>>>>>>>>>>>> all of those settings are per-vnode, your max memory utilization
>>>>>>>>>>>> is well beyond your physical memory.  I'd remove the tunings for
>>>>>>>>>>>> the caches and buffers and drop max open files to 500, perhaps.
>>>>>>>>>>>> Make sure that you've followed everything in
>>>>>>>>>>>> http://docs.basho.com/riak/latest/cookbooks/Linux-Performance-Tuning/,
>>>>>>>>>>>> etc.
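>>>>>>>>>>>>
>>>>>>>>>>>> Something like this trimmed-down sketch, with the cache and
>>>>>>>>>>>> buffer sizes left at whatever your eleveldb version defaults to:
>>>>>>>>>>>>
>>>>>>>>>>>>       {eleveldb, [
>>>>>>>>>>>>                   {data_root, "/var/lib/riak/leveldb"},
>>>>>>>>>>>>                   %% cache/buffer tunings removed, defaults apply
>>>>>>>>>>>>                   {max_open_files, 500}
>>>>>>>>>>>>                  ]},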
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:33 AM, Evan Vigil-McClanahan
>>>>>>>>>>>> <emcclanahan at basho.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Again, all of these things are signs of large objects, so if
>>>>>>>>>>>>> you could track the object_size stats on the cluster, I think
>>>>>>>>>>>>> that we might see something.  Even if you have no monitoring, a
>>>>>>>>>>>>> simple shell script curling /stats/ on each node once a minute
>>>>>>>>>>>>> should do the job for a day or two.
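>>>>>>>>>>>>>
>>>>>>>>>>>>> A minimal sketch of such a script (the node list and output
>>>>>>>>>>>>> path are placeholders; the objsize stats come back as part of
>>>>>>>>>>>>> the JSON that /stats returns):
>>>>>>>>>>>>>
>>>>>>>>>>>>>     #!/bin/sh
>>>>>>>>>>>>>     NODES="172.22.3.22 172.22.3.23"   # placeholder node list
>>>>>>>>>>>>>     while true; do
>>>>>>>>>>>>>       for n in $NODES; do
>>>>>>>>>>>>>         curl -s "http://$n:8091/stats" | tr ',' '\n' \
>>>>>>>>>>>>>           | grep objsize >> "/tmp/objsize-$n.log"
>>>>>>>>>>>>>       done
>>>>>>>>>>>>>       sleep 60
>>>>>>>>>>>>>     done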
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:29 AM, Ingo Rockel
>>>>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We just had it again (around this time of the day we have our
>>>>>>>>>>>>>> highest user activity).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will set +P to 131072 tomorrow, anything else I should check
>>>>>>>>>>>>>> or change?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What about this memory-high-watermark which I get sporadically?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 03.04.2013 17:57, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for +P, the default has been raised in R16 (which is what
>>>>>>>>>>>>>>> is on the current man page); on R15 it's only 32k.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The behavior that you're describing does sound like a very
>>>>>>>>>>>>>>> large object getting put into the cluster (which may cause
>>>>>>>>>>>>>>> backups and push you up against the process limit, could have
>>>>>>>>>>>>>>> caused scheduler collapse on 1.2, etc.).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:39 AM, Ingo Rockel
>>>>>>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Evan,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sys_process_count is somewhere between 5k and 11k on the
>>>>>>>>>>>>>>>> nodes right now. Concerning your suggested +P config:
>>>>>>>>>>>>>>>> according to the erlang docs, the default for this param is
>>>>>>>>>>>>>>>> already 262144, so setting it to 65536 would in fact lower
>>>>>>>>>>>>>>>> it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We chose the ring size to be able to handle growth, which
>>>>>>>>>>>>>>>> was the main reason to switch from mysql to nosql/riak. We
>>>>>>>>>>>>>>>> have 12 nodes, so about 86 vnodes per node.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, we don't monitor object sizes. The majority of objects
>>>>>>>>>>>>>>>> are very small (below 200 bytes), but we have objects
>>>>>>>>>>>>>>>> storing references to these small objects which might grow
>>>>>>>>>>>>>>>> to a few megabytes in size; most of these are paged though
>>>>>>>>>>>>>>>> and should not exceed one megabyte. Only one type is not
>>>>>>>>>>>>>>>> paged (implementation reasons).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The outgoing/incoming traffic is constantly around 100 Mbit;
>>>>>>>>>>>>>>>> when the performance drops happen, we suddenly see spikes up
>>>>>>>>>>>>>>>> to 1GBit. And these spikes consistently happen on three
>>>>>>>>>>>>>>>> nodes as long as the performance drop exists.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 03.04.2013 17:12, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ingo,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> riak-admin status | grep sys_process_count
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> will tell you how many processes are running.  The default
>>>>>>>>>>>>>>>>> process limit on erlang is a little low, and we'd suggest
>>>>>>>>>>>>>>>>> raising it (especially with your extra-large ring_size).
>>>>>>>>>>>>>>>>> Erlang processes are cheap, so 65535 or even double that
>>>>>>>>>>>>>>>>> will be fine.
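>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In vm.args that would look something like the following,
>>>>>>>>>>>>>>>>> with the exact value being your choice:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     +P 131072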
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Busy dist ports are still worrying.  Are you monitoring
>>>>>>>>>>>>>>>>> object sizes?  Are there any spikes there associated with
>>>>>>>>>>>>>>>>> performance drops?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:03 AM, Ingo Rockel
>>>>>>>>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Evan,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I set swt to very_low and zdbbl to 64MB; setting these
>>>>>>>>>>>>>>>>>> params reduced the busy_dist_port and "Monitor got
>>>>>>>>>>>>>>>>>> {suppressed,..." messages a lot. But when the performance
>>>>>>>>>>>>>>>>>> of the cluster suddenly drops we still see these messages.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The cluster was updated to 1.3 in the meantime.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The eleveldb section:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         %% eLevelDB Config
>>>>>>>>>>>>>>>>>>         {eleveldb, [
>>>>>>>>>>>>>>>>>>                     {data_root, "/var/lib/riak/leveldb"},
>>>>>>>>>>>>>>>>>>                     {cache_size, 33554432},
>>>>>>>>>>>>>>>>>>                     {write_buffer_size_min, 67108864}, %% 64 MB in bytes
>>>>>>>>>>>>>>>>>>                     {write_buffer_size_max, 134217728}, %% 128 MB in bytes
>>>>>>>>>>>>>>>>>>                     {max_open_files, 4000}
>>>>>>>>>>>>>>>>>>                    ]},
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the ring size is 1024 and the machines have 48GB of
>>>>>>>>>>>>>>>>>> memory. Concerning the params from vm.args:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -env ERL_MAX_PORTS 4096
>>>>>>>>>>>>>>>>>> -env ERL_MAX_ETS_TABLES 8192
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +P isn't set
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am 03.04.2013 16:53, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For your prior mail, I thought that someone had answered.
>>>>>>>>>>>>>>>>>>> Our initial suggestion was to add +swt very_low to your
>>>>>>>>>>>>>>>>>>> vm.args, as well as the +zdbbl setting that Jon
>>>>>>>>>>>>>>>>>>> recommended in the list post you pointed to.  If those
>>>>>>>>>>>>>>>>>>> help, moving to 1.3 should help more.
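>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For reference, a sketch of those vm.args lines; +zdbbl
>>>>>>>>>>>>>>>>>>> takes a size in kilobytes, so 65536 here means 64 MB
>>>>>>>>>>>>>>>>>>> (pick the value matching Jon's recommendation):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     +swt very_low
>>>>>>>>>>>>>>>>>>>     +zdbbl 65536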
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Other limits in vm.args that can cause problems are +P,
>>>>>>>>>>>>>>>>>>> ERL_MAX_PORTS, and ERL_MAX_ETS_TABLES.  Are any of these
>>>>>>>>>>>>>>>>>>> set?  If so, to what?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can you also paste the eleveldb section of your app.config?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 7:41 AM, Ingo Rockel
>>>>>>>>>>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Evan,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm not sure, I find a lot of these:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2013-03-30 23:27:52.992 [error]
>>>>>>>>>>>>>>>>>>>> <0.8036.323>@riak_api_pb_server:handle_info:141
>>>>>>>>>>>>>>>>>>>> Unrecognized message {22243034,{error,timeout}}
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and some of these at the same time as one of the kind
>>>>>>>>>>>>>>>>>>>> below gets logged (although that one has a different
>>>>>>>>>>>>>>>>>>>> time stamp):
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2013-03-30 23:27:53.056 [error]
>>>>>>>>>>>>>>>>>>>> <0.9457.323>@riak_kv_console:status:178
>>>>>>>>>>>>>>>>>>>> Status failed error:terminated
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Am 03.04.2013 16:24, schrieb Evan Vigil-McClanahan:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Resending to the list:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ingo,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That is an indication that the protocol buffers server
>>>>>>>>>>>>>>>>>>>>> can't spawn a put fsm, which means that a put cannot be
>>>>>>>>>>>>>>>>>>>>> done for some reason or another.  Are there any other
>>>>>>>>>>>>>>>>>>>>> messages that appear around this time that might
>>>>>>>>>>>>>>>>>>>>> indicate why?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:09 AM, Ingo Rockel
>>>>>>>>>>>>>>>>>>>>> <ingo.rockel at bluelionmobile.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> we have some performance issues with our riak cluster;
>>>>>>>>>>>>>>>>>>>>>> from time to time we have a sudden drop in performance
>>>>>>>>>>>>>>>>>>>>>> (already asked the list about this, no-one had an idea
>>>>>>>>>>>>>>>>>>>>>> though). Although not at the same time, on the
>>>>>>>>>>>>>>>>>>>>>> problematic nodes we see a lot of these messages from
>>>>>>>>>>>>>>>>>>>>>> time to time:
>>>>>>>>>>>>>>>>>>>>>> 2013-04-02 21:41:11.173 [warning] <0.25646.475> ** Can not start proc_lib:init_p
>>>>>>>>>>>>>>>>>>>>>> ,[<0.14556.474>,[<0.9519.474>,riak_api_pb_sup,riak_api_sup,<0.1291.0>],riak_kv_p
>>>>>>>>>>>>>>>>>>>>>> ut_fsm,start_link,[{raw,65032165,<0.9519.474>},{r_object,<<109>>,<<77,115,124,49
>>>>>>>>>>>>>>>>>>>>>> ,53,55,57,56,57,56,50,124,49,51,54,52,57,51,49,54,49,49,53,49,50,52,53,54>>,[{r_
>>>>>>>>>>>>>>>>>>>>>> content,{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>>>>>>>>>>>>>>>>>>>> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},<<>>}],[],{dict,2,16,16,8,8
>>>>>>>>>>>>>>>>>>>>>> 0,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[]
>>>>>>>>>>>>>>>>>>>>>> ,[],[],[[<<99,111,110,116,101,110,116,45,116,121,112,101>>,97,112,112,108,105,99
>>>>>>>>>>>>>>>>>>>>>> ,97,116,105,111,110,47,106,115,111,110]],[],[],[],[],[[<<99,104,97,114,115,101,1
>>>>>>>>>>>>>>>>>>>>>> 16>>,85,84,70,45,56]]}}},<<123,34,115,116,34,58,50,44,34,116,34,58,49,44,34,99,3
>>>>>>>>>>>>>>>>>>>>>> 4,58,34,66,117,116,32,115,104,101,32,105,115,32,103,111,110,101,44,32,110,32,101
>>>>>>>>>>>>>>>>>>>>>> ,118,101,110,32,116,104,111,117,103,104,32,105,109,32,110,111,116,32,105,110,32,
>>>>>>>>>>>>>>>>>>>>>> 117,114,32,99,105,116,121,32,105,32,108,111,118,101,32,117,32,110,100,32,105,32,
>>>>>>>>>>>>>>>>>>>>>> 109,101,97,110,32,105,116,32,58,39,40,34,44,34,114,34,58,49,52,51,52,54,52,51,57
>>>>>>>>>>>>>>>>>>>>>> ,44,34,115,34,58,49,53,55,57,56,57,56,50,44,34,99,116,34,58,49,51,54,52,57,51,49
>>>>>>>>>>>>>>>>>>>>>> ,54,49,49,53,49,50,44,34,97,110,34,58,102,97,108,115,101,44,34,115,107,34,58,49,
>>>>>>>>>>>>>>>>>>>>>> 51,54,52,57,51,49,54,49,49,53,49,50,52,53,54,44,34,115,117,34,58,48,125>>},[{tim
>>>>>>>>>>>>>>>>>>>>>> eout,60000}]]] on 'riak at 172.22.3.12' **
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Can anyone explain to me what these messages mean and
>>>>>>>>>>>>>>>>>>>>>> if I need to do something about it? Could these
>>>>>>>>>>>>>>>>>>>>>> messages be in any way related to the performance
>>>>>>>>>>>>>>>>>>>>>> issues?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>>>>>>>>>
>
> --
> Software Architect
>
> Blue Lion mobile GmbH
> Tel. +49 (0) 221 788 797 14
> Fax. +49 (0) 221 788 797 19
> Mob. +49 (0) 176 24 87 30 89
>
> ingo.rockel at bluelionmobile.com
> qeep: Hefferwolf
>
> www.bluelionmobile.com
> www.qeep.net