repair-2i stops with "bad argument in call to eleveldb:async_write"

Effenberg, Simon seffenberg at team.mobile.de
Fri Aug 8 03:12:58 EDT 2014


Hi Bryan,

Thanks for this. I tried it, but to be honest I cannot see anything
specific in the logs (on the affected host).

I attached the logfile from the specific node. If you think it is
also/more important to look into the logfiles on the other nodes I can
send them as well.. but a quick look into all of them (searching for
"2i" and "index") didn't show anything unusual.. the only stuff was

2014-08-07 05:44:11.298 UTC [debug] <0.969.0>@riak_kv_index_hashtree:handle_call:240 Updating tree: (vnode)=633697975561446187189878970435575840553939501056 (preflist)={610862012478150829092946394924383918371815555072,12}

and searching for errors didn't show more than you see in the attached
files:

$ for host in kriak46-{1..7} kriak47-{1..6}; do echo $host; ssh $host "grep '^2014-08-07 05' /var/log/riak/console.log | grep -i error" ; done
kriak46-1
2014-08-07 05:38:28.596 UTC [error] <0.8949.566> ** Node 'c_24556_riak at 10.46.109.201' not responding **
2014-08-07 05:42:36.197 UTC [error] <0.24823.566> ** Node 'c_26945_riak at 10.46.109.201' not responding **
2014-08-07 05:43:16.213 UTC [error] <0.26434.566> ** Node 'c_27071_riak at 10.46.109.201' not responding **
2014-08-07 05:48:14.284 UTC [error] <0.1697.0> gen_server <0.1697.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process <0.1697.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor {<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
2014-08-07 05:50:11.390 UTC [error] <0.20983.567> ** Node 'c_32188_riak at 10.46.109.201' not responding **
kriak46-2
kriak46-3
kriak46-4
kriak46-5
kriak46-6
kriak46-7
kriak47-1
kriak47-2
kriak47-3
kriak47-4
kriak47-5
kriak47-6
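
As a side note, the truncated binary in the eleveldb:async_write error
above can be partly decoded. It looks like an Erlang external term: 131 is
the version tag, 104 2 a tuple with two elements, and 109 0 0 0 20 a binary
of length 20. The remaining visible bytes are plain ASCII. The snippet
below is only a sketch that decodes the bytes shown in the log. The log
elides everything after them, so the last three bytes of that first binary
and the rest of the term stay unknown. The variable names are made up for
illustration.

```shell
# Bytes copied from the log line above; the log elides the rest with "...".
bytes="131 104 2 109 0 0 0 20 99 111 110 118 101 114 115 97 116 105 111 110 95 115 101 99 114"

# Skip the 8-byte header: 131 (version), 104 2 (tuple of 2),
# 109 plus a 4-byte big-endian length (0 0 0 20).
set -- $bytes
shift 8

decoded=""
for b in "$@"; do
  # Turn each decimal byte into its ASCII character via an octal escape.
  decoded="$decoded$(printf "\\$(printf '%03o' "$b")")"
done

echo "$decoded"   # prints: conversation_secr
```

So the first element of the tuple is a 20-byte binary starting with
"conversation_secr", which might be the bucket name of the object that was
being written when the worker crashed.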

You mentioned the partition repair stuff.. do you think I need to try
out the full repair? Is this maybe a way to fix it? Because it is quite
hard to do this on the cluster (~15 TB of data with AAE stuff and
tombstones and maybe ~10 TB without tombstones and AAE stuff) and I
don't want to start doing this if it won't help.

Cheers
Simon

On Wed, Aug 06, 2014 at 01:08:36PM +0100, bryan hunt wrote:
> Simon,
> 
> If you want more verbose logging information, you can change the logging level to debug, run `repair-2i`, and then switch back to the normal logging level.
> 
> - `riak attach`
> - `(riak at nodename)1> SetDebug = fun() -> {node(), lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", debug)} end.`
> - `(riak at nodename)2> rp(rpc:multicall(erlang, apply, [SetDebug,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> You can then revert to the normal `info` logging level by running the following commands via `riak attach`:
> 
> - `riak attach`
> - `(riak at nodename)1> SetInfo = fun() -> {node(), lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", info)} end.`
> - `(riak at nodename)2> rp(rpc:multicall(erlang, apply, [SetInfo,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> Please also see the docs for info on `riak attach` monitoring of repairs.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Monitoring-Repairs
> 
> Repairs can also be monitored using the `riak-admin transfers` command.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Running-a-Repair
> 
> Best Regards,
> 
> Bryan Hunt 
> 
> Bryan Hunt - Client Services Engineer - Basho Technologies Limited - Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: debug.log
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20140808/9f17c2ac/attachment.log>