repair-2i stops with "bad argument in call to eleveldb:async_write"

Effenberg, Simon seffenberg at team.mobile.de
Wed Sep 3 10:28:03 EDT 2014


I have now changed the timeout in the code from 5 to 60 minutes on one node.
After 23 minutes the repair-2i was able to continue:

2014-09-03 12:58:06.913 UTC [info] <0.10345.8>@riak_kv_2i_aae:repair_partition:257 Acquired lock on partition 548063113999088594326381812268606132370974703616
2014-09-03 12:58:06.913 UTC [info] <0.10345.8>@riak_kv_2i_aae:repair_partition:259 Repairing indexes in partition 548063113999088594326381812268606132370974703616
2014-09-03 12:58:06.924 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:324 Creating temporary database of 2i data in /var/lib/riak/anti_entropy/2i/tmp_db
2014-09-03 12:58:06.928 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:361 Grabbing all index data for partition 548063113999088594326381812268606132370974703616
2014-09-03 13:25:23.946 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:375 Grabbed 12961170 index data entries from partition 548063113999088594326381812268606132370974703616
2014-09-03 13:25:23.946 UTC [info] <0.10345.8>@riak_kv_2i_aae:build_tmp_tree:448 Building tree for 2i data on disk for partition 548063113999088594326381812268606132370974703616
2014-09-03 13:29:13.478 UTC [info] <0.10345.8>@riak_kv_2i_aae:build_tmp_tree:478 Done building temporary tree for 2i data with 9258332 entries
2014-09-03 13:29:13.478 UTC [info] <0.10345.8>@riak_kv_2i_aae:do_exchange:496 Reconciling 2i data
..... (still running)
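
For reference, the time spent in each phase can be read straight off the lager timestamps above. Below is a minimal Python sketch, not part of Riak: `parse_ts` and `phase_durations` are names made up for this illustration, and the log text is three of the lines above shortened to the timestamp plus the start of the message. It only checks that the index scan ran longer than the old 5 minute timeout and still fits inside the new 60 minute one.

```python
from datetime import datetime

# Three lines from the repair-2i log above, shortened to
# "<timestamp> UTC <start of message>".
LOG = """\
2014-09-03 12:58:06.928 UTC Grabbing all index data
2014-09-03 13:25:23.946 UTC Grabbed 12961170 index data entries
2014-09-03 13:29:13.478 UTC Done building temporary tree
"""

OLD_TIMEOUT_S = 5 * 60    # timeout before the change (value from this mail)
NEW_TIMEOUT_S = 60 * 60   # timeout after the change (value from this mail)


def parse_ts(line):
    # The first 23 characters are "YYYY-MM-DD HH:MM:SS.mmm".
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def phase_durations(log_text):
    """Return (message, seconds since the previous line) for every line but the first."""
    lines = [l for l in log_text.splitlines() if l.strip()]
    stamps = [parse_ts(l) for l in lines]
    return [
        (lines[i][28:], (stamps[i] - stamps[i - 1]).total_seconds())
        for i in range(1, len(lines))
    ]


if __name__ == "__main__":
    for message, seconds in phase_durations(LOG):
        print(f"{seconds / 60:6.1f} min  until: {message}")
    scan_seconds = phase_durations(LOG)[0][1]
    print("scan exceeds old timeout:", scan_seconds > OLD_TIMEOUT_S)
    print("scan fits new timeout:   ", scan_seconds < NEW_TIMEOUT_S)
```

The same function can be fed any run of console.log lines, as long as each line starts with a timestamp in the format shown above.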

Maybe this works, maybe it will break. I'm looking forward to the result.

Cheers
Simon

On Mon, Aug 11, 2014 at 08:24:44AM +0000, Effenberg, Simon wrote:
> Hi,
> 
> any updates on this issue? I can still query a 2i range and get three
> different results: 0, 557 and 13853 :(
> 
> I cannot rely on 2i right now nor can I repair it.
> 
> Cheers
> Simon
> 
> On Fri, Aug 08, 2014 at 07:12:58AM +0000, Effenberg, Simon wrote:
> > Hi Bryan,
> > 
> > thanks for this. I tried it, but to be honest I cannot see anything
> > relevant in the logs on that specific host.
> > 
> > I attached the logfile from that node. If you think it is also (or more)
> > important to look at the logfiles on the other nodes, I can send them as
> > well, but a quick look through all of them (searching for "2i" and
> > "index") didn't show anything unusual. The only thing I found was
> > 
> > 2014-08-07 05:44:11.298 UTC [debug] <0.969.0>@riak_kv_index_hashtree:handle_call:240 Updating tree: (vnode)=633697975561446187189878970435575840553939501056 (preflist)={610862012478150829092946394924383918371815555072,12}
> > 
> > and searching for errors didn't show more than you see in the attached
> > files:
> > 
> > $ for host in kriak46-{1..7} kriak47-{1..6}; do echo $host; ssh $host "grep '^2014-08-07 05' /var/log/riak/console.log | grep -i error" ; done
> > kriak46-1
> > 2014-08-07 05:38:28.596 UTC [error] <0.8949.566> ** Node 'c_24556_riak@10.46.109.201' not responding **
> > 2014-08-07 05:42:36.197 UTC [error] <0.24823.566> ** Node 'c_26945_riak@10.46.109.201' not responding **
> > 2014-08-07 05:43:16.213 UTC [error] <0.26434.566> ** Node 'c_27071_riak@10.46.109.201' not responding **
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> gen_server <0.1697.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process <0.1697.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
> > 2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor {<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
> > 2014-08-07 05:50:11.390 UTC [error] <0.20983.567> ** Node 'c_32188_riak@10.46.109.201' not responding **
> > kriak46-2
> > kriak46-3
> > kriak46-4
> > kriak46-5
> > kriak46-6
> > kriak46-7
> > kriak47-1
> > kriak47-2
> > kriak47-3
> > kriak47-4
> > kriak47-5
> > kriak47-6
> > 
> > You mentioned partition repair. Do you think I need to try the full
> > repair? Could that be a way to fix it? It is quite hard to do this on
> > the cluster (~15 TB of data with AAE data and tombstones, and maybe
> > ~10 TB without tombstones and AAE data), and I don't want to start
> > doing this if it won't help.
> > 
> > Cheers
> > Simon
> > 
> > On Wed, Aug 06, 2014 at 01:08:36PM +0100, bryan hunt wrote:
> > > Simon,
> > > 
> > > If you want more verbose logging information, you could do the following: change the logging level to debug, run `repair-2i`, and finally switch back to the normal logging level.
> > > 
> > > - `riak attach`
> > > - `(riak@nodename)1> SetDebug = fun() -> {node(), lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", debug)} end.`
> > > - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetDebug,[]])).`
> > > (don't forget the period at the end of these statements)
> > > - Hit CTRL+C twice to quit from the node
> > > 
> > > You can then revert back to the normal `info` logging level by running the following command via `riak attach`:
> > > 
> > > - `riak attach`
> > > - `(riak@nodename)1> SetInfo = fun() -> {node(), lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", info)} end.`
> > > - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetInfo,[]])).`
> > > (don't forget the period at the end of these statements)
> > > - Hit CTRL+C twice to quit from the node
> > > 
> > > Please also see the docs for info on `riak attach` monitoring of repairs.
> > > 
> > > http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Monitoring-Repairs
> > > 
> > > Repairs can also be monitored using the `riak-admin transfers` command.
> > > 
> > > http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Running-a-Repair
> > > 
> > > Best Regards,
> > > 
> > > Bryan Hunt 
> > > 
> > > Bryan Hunt - Client Services Engineer - Basho Technologies Limited - Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> > > 
> 
...
> > 	Total partitions: 1
> > 	Finished partitions: 1
> > 	Speed: 100
> > 	Total 2i items scanned: 0
> > 	Total tree objects: 0
> > 	Total objects fixed: 0
> > With errors:
> > Partition: 319703483166135013357056057156686910549735243776
> > Error: index_scan_timeout
> > 
> > 
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process <0.1697.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
> > 2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor {<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
> 
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 

-- 
Simon Effenberg | Site Op | mobile.international GmbH

Phone:    + 49. 30. 8109. 7173
M-Phone:  + 49. 151. 5266. 1558
Mail:     seffenberg at team.mobile.de
Web:      www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany

______________________________________________________
Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow
______________________________________________________

