repair-2i stops with "bad argument in call to eleveldb:async_write"

Russell Brown russell.brown at me.com
Fri Aug 1 05:24:39 EDT 2014


Hi Simon,
Sorry for the delays. I’m on vacation for a couple of days. Will pick this up on Monday.

Cheers

Russell

On 1 Aug 2014, at 09:56, Effenberg, Simon <seffenberg at team.mobile.de> wrote:

> Hi Russell, @basho
> 
> Any updates on this? We still have the issues with 2i (repair is also
> still not possible), and querying the 2i indexes reproducibly returns
> 3 different values (for the one range I tested).
> 
> I would love to provide anything you need to debug this issue.
> 
> Cheers
> Simon
> 
> On Wed, Jul 30, 2014 at 09:22:56AM +0000, Effenberg, Simon wrote:
>> Great, thanks Russell.
>> 
>> If you need me to do anything, feel free to ask.
>> 
>> Cheers
>> Simon
>> 
>> On Wed, Jul 30, 2014 at 10:19:56AM +0100, Russell Brown wrote:
>>> Thanks Simon,
>>> 
>>> I’m going to spend some time on this today.
>>> 
>>> Cheers
>>> 
>>> Russell
>>> 
>>> On 30 Jul 2014, at 10:05, Effenberg, Simon <seffenberg at team.mobile.de> wrote:
>>> 
>>>> Hi Russell,
>>>> 
>>>> One machine out of 13 is still on wheezy and the rest are on squeeze, but
>>>> the software is the same, and Basho even provides the Erlang runtime. So
>>>> there should be no real difference inside the application.
>>>> 
>>>> And the errors are almost the same (apart from the async_write vs.
>>>> async_get difference).
>>>> 
>>>> Here they are:
>>>> 
>>>> ---------- node 1 -----------
>>>> 
>>>> 2014-07-30 06:16:07.728 UTC [info] <0.14871.336>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 125597796958124469533129165311555572001681702912
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1525.0> gen_server <0.1525.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.324.211123>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1525.0> CRASH REPORT Process <0.1525.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.324.211123>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1517.0> Supervisor {<0.1517.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1525.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.324.211123>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
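The key bytes in these crashing calls (`<<131,104,2,109,...>>`) are in Erlang's external term format, so the visible prefix can be decoded by hand. In that format 131 is the version tag, 104 (`SMALL_TUPLE_EXT`) is followed by a one-byte arity, and 109 (`BINARY_EXT`) by a 4-byte big-endian length and then the data. The Ruby sketch below decodes only that header shape; `decode_tuple_prefix` and `KEY_BYTES` are hypothetical names, and reading the tuple as a Riak bucket/key pair would be an assumption the thread does not confirm.

```ruby
# Tag bytes from Erlang's external term format (ETF).
VERSION_MAGIC   = 131 # every term_to_binary/1 result starts with this
SMALL_TUPLE_EXT = 104 # followed by a 1-byte arity
BINARY_EXT      = 109 # followed by a 4-byte big-endian length, then the data

# Decode the start of an ETF blob shaped like {Binary, ...}.
# Works on a truncated prefix, as printed in the crash logs.
def decode_tuple_prefix(bytes)
  raise ArgumentError, 'missing ETF version tag' unless bytes[0] == VERSION_MAGIC
  raise ArgumentError, 'not a small tuple' unless bytes[1] == SMALL_TUPLE_EXT
  raise ArgumentError, 'first element is not a binary' unless bytes[3] == BINARY_EXT

  arity  = bytes[2]
  length = bytes[4, 4].pack('C*').unpack('N').first
  data   = bytes[8, length] || [] # shorter than length when the log truncates it
  { arity: arity, binary_length: length, visible_prefix: data.pack('C*') }
end

# Visible bytes of the key from the async_write lines above.
KEY_BYTES = [131, 104, 2, 109, 0, 0, 0, 20, 99, 111, 110, 118, 101, 114, 115,
             97, 116, 105, 111, 110, 95, 115, 101, 99, 114].freeze
```

Run on `KEY_BYTES`, this reports a 2-tuple whose first element is a 20-byte binary beginning `conversation_secr`. The async_get lines from nodes 5 and 6 show a few more bytes (`conversation_secret`), but the last byte is elided in every paste, so the full name cannot be recovered from these logs.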
>>>> 
>>>> 
>>>> ---------- node 2 -----------
>>>> 
>>>> 2014-07-30 06:16:07.791 UTC [info] <0.8083.314>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 622279994019798508141412682679979879462877528064
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:16:07.791 UTC [error] <0.1884.0> gen_server <0.1884.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.318.96628>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:07.791 UTC [error] <0.1884.0> CRASH REPORT Process <0.1884.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.318.96628>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:07.792 UTC [error] <0.1875.0> Supervisor {<0.1875.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1884.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.318.96628>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
>>>> 
>>>> ---------- node 3 -----------
>>>> 
>>>> 2014-07-30 06:17:42.679 UTC [info] <0.15746.299>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 291158529312015815735890337767697007822080311296
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.975.0> gen_server <0.975.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.2075.159423>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.975.0> CRASH REPORT Process <0.975.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.2075.159423>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.969.0> Supervisor {<0.969.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.975.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.2075.159423>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
>>>> 
>>>> ---------- node 4 -----------
>>>> 
>>>> 2014-07-30 06:16:10.004 UTC [info] <0.28895.382>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 319703483166135013357056057156686910549735243776
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:16:10.004 UTC [error] <0.1580.0> gen_server <0.1580.0> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.367.155781>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:10.004 UTC [error] <0.1580.0> CRASH REPORT Process <0.1580.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.367.155781>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:10.005 UTC [error] <0.1570.0> Supervisor {<0.1570.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1580.0> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.367.155781>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
>>>> 
>>>> ---------- node 5 -----------
>>>> 
>>>> 2014-07-30 06:16:09.191 UTC [info] <0.15985.355>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 833512652540280570538039006158505159647524028416
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:16:09.191 UTC [error] <0.1601.0> gen_server <0.1601.0> terminated with reason: bad argument in call to eleveldb:async_get(#Ref<0.0.351.26505>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143
>>>> 2014-07-30 06:16:09.191 UTC [error] <0.1601.0> CRASH REPORT Process <0.1601.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_get(#Ref<0.0.351.26505>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:09.192 UTC [error] <0.1598.0> Supervisor {<0.1598.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1601.0> exit with reason bad argument in call to eleveldb:async_get(#Ref<0.0.351.26505>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143 in context child_terminated
>>>> 
>>>> ---------- node 6 -----------
>>>> 
>>>> 2014-07-30 06:16:09.154 UTC [info] <0.32042.379>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>       Total partitions: 1
>>>>       Finished partitions: 1
>>>>       Speed: 100
>>>>       Total 2i items scanned: 0
>>>>       Total tree objects: 0
>>>>       Total objects fixed: 0
>>>> With errors:
>>>> Partition: 34253944624943037145398863266787883273185918976
>>>> Error: index_scan_timeout
>>>> 
>>>> 
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4086.0> gen_server <0.4086.0> terminated with reason: bad argument in call to eleveldb:async_get(#Ref<0.0.2698.198008>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4086.0> CRASH REPORT Process <0.4086.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_get(#Ref<0.0.2698.198008>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4085.0> Supervisor {<0.4085.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.4086.0> exit with reason bad argument in call to eleveldb:async_get(#Ref<0.0.2698.198008>, <<>>, <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>, []) in eleveldb:get/3 line 143 in context child_terminated
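To tally which eleveldb call each crashed vnode worker was in across several nodes' logs, a short script can scan them. This is a sketch that assumes the lager line format seen in the pastes above; `failing_call` and `tally_failing_calls` are hypothetical helper names.

```ruby
# Pull the eleveldb function name out of a crash line, e.g. "async_write".
CALL_RE = /bad argument in call to eleveldb:(\w+)\(/

def failing_call(line)
  m = CALL_RE.match(line)
  m && m[1]
end

# Count one failing call per crashed worker. Only the "gen_server <pid>
# terminated" line is used, so the CRASH REPORT and Supervisor lines that
# repeat the same crash are not counted again.
def tally_failing_calls(lines)
  lines.grep(/gen_server \S+ terminated with reason:/)
       .map { |line| failing_call(line) }
       .compact
       .each_with_object(Hash.new(0)) { |call, counts| counts[call] += 1 }
end
```

For example, `tally_failing_calls(File.readlines('/var/log/riak/error.log'))` on each node (the path is an assumption about the .deb layout). Over the six node pastes above it should come to 4 async_write and 2 async_get crashes.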
>>>> 
>>>> On Wed, Jul 30, 2014 at 09:50:22AM +0100, Russell Brown wrote:
>>>>> Hi Simon, 
>>>>> So the earlier “this is on wheezy, rest are on squeeze” thing is no longer a factor?
>>>>> 
>>>>> Does every 2i repair you run end with the same error?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Russell
>>>>> 
>>>>> On 30 Jul 2014, at 07:29, Effenberg, Simon <seffenberg at team.mobile.de> wrote:
>>>>> 
>>>>>> I have now tried it with one partition on 6 different machines, with the same result everywhere: index_scan_timeout plus "bad argument in call to eleveldb:async_get" (2x) or "bad argument in call to eleveldb:async_write" (4x).
>>>>>> 
>>>>>> 
>>>>>> Sent from Samsung Mobile
>>>>>> 
>>>>>> 
>>>>>> -------- Original message --------
>>>>>> From: "Effenberg, Simon"
>>>>>> Date: 30.07.2014 07:49 (GMT+01:00)
>>>>>> To: bryan hunt
>>>>>> Cc: riak-users at lists.basho.com
>>>>>> Subject: RE: repair-2i stops with "bad argument in call to eleveldb:async_write"
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I tried it on two different nodes with one partition each, multiple times on both, before and after the upgrade.
>>>>>> 
>>>>>> I will try it on other machines in a minute, but since I have already tried it on two different nodes, one of which is only 2 weeks old and stored on an HP 3PAR, I bet this is not a disk corruption issue.
>>>>>> 
>>>>>> Simon
>>>>>> 
>>>>>> 
>>>>>> Sent from Samsung Mobile
>>>>>> 
>>>>>> 
>>>>>> -------- Original message --------
>>>>>> From: bryan hunt
>>>>>> Date: 29.07.2014 18:21 (GMT+01:00)
>>>>>> To: "Effenberg, Simon"
>>>>>> Cc: riak-users at lists.basho.com
>>>>>> Subject: Re: repair-2i stops with "bad argument in call to eleveldb:async_write"
>>>>>> 
>>>>>> Hi Simon,
>>>>>> 
>>>>>> Does the problem persist if you run it again? 
>>>>>> 
>>>>>> Does it happen if you run it against any other partition?
>>>>>> 
>>>>>> Best Regards,
>>>>>> 
>>>>>> Bryan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Bryan Hunt - Client Services Engineer - Basho Technologies Limited - Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>>>>>> 
>>>>>> On 29 Jul 2014, at 09:35, Effenberg, Simon <seffenberg at team.mobile.de> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We have some issues with 2i queries, like this one:
>>>>>>> 
>>>>>>> seffenberg@kriak46-1:~$ while :; do curl -s localhost:8098/buckets/conversation/index/createdat_int/0/23182680 | ruby -rjson -e "o = JSON.parse(STDIN.read); puts o['keys'].size"; sleep 1; done
>>>>>>> 
>>>>>>> 13853
>>>>>>> 13853
>>>>>>> 0
>>>>>>> 557
>>>>>>> 557
>>>>>>> 557
>>>>>>> 13853
>>>>>>> 0
>>>>>>> 
>>>>>>> 
>>>>>>> ...
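The shell loop above can be turned into a small Ruby script that summarises how many distinct answers the same range query returns. This is a sketch that assumes the HTTP interface on localhost:8098 and the response shape the one-liner already relies on (a JSON object with a `keys` array); `key_count`, `tally_counts` and `poll_counts` are hypothetical helper names.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Number of keys in one 2i HTTP response, i.e. a JSON object {"keys": [...]}.
def key_count(body)
  JSON.parse(body).fetch('keys').size
end

# How often each distinct count was seen; a consistent index gives one entry.
def tally_counts(counts)
  counts.each_with_object(Hash.new(0)) { |count, seen| seen[count] += 1 }
end

# Run the same 2i range query `rounds` times, pausing between requests,
# and return the key count of each response.
def poll_counts(url, rounds: 10, pause: 1)
  uri = URI(url)
  Array.new(rounds) do |i|
    sleep(pause) if i.positive?
    key_count(Net::HTTP.get(uri))
  end
end
```

For instance, `p tally_counts(poll_counts('http://localhost:8098/buckets/conversation/index/createdat_int/0/23182680'))`. On the output above the tally has three entries (13853, 557 and 0), where a consistent index would give a single one.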
>>>>>>> 
>>>>>>> So I first tried running repair-2i on one vnode/partition on one node
>>>>>>> (which is quite new in the cluster, 2 weeks or so).
>>>>>>> 
>>>>>>> The command is failing with the following log entries:
>>>>>>> 
>>>>>>> seffenberg@kriak46-7:~$ sudo riak-admin repair-2i 22835963083295358096932575511191922182123945984
>>>>>>> Will repair 2i on these partitions:
>>>>>>>      22835963083295358096932575511191922182123945984
>>>>>>> Watch the logs for 2i repair progress reports
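The argument to `riak-admin repair-2i` is a partition index on Riak's ring, which covers the integers 0 to 2^160 and is split into ring_size equal partitions (set by `ring_creation_size`), so every valid index is a multiple of 2^160 / ring_size. The sketch below checks an index against an assumed ring size; the thread never states the cluster's ring size, and `partition_position` is a hypothetical helper. Every partition ID quoted in this thread is a multiple of 2^160 / 256 (the one above is position 4, and the node 2 one, ending ...528064, is position 109), which fits a ring of at least 256 partitions.

```ruby
# Riak's consistent-hashing ring covers the integers 0 ... 2**160 - 1.
RING_TOP = 2**160

# Return the 0-based position of partition_id on a ring of ring_size equal
# partitions, or nil if partition_id is not a partition index on that ring.
def partition_position(partition_id, ring_size)
  unless ring_size.positive? && (ring_size & (ring_size - 1)).zero?
    raise ArgumentError, 'ring_size must be a power of two'
  end
  return nil unless partition_id >= 0 && partition_id < RING_TOP

  step = RING_TOP / ring_size # width of one partition
  (partition_id % step).zero? ? partition_id / step : nil
end
```

A `nil` result for every plausible ring size would mean the ID was mistyped, which is a cheap check before starting a repair.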
>>>>>>> seffenberg@kriak46-7:~$ 2014-07-29 08:20:22.729 UTC [info] <0.5929.1061>@riak_kv_2i_aae:init:139 Starting 2i repair at speed 100 for partitions [22835963083295358096932575511191922182123945984]
>>>>>>> 2014-07-29 08:20:22.729 UTC [info] <0.5930.1061>@riak_kv_2i_aae:repair_partition:257 Acquired lock on partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:20:22.729 UTC [info] <0.5930.1061>@riak_kv_2i_aae:repair_partition:259 Repairing indexes in partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:20:22.740 UTC [info] <0.5930.1061>@riak_kv_2i_aae:create_index_data_db:324 Creating temporary database of 2i data in /var/lib/riak/anti_entropy/2i/tmp_db
>>>>>>> 2014-07-29 08:20:22.751 UTC [info] <0.5930.1061>@riak_kv_2i_aae:create_index_data_db:361 Grabbing all index data for partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:25:22.752 UTC [info] <0.5929.1061>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>>>>      Total partitions: 1
>>>>>>>      Finished partitions: 1
>>>>>>>      Speed: 100
>>>>>>>      Total 2i items scanned: 0
>>>>>>>      Total tree objects: 0
>>>>>>>      Total objects fixed: 0
>>>>>>> With errors:
>>>>>>> Partition: 22835963083295358096932575511191922182123945984
>>>>>>> Error: index_scan_timeout
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-07-29 08:25:22.752 UTC [error] <0.4711.1061> gen_server <0.4711.1061> terminated with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155
>>>>>>> 2014-07-29 08:25:22.753 UTC [error] <0.4711.1061> CRASH REPORT Process <0.4711.1061> with 0 neighbours exited with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>>>>> 2014-07-29 08:25:22.753 UTC [error] <0.1031.0> Supervisor {<0.1031.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.4711.1061> exit with reason bad argument in call to eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>, [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}], []) in eleveldb:write/3 line 155 in context child_terminated
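The timestamps in this log are worth a quick check: the scan starts ("Grabbing all index data") at 08:20:22.751 and the index_scan_timeout report arrives at 08:25:22.752. A minimal sketch computing the gap, assuming lager's `YYYY-MM-DD HH:MM:SS.mmm UTC` timestamp format; `elapsed_seconds` and the two constants are hypothetical names.

```ruby
require 'time'

# Seconds between two log timestamps such as "2014-07-29 08:20:22.751 UTC",
# rounded to the millisecond precision the log prints.
def elapsed_seconds(start_ts, end_ts)
  (Time.parse(end_ts) - Time.parse(start_ts)).round(3)
end

SCAN_STARTED = '2014-07-29 08:20:22.751 UTC' # "Grabbing all index data ..."
SCAN_ENDED   = '2014-07-29 08:25:22.752 UTC' # "Finished 2i repair ... index_scan_timeout"
```

`elapsed_seconds(SCAN_STARTED, SCAN_ENDED)` comes out at 300.001 s, and the eleveldb badarg crashes are logged in the same or the next millisecond. Landing almost exactly on five minutes suggests the index scan was cut off by a fixed timeout rather than failing partway through; that is an inference from the timestamps, and the thread does not say what the timeout is or where it is configured.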
>>>>>>> 
>>>>>>> 
>>>>>>> Anything I can do about that? What's the issue here?
>>>>>>> 
>>>>>>> I'm using Riak 1.4.8 (.deb package).
>>>>>>> 
>>>>>>> Cheers
>>>>>>> Simon
>>>>>>> _______________________________________________
>>>>>>> riak-users mailing list
>>>>>>> riak-users at lists.basho.com
>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Simon Effenberg | Site Op | mobile.international GmbH
>>>> 
>>>> Phone:    + 49. 30. 8109. 7173
>>>> M-Phone:  + 49. 151. 5266. 1558
>>>> Mail:     seffenberg at team.mobile.de
>>>> Web:      www.mobile.de
>>>> 
>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>> 
>>>> ______________________________________________________
>>>> Geschäftsführer: Malte Krüger
>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>> Sitz der Gesellschaft: Kleinmachnow
>>>> ______________________________________________________
>>> 
>> 
> 