Upgrade to 1.1.2 problems

Phil Sorber phil at omniti.com
Fri Apr 27 14:40:03 EDT 2012


Joe,

Confirmed that was the problem. Made that change and the rolling
upgrade went smoothly. Thanks for your help!

On Thu, Apr 26, 2012 at 10:18 AM, Phil Sorber <phil at omniti.com> wrote:
> Joe,
>
> I think you hit the nail on the head. We install from your released
> debs, but we deploy via Chef and have the config generated from a
> template. I am going to set up another test this morning and I will
> let you know how it goes.
>
> Thanks.
>
> On Wed, Apr 25, 2012 at 7:38 PM, Joseph Blomstedt <joe at basho.com> wrote:
>> Phil,
>>
>> Given the following lines,
>>
>>>         {reason,
>>>          {{badmatch,{'EXIT',noproc}},
>>>           [{riak_core_vnode_proxy,call,2},
>>
>> I'm inclined to think this is an issue with the new vnode routing
>> layer introduced in Riak 1.1. On your Riak 1.1.2 node, make sure your
>> app.config does not contain {legacy_vnode_routing, false} in the
>> riak_core section. If it does, either delete the line or change it to
>> true. If this was the issue, you should set it back to false after
>> upgrading the rest of your cluster.
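>>
>> As a minimal sketch (assuming the usual app.config layout; the other
>> riak_core settings are elided here and should stay as they are), the
>> section would look roughly like this while the cluster is still
>> running mixed versions:
>>
>> --
>> %% app.config on the upgraded 1.1.2 node, while 1.0.x nodes remain
>> {riak_core, [
>>     %% ... existing riak_core settings unchanged ...
>>     {legacy_vnode_routing, true}
>> ]},
>> --
>>
>> Once every node runs 1.1.2, the value can go back to false.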
>>
>> Also, if this turns out to be the issue, could you please let me know
>> how you performed the upgrade? Our official packages are supposed to
>> be designed so that they retain your old app.config file when
>> installed over an existing Riak installation (and thus, this line
>> would have been missing from the 1.0.3 config). It would be useful to
>> know if you used an official package and the app.config was
>> overwritten, or if you manually set up your app.config and simply
>> missed this option.
>>
>> On the plus side, the next release of Riak should include built-in
>> capability negotiation that should remove the need for users to have
>> to manually deal with legacy, mapred, listkeys, etc. settings during an
>> upgrade.
>>
>> -Joe
>>
>> On Wed, Apr 25, 2012 at 2:05 PM, Phil Sorber <phil at omniti.com> wrote:
>>> We have a 4 node Riak 1.0.0 cluster running in production that we want
>>> to upgrade to 1.1.2. We set up a test environment that mimics the
>>> production one as closely as we can with EC2 hosts. First
>>> attempt to jump from 1.0.0 -> 1.1.2 failed. We took into account the
>>> mapred_system issue and the listkeys_backpressure issue. We decided to
>>> try 1.0.0 -> 1.0.3 since that would involve the mapred_system issue
>>> only. That upgrade worked. We then tried to upgrade 1.0.3 -> 1.1.2 and
>>> had similar problems. Details below.
>>>
>>> --
>>> # riak-admin transfers
>>> Attempting to restart script through sudo -u riak
>>> 'riak@50.16.31.226' waiting to handoff 14 partitions
>>> --
>>>
>>> Sometimes this would show as many as 48 transfers. Always from the
>>> node that we upgraded. It would eventually show no transfers left. The
>>> upgrade from 1.0.0 -> 1.0.3 didn't do this.
>>>
>>> We tested a link walking query that is similar to what we run in
>>> production. On 2 of the 3 nodes still running 1.0.3 it worked fine. On
>>> the 3rd node, this happened:
>>>
>>> Curl run on 1.0.3 node:
>>>
>>> --
>>> curl -v http://localhost:8098/riak/email_address/riakupgrade@gmail.com/_,_,1
>>> * About to connect() to localhost port 8098 (#0)
>>> *   Trying 127.0.0.1... connected
>>> * Connected to localhost (127.0.0.1) port 8098 (#0)
>>>> GET /riak/email_address/riakupgrade@gmail.com/_,_,1 HTTP/1.1
>>>> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
>>>> Host: localhost:8098
>>>> Accept: */*
>>>>
>>> < HTTP/1.1 500 Internal Server Error
>>> < Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
>>> < Expires: Wed, 25 Apr 2012 20:55:30 GMT
>>> < Date: Wed, 25 Apr 2012 20:45:30 GMT
>>> < Content-Type: text/html
>>> < Content-Length: 2068
>>> <
>>> <html><head><title>500 Internal Server
>>> Error</title></head><body><h1>Internal Server Error</h1>The server
>>> encountered an error while processing this request:<br><pre>{error,
>>>  {error,
>>>  {badmatch,
>>>   {eoi,[],
>>>    [{{reduce,0},
>>>      {trace,
>>>       [error],
>>>       {error,
>>>        [{module,riak_kv_w_reduce},
>>>         {partition,1438665674247607560106752257205091097473808596992},
>>>         {details,
>>>          [{fitting,
>>>            {fitting,<0.848.0>,#Ref<0.0.0.5472>,
>>>             #Fun<riak_kv_mrc_pipe.3.19126064>,1}},
>>>           {name,{reduce,0}},
>>>           {module,riak_kv_w_reduce},
>>>           {arg,{rct,#Fun<riak_kv_mapreduce.reduce_set_union.2>,none}},
>>>           {output,
>>>            {fitting,<0.847.0>,#Ref<0.0.0.5472>,
>>>             #Fun<riak_kv_mrc_pipe.1.120571329>,
>>>             #Fun<riak_kv_mrc_pipe.2.112900629>}},
>>>           {options,
>>>            [{sink,{fitting,<0.119.0>,#Ref<0.0.0.5472>,sink,undefined}},
>>>             {log,sink},
>>>             {trace,
>>>              {set,1,16,16,8,80,48,
>>>               {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>>               {{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}]},
>>>           {q_limit,64}]},
>>>         {reason,
>>>          {{badmatch,{'EXIT',noproc}},
>>>           [{riak_core_vnode_proxy,call,2},
>>>            {riak_pipe_vnode,queue_work_send,4},
>>>            {riak_pipe_vnode,queue_work_erracc,6},
>>>            {riak_kv_w_reduce,'-done/1-lc$^0/1-0-',3},
>>>            {riak_kv_w_reduce,done,1},
>>>            {riak_pipe_vnode_worker,wait_for_input,2},
>>>            {gen_fsm,handle_msg,7},
>>>            {proc_lib,init_p_do_apply,3}]}},
>>>         {state,{working,done}}]}}}]}},
>>>  [{riak_kv_mrc_pipe,collect_outputs,3},
>>>   {riak_kv_wm_link_walker,execute_segment,3},
>>>   {riak_kv_wm_link_walker,execute_query,3},
>>>   {riak_kv_wm_link_walker,to_multipart_mixed,2},
>>>   {webmachine_resource,resource_call,3},
>>>   {webmachine_resource,do,3},
>>>   {webmachine_decision_core,resource_call,1},
>>>   {webmachine_decision_core,decision,1}]}}</pre><P><HR><ADDRESS>mochiweb+webmachine
>>> web server</AD* Connection #0 to host localhost left intact
>>> * Closing connection #0
>>> --
>>>
>>> After that, this appeared in the error.log on that node:
>>>
>>> error.log on 1.0.3 node:
>>>
>>> --
>>> 2012-04-25 20:45:30.373 [error] <0.119.0> webmachine error:
>>> path="/riak/email_address/riakupgrade@gmail.com/_,_,1"
>>> {error,{error,{badmatch,{eoi,[],[{{reduce,0},{trace,[error],{error,[{module,riak_kv_w_reduce},{partition,1438665674247607560106752257205091097473808596992},{details,[{fitting,{fitting,<0.848.0>,#Ref<0.0.0.5472>,#Fun<riak_kv_mrc_pipe.3.19126064>,1}},{name,{reduce,0}},{module,riak_kv_w_reduce},{arg,{rct,#Fun<riak_kv_mapreduce.reduce_set_union.2>,none}},{output,{fitting,<0.847.0>,#Ref<0.0.0.5472>,#Fun<riak_kv_mrc_pipe.1.120571329>,#Fun<riak_kv_mrc_pipe.2.112900629>}},{options,[{sink,{fitting,<0.119.0>,#Ref<0.0.0.5472>,sink,undefined}},{log,sink},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}]},{q_limit,64}]},{reason,{{badmatch,{'EXIT',noproc}},[{riak_core_vnode_proxy,call,2},{riak_pipe_vnode,queue_work_send,4},{riak_pipe_vnode,queue_work_erracc,6},{riak_kv_w_reduce,'-done/1-lc$^0/1-0-',3},{riak_kv_w_reduce,done,1},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]}},{state,{working,done}}]}}}]}},[{riak_kv_mrc_pipe,collect_outputs,3},{riak_kv_wm_link_walker,execute_segment,3},{riak_kv_wm_link_walker,execute_query,3},{riak_kv_wm_link_walker,to_multipart_mixed,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3},{webmachine_decision_core,resource_call,1},{webmachine_decision_core,decision,1}]}}
>>> --
>>>
>>> And this was in the error.log on the 1.1.2 node:
>>>
>>> error.log on 1.1.2 node:
>>>
>>> --
>>> 2012-04-25 20:45:30.234 [error] <0.1710.0> gen_fsm <0.1710.0> in state
>>> wait_for_input terminated with reason: no match of right hand value
>>> {'EXIT',noproc} in riak_core_vnode_proxy:call/2
>>> 2012-04-25 20:45:30.243 [error] <0.1710.0> CRASH REPORT Process
>>> <0.1710.0> with 0 neighbours crashed with reason: no match of right
>>> hand value {'EXIT',noproc} in riak_core_vnode_proxy:call/2
>>> 2012-04-25 20:45:30.245 [error] <0.331.0> Supervisor
>>> riak_pipe_vnode_worker_sup had child undefined started with
>>> {riak_pipe_vnode_worker,start_link,undefined} at <0.1710.0> exit with
>>> reason no match of right hand value {'EXIT',noproc} in
>>> riak_core_vnode_proxy:call/2 in context child_terminated
>>> --
>>>
>>> On the 1.1.2 node, running the same curl, it returned a 404 after a
>>> long timeout:
>>>
>>> Curl run on the 1.1.2 node:
>>>
>>> --
>>> curl -v http://localhost:8098/riak/email_address/riakupgrade@gmail.com/_,_,1
>>> * About to connect() to localhost port 8098 (#0)
>>> *   Trying 127.0.0.1... connected
>>> * Connected to localhost (127.0.0.1) port 8098 (#0)
>>>> GET /riak/email_address/riakupgrade@gmail.com/_,_,1 HTTP/1.1
>>>> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
>>>> Host: localhost:8098
>>>> Accept: */*
>>>>
>>> < HTTP/1.1 404 Object Not Found
>>> < Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
>>> < Date: Wed, 25 Apr 2012 20:49:40 GMT
>>> < Content-Type: text/html
>>> < Content-Length: 193
>>> <
>>> * Connection #0 to host localhost left intact
>>> * Closing connection #0
>>> <HTML><HEAD><TITLE>404 Not Found</TITLE></HEAD><BODY><H1>Not
>>> Found</H1>The requested document was not found on this
>>> server.<P><HR><ADDRESS>mochiweb+webmachine web
>>> server</ADDRESS></BODY></HTML>
>>> --
>>>
>>> There was nothing more in any error logs after this. Also, fetching
>>> keys directly fails on this node. Riak is installed from the Debian
>>> packages. I can send you whatever other info is necessary. We tried
>>> many different combinations, but I think this one is the most correct
>>> and produced the most useful error messages. Any help is appreciated.
>>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>>
>> --
>> Joseph Blomstedt <joe at basho.com>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://www.basho.com/



