Bad MapReduce job brings Riak to a screeching halt?

Alexander Sicular siculars at gmail.com
Wed Aug 29 23:43:35 EDT 2012


What's your "ulimit -n"?

I think you ran out of fds. The "IO error: lock ... Resource temporarily unavailable" line in your error.log is the giveaway.
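
On OS X the default soft limit is 256 open files, which a three-node devrel plus leveldb will chew through fast. If you want a quick check from the same Ruby console, something like the sketch below works (note it only reports that Ruby process's limit; the Riak nodes inherit whatever limit the shell that started them had):

    # current file-descriptor limits for this process
    soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
    puts "fd soft limit: #{soft}, hard limit: #{hard}"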

-Alexander

@siculars
http://siculars.posterous.com

Sent from my iRotaryPhone

On Aug 29, 2012, at 23:07, Brad Heller <brad at cloudability.com> wrote:

> Hello Riak world,
> 
> I've been experimenting with migrating some of our OLAP data into Riak recently. I'm still learning about the…particulars…of Riak, so apologies if the solution to this is obvious or this is an overt n00b question.
> 
> I'm developing against a three-node Riak cluster on my machine (OS X 10.8.1). I'm primarily using Ruby + Ripple, but I've done a lot of exploration with curl too. I'm also using Rekon as a way to peek into the data I'm storing.
> 
> The issue I'm facing: I tried to run an improperly-formatted MapReduce job against a bucket with about 45k keys in it and it seemed to crash Riak. Here's the job itself:
> 
> 1.9.3p194 :065 > puts job.to_json
> {"inputs":{"bucket":"raw_statistics","key_filters":[["starts_with","some_string"],["and",[[["tokenize",":",4]],[["between",1345197700,1345697700,true]]]]]},"query":[{"map":{"language":"javascript","keep":true,"name":"Riak.mapValuesJson"}}]}
> 
> I would expect about 2.5k matches to the map. Here's the output from one of the nodes' error.log:
> 
> 2012-08-29 19:27:52.908 [error] <0.420.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup failed:fitting was gone before startup
> 2012-08-29 19:45:41.739 [error] <0.959.0> gen_fsm <0.959.0> in state active terminated with reason: no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 
> 2012-08-29 19:45:41.773 [error] <0.594.0> gen_fsm <0.594.0> in state active terminated with reason: no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 
> 2012-08-29 19:45:41.778 [error] <0.594.0> CRASH REPORT Process <0.594.0> with 1 neighbours exited with reason: no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 in gen_fsm:terminate/7 line 611 
> 2012-08-29 19:45:41.785 [error] <0.23924.70>@riak_kv_vnode:init:265 Failed to start riak_kv_multi_backend Reason: [{riak_kv_eleveldb_backend,{db_open,"IO error: lock ../../tmp/riak/instance1/leveldb/0/LOCK: Resource temporarily unavailable"}}]
> 2012-08-29 19:45:41.814 [error] <0.141.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.594.0> exit with reason no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 in context child_terminated
> 2012-08-29 19:45:41.818 [error] <0.959.0> CRASH REPORT Process <0.959.0> with 1 neighbours exited with reason: no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 in gen_fsm:terminate/7 line 611 
> 2012-08-29 19:45:41.822 [error] <0.141.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.959.0> exit with reason no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 in context child_terminated
> 2012-08-29 19:45:41.943 [error] <0.962.0> gen_fsm <0.962.0> in state ready terminated with reason: no match of right hand value {error,{bad_filter,[<<"tokenize">>,<<":">>,4]}} in riak_kv_mapred_filters:'-logical_and/1-fun-0-'/1 line 176 
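> 
> (Looking at those bad_filter lines again, I suspect my mistake was nesting the tokenize transform inside "and". What I was actually trying to express is the flat, sequential chain below; this is just a sketch of what I think the equivalent filters would be, with string_to_int as my guess at coercing the timestamp token before between. Untested.)
> 
>     key_filters = [
>       ["starts_with", "some_string"],             # match on the raw key first
>       ["tokenize", ":", 4],                       # take the 4th ":"-separated token (the timestamp)
>       ["string_to_int"],                          # assumed coercion so between compares integers
>       ["between", 1345197700, 1345697700, true]
>     ]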
> 
> For what it's worth, the format of my keys is as follows (if anyone has any suggestions on a smarter way to format these, I'm all ears).
> 
> <some piece of user data>:<user ID>:<some other piece of data>:<timestamp in seconds>
> 
> So my question is: Why did this completely kill Riak? This makes me pretty nervous--a bug in our app has the potential to bring down the ring! Is there anything we can do to protect against this?
> 
> And a bonus question: What is a reasonable way to query this? I can't maintain links as there will potentially be hundreds of thousands of these objects to query over (each one is pretty small). Is this a good candidate for a compound secondary index?
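> 
> (In case it helps frame the question, the kind of thing I'm imagining for the compound index is roughly the sketch below, using riak-client; method names are from memory and not verified against my client version.)
> 
>     obj = bucket.get_or_new(key)
>     # compound binary index: user id plus epoch-seconds timestamp, so a single
>     # _bin range query can scan one user's time window lexicographically
>     obj.indexes["user_ts_bin"] << "#{user_id}:#{timestamp.to_i}"
>     obj.store
> 
>     # later: keys for that user between two timestamps (same-length timestamps
>     # compare correctly as strings)
>     keys = bucket.get_index("user_ts_bin", "#{user_id}:1345197700".."#{user_id}:1345697700")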
> 
> Thanks for any help.
> 
> Cheers,
> 
> Brad Heller | Engineering Lead | Cloudability.com | 541-231-1514 | Skype: brad.heller | @bradhe | @cloudability
> 
> We're hiring! http://cloudability.com/jobs
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

