Bad MapReduce job brings the Riak to a screeching halt?

Bryan Fink bryan at basho.com
Thu Aug 30 08:55:49 EDT 2012


On Wed, Aug 29, 2012 at 11:07 PM, Brad Heller <brad at cloudability.com> wrote:
> The issue I'm facing: I tried to run an improperly-formatted MapReduce job
> against a bucket with about 45k keys in it and it seemed to crash Riak.

…snip…

> So my question is: Why did this completely kill Riak? This makes me pretty
> nervous--a bug in our app has the potential to bring down the ring! Is there
> anything we can do to protect against this?

Hi, Brad. Indeed, you have found a bug around our validation of
keyfilters. I've filed an issue to track it:

https://github.com/basho/riak_kv/issues/387

The short version is that nested keyfilters (those inside and/or/not
clauses) are not validated until execution time. Because they are
executed on every vnode that processes the query, any failure happens
once per vnode, so there is quite a bit more error handling and
logging going on.
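
Until that is fixed, one way to protect an application is to check the
whole keyfilter structure, nested clauses included, on the client before
submitting the job. The following is a minimal sketch of that idea, not
the fix tracked in the issue above. The function name and the set of
filter names are assumptions for illustration; check the names against
the keyfilter documentation for your Riak version.

```python
# Assumed list of keyfilter names; verify against your Riak version's docs.
KNOWN_FILTERS = {
    "int_to_string", "string_to_int", "float_to_string", "string_to_float",
    "to_upper", "to_lower", "tokenize", "urldecode",
    "greater_than", "less_than", "greater_than_eq", "less_than_eq",
    "between", "matches", "neq", "eq", "set_member", "similar_to",
    "starts_with", "ends_with",
}
# Logical clauses take nested filter lists as their arguments.
LOGICAL = {"and", "or", "not"}


def validate_keyfilters(filters, path="filters"):
    """Return a list of problems; an empty list means nothing was flagged."""
    if not isinstance(filters, list):
        return [f"{path}: expected a list of filters"]
    problems = []
    for i, flt in enumerate(filters):
        where = f"{path}[{i}]"
        if not isinstance(flt, list) or not flt or not isinstance(flt[0], str):
            problems.append(f"{where}: expected [name, arg, ...]")
            continue
        name, args = flt[0], flt[1:]
        if name in LOGICAL:
            if not args:
                problems.append(f"{where}: '{name}' needs at least one filter list")
            # Recurse so nested clauses are checked before the job is sent.
            for j, sub in enumerate(args):
                problems.extend(validate_keyfilters(sub, f"{where}.{name}[{j}]"))
        elif name not in KNOWN_FILTERS:
            problems.append(f"{where}: unknown filter {name!r}")
    return problems


good = [["tokenize", "-", 1], ["eq", "basho"]]
bad = [["and", [["bogus", 1]], [["eq", "a"]]]]
good_problems = validate_keyfilters(good)
bad_problems = validate_keyfilters(bad)
```

This only checks names and nesting, not argument counts or types, so it
reduces the chance of sending a malformed job rather than eliminating it.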

I don't think this should have "crashed" Riak, though. The query would
have hung until its timeout, and there would have been quite a spew in
the logs, but Riak should have remained running and able to handle
other requests (barring a second problem, possibly related to KV vnode
workers dying like this during a fold operation). Could you share more
details about what you meant by "crash", please?
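
Since a failing query of this kind hangs until its timeout, setting a
short timeout on the job limits how long a bad request can linger. A
rough sketch of a job specification for the HTTP MapReduce interface,
assuming the top-level "timeout" field in milliseconds; the bucket name,
the filter, and the helper function are hypothetical:

```python
import json


def build_mapreduce_job(bucket, key_filters, timeout_ms=10_000):
    """Build a MapReduce job body with an explicit timeout in milliseconds."""
    return {
        "inputs": {"bucket": bucket, "key_filters": key_filters},
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
        "timeout": timeout_ms,
    }


# "invoices" and the filter below are placeholders, not from the original report.
job = build_mapreduce_job("invoices", [["starts_with", "2012-08"]])
body = json.dumps(job)  # JSON body to POST to the node's /mapred endpoint
```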

Thanks for reporting this.

Cheers,
Bryan



