Map phase timeout
matt.black at jbadigital.com
Mon Apr 8 19:20:31 EDT 2013
Huge thanks for your replies. It seems to me that our approach with
MapReduce queries has been fundamentally wrong, and that I should rewrite
my backup script to use sequential GETs. Currently we're on the bitcask
backend, and on our roadmap is a move over to eleveldb and the application
of appropriate 2i across the whole dataset. Looks like that will be the
next step - before doing any backup of old data.
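
For reference, a minimal sketch of what the GET-based fetch could look like. It assumes Python 3, a node reachable over Riak's HTTP interface at 127.0.0.1:8098, and a list of (bucket, key) pairs already in hand. The names `http_get` and `fetch_all` are illustrative only and do not come from any Riak client library. It uses a pool of worker threads rather than true connection pooling, and it does not handle siblings.

```python
import urllib.error
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

RIAK_URL = "http://127.0.0.1:8098"  # assumption: local node, default HTTP port


def http_get(bucket, key, base_url=RIAK_URL, timeout=10):
    """Fetch one object over HTTP; return its body as bytes, or None on 404."""
    url = "%s/buckets/%s/keys/%s" % (
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def fetch_all(pairs, fetch=http_get, workers=8):
    """Fetch (bucket, key) pairs using a pool of worker threads.

    Returns (results, failures). results maps (bucket, key) -> value and
    failures maps (bucket, key) -> exception, so one bad key does not
    abort the run and the failed keys can be retried later.
    """
    results, failures = {}, {}

    def one(pair):
        # Catch per-key errors here so the pool keeps working.
        try:
            return pair, fetch(*pair), None
        except Exception as exc:
            return pair, None, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for pair, value, exc in pool.map(one, pairs):
            if exc is None:
                results[pair] = value
            else:
                failures[pair] = exc
    return results, failures
```

Because each key succeeds or fails on its own, the script knows exactly which objects were fetched, so an object only needs to be deleted from Riak after it has been written to disk.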
On 9 April 2013 01:01, Dmitri Zagidulin <dzagidulin at basho.com> wrote:
> My recommendation to you is - don't use MapReduce for this use case. Fetch
> the objects via regular Riak GETs (using connection pooling and
> multithreading, preferably).
> I'm assuming that you have a list of keys (either by keeping track of them
> externally to Riak, or via a Secondary Index query or a Search query), and
> you want to back up those objects.
> The natural inclination, once you know the keys, is to want to fetch all
> of those objects via a single query, and MapReduce immediately comes to
> easier and more familiar than in Erlang). Unfortunately, as Christian
> mentioned, it's very easy for the JS VMs to run out of resources and crash
> or time out. In addition, I've found that rewriting the MapReduce in Erlang
> affords only a bit more resources -- once you hit a certain number of keys
> that you want to fetch, or a certain object size threshold, even Erlang MR
> jobs can time out (keep in mind, while the Map phase can happen in parallel
> on all of the nodes in a cluster, all the object values have to be
> serialized on the single coordinating node, which becomes the bottleneck).
> The workaround for this, even though it might seem counter-intuitive, is
> -- if you know the list of keys, fetch them using GETs. Even a naive
> single-threaded "while loop" way of fetching the objects can often be
> faster than a MapReduce job (for this use case), and it doesn't time out.
> Add to that connection-pooling and multiple worker threads, and this method
> is invariably faster.
> On Mon, Apr 8, 2013 at 4:27 AM, Christian Dahlqvist <christian at basho.com> wrote:
>> Hi Matt,
>> If you have a complicated mapreduce job containing multiple phases
>> configuration, you may find that you will need a very large pool size
>> in order to properly support your job, especially for map phases as these
>> run in parallel.
>> The best way to speed up the mapreduce job and get around the VM pool
>> contention is to implement the mapreduce functions in Erlang.
>> Best regards,
>> Christian Dahlqvist
>> Client Services Engineer
>> Basho Technologies
>> EMEA Office
>> E-mail: christian at basho.com
>> Skype: c.dahlqvist
>> Mobile: +44 7890 590 910
>> On 8 Apr 2013, at 08:20, Matt Black <matt.black at jbadigital.com> wrote:
>> Thanks for the reply, Christian.
>> I didn't explain well enough in my first post - the map reduce operation
>> is merely loading a bunch of objects, and a Python script which makes the
>> connection to Riak then will write these objects to disk. (It's probably
>> The query itself has many map phases where a composite object is built up
>> from related objects spread across many buckets.
>> I was hoping there may be some kind of timeout I could adjust on a
>> per-map phase basis - clutching at straws really.
>> On 8 April 2013 17:14, Christian Dahlqvist <christian at basho.com> wrote:
>>> Without having access to the mapreduce functions you are running, I
>>> would assume that a mapreduce job both writing data to disk as well as
>>> deleting the written record from Riak might be quite slow. This is not
>>> really a use case mapreduce was designed for, and when a mapreduce job
>>> crashes or times out it is difficult to know how far along the processing
>>> of different records it got.
>>> I would therefore recommend considering running this type of archiving
>>> and delete job as an external batch process instead as it will give you
>>> better control over the execution and avoid timeout problems.
>>> Best regards,
>>> On 8 Apr 2013, at 00:49, Matt Black <matt.black at jbadigital.com> wrote:
>>> > Dear list,
>>> > I'm currently getting a timeout during a single phase of a multi-phase
>>> map reduce query. Is there anything I can do to assist this in running?
>>> > Its purpose is to back up and remove objects from Riak, so it will run
>>> periodically during quiet times moving old data out of Riak into file
>>> > Traceback (most recent call last):
>>> > File "./tools/rolling_backup.py", line 185, in <module>
>>> > main()
>>> > File "./tools/rolling_backup.py", line 181, in main
>>> > args.func(**kwargs)
>>> > File "/srv/backup/tools/mapreduce.py", line 295, in do_map_reduce
>>> > raise e
>>> > Exception:
>>> > _______________________________________________
>>> > riak-users mailing list
>>> > riak-users at lists.basho.com
>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com