ListKeys or MapReduce

Christian Dahlqvist christian at basho.com
Thu Feb 14 07:40:06 EST 2013


Hi OJ,

The do_prereduce parameter makes the first iteration of the reduce phase execute on the node where the preceding map phase generated its output. As in the example I provided, this can be used to reduce the amount of data that needs to be sent across the network. It is described in greater detail here: http://docs.basho.com/riak/latest/references/appendices/MapReduce-Implementation/

Since it can be enabled by default in app.config, it should be fine to always specify it for reduce phases that are preceded by a map phase.
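
To make the effect on network traffic concrete, here is a minimal toy simulation in Python. It is not Riak code: the node names, the key lists, and the helper functions are all hypothetical, and the counting function is only a simplified stand-in for a counting reduce such as reduce_count_inputs. It assumes one partial result per node, which is a simplification of how the real pre-reduce step batches its work.

```python
# Illustrative simulation only; none of this is Riak's actual implementation.
from typing import Dict, List, Tuple


def reduce_count(inputs: List[int]) -> List[int]:
    """Toy counting reduce: sum the (partial) counts it is given."""
    return [sum(inputs)]


def run_without_prereduce(nodes: Dict[str, List[str]]) -> Tuple[int, int]:
    # Every key on every node is shipped to the reducing node as a count of 1.
    shipped = [1 for keys in nodes.values() for _ in keys]
    return reduce_count(shipped)[0], len(shipped)


def run_with_prereduce(nodes: Dict[str, List[str]]) -> Tuple[int, int]:
    # Each node reduces its own keys first, so it ships one partial count.
    shipped = [reduce_count([1] * len(keys))[0] for keys in nodes.values()]
    return reduce_count(shipped)[0], len(shipped)


# Hypothetical cluster: three nodes holding nine keys between them.
nodes = {
    "node_a": ["k1", "k2", "k3"],
    "node_b": ["k4", "k5"],
    "node_c": ["k6", "k7", "k8", "k9"],
}

total_plain, sent_plain = run_without_prereduce(nodes)
total_pre, sent_pre = run_with_prereduce(nodes)

print(total_plain, sent_plain)  # 9 9  (count, values sent over the network)
print(total_pre, sent_pre)      # 9 3
```

Both runs arrive at the same count, but the pre-reduced run ships one value per node instead of one per key. This only works because a count can be reduced again over its own partial results.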

Best regards,

Christian


On 14 Feb 2013, at 12:21, OJ Reeves <oj at buffered.io> wrote:

> Chris,
> 
> I've never heard of do_prereduce before. What kind of effect does this have? That is, if someone were to use it all the time, regardless of the amount of data being returned, would this be a bad thing?
> 
> Thanks.
> OJ
> 
> On Thu, Feb 14, 2013 at 6:19 PM, Christian Dahlqvist <christian at basho.com> wrote:
> Hi,
> 
> For buckets with a significant number of records, it makes a lot of sense to run the example I provided with 'do_prereduce' enabled as it will result in considerably less data being sent between the nodes. This can be enabled as follows:
> 
> curl -XPOST http://localhost:8098/mapred \
>   -H 'Content-Type: application/json' \
>   -d '{"inputs":{
>            "bucket":"goog",
>            "index":"$bucket",
>            "key":"goog"
>        },
>        "query":[{"reduce":{"language":"erlang",
>                            "module":"riak_kv_mapreduce",
>                            "function":"reduce_count_inputs", 
>                            "arg":{"do_prereduce":true}}}]}'
> 
> Best regards,
> 
> Christian
> 
> 
> On 14 Feb 2013, at 08:01, Christian Dahlqvist <christian at basho.com> wrote:
> 
>> Hi Jeremiah,
>> 
>> It does indeed seem to be undocumented on the main docs site, and I will try to correct this. The only place I have found it described is on the wiki for the Ruby client (https://github.com/basho/riak-ruby-client/wiki/Secondary-Indexes).
>>  
>> Below is also an example of a simple mapreduce job that shows how to count the number of records in the 'goog' bucket based on the $bucket secondary index:
>> 
>> curl -XPOST http://localhost:8098/mapred \
>>   -H 'Content-Type: application/json' \
>>   -d '{"inputs":{
>>            "bucket":"goog",
>>            "index":"$bucket",
>>            "key":"goog"
>>        },
>>        "query":[{"reduce":{"language":"erlang",
>>                            "module":"riak_kv_mapreduce",
>>                            "function":"reduce_count_inputs"}}]}'
>> 
>> I hope this helps.
>> 
>> Best regards,
>> 
>> Christian
>> 
>> 
>> On 13 Feb 2013, at 18:12, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
>> 
>>> Is this documented anywhere on the docs.basho.com site? 
>>> 
>>> Searching for $bucket produces search results just for "bucket" and Google says "No results found for site:docs.basho.com $bucket."
>>> 
>>> ---
>>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>>> MCITP: SQL Server 2008, MVP
>>> Cloudera Certified Developer for Apache Hadoop
>>> 
>>> 
>>> On Wed, Feb 13, 2013 at 10:08 AM, Christian Dahlqvist <christian at basho.com> wrote:
>>> Hi,
>>> 
>>> In addition to the $key index, there is also a $bucket index available by default. This contains the name of the bucket, and can be used to get all keys in a specific bucket.
>>> 
>>> Best regards,
>>> 
>>> Christian
>>> 
>> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> 
> 
> -- 
> 
> OJ Reeves
> +61 431 952 586
> http://buffered.io/
