ListKeys or MapReduce

Jeremiah Peschka jeremiah.peschka at gmail.com
Thu Feb 14 10:38:23 EST 2013


Thanks for the insight into this.

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Thu, Feb 14, 2013 at 4:40 AM, Christian Dahlqvist <christian at basho.com> wrote:

> Hi OJ,
>
> The do_prereduce parameter makes it possible to have the first iteration
> of the reduce phase execute where the preceding map phase generated output.
> This can, as in the example I provided, be used to reduce the amount of
> data that needs to be sent across the network. This is described in greater
> detail here:
> http://docs.basho.com/riak/latest/references/appendices/MapReduce-Implementation/
>
> As it is possible to set it to be enabled by default in the app.config, it
> should be fine to always specify it for reduce phases preceded by a map
> phase.
>
> Best regards,
>
> Christian
>
>
> On 14 Feb 2013, at 12:21, OJ Reeves <oj at buffered.io> wrote:
>
> Chris,
>
> I've never heard of do_prereduce before. What kind of effect does this
> have? That is, if someone were to use it all the time, regardless of the
> amount of data being returned, would this be a bad thing?
>
> Thanks.
> OJ
>
> On Thu, Feb 14, 2013 at 6:19 PM, Christian Dahlqvist <christian at basho.com> wrote:
>
>> Hi,
>>
>> For buckets with a significant number of records, it makes a lot of sense
>> to run the example I provided with 'do_prereduce' enabled as it will result
>> in considerably less data being sent between the nodes. This can be enabled
>> as follows:
>>
>> curl -XPOST http://localhost:8098/mapred \
>>   -H 'Content-Type: application/json' \
>>   -d '{"inputs":{
>>            "bucket":"goog",
>>            "index":"$bucket",
>>            "key":"goog"
>>        },
>>        "query":[{"reduce":{"language":"erlang",
>>                            "module":"riak_kv_mapreduce",
>>                            "function":"reduce_count_inputs",
>>                            "arg":{"do_prereduce":true}}}]}'
>>
>> Best regards,
>>
>> Christian
>>
>>
>> On 14 Feb 2013, at 08:01, Christian Dahlqvist <christian at basho.com>
>> wrote:
>>
>> Hi Jeremiah,
>>
>> It does indeed not seem to be documented on the main docs site, and I
>> will try to correct this. The only place I have found it described is on
>> the wiki for the Ruby client (
>> https://github.com/basho/riak-ruby-client/wiki/Secondary-Indexes).
>>
>> Below is also an example of a simple mapreduce job that shows how to
>> count the number of records in the 'goog' bucket based on the $bucket
>> secondary index:
>>
>> curl -XPOST http://localhost:8098/mapred \
>>   -H 'Content-Type: application/json' \
>>   -d '{"inputs":{
>>            "bucket":"goog",
>>            "index":"$bucket",
>>            "key":"goog"
>>        },
>>        "query":[{"reduce":{"language":"erlang",
>>                            "module":"riak_kv_mapreduce",
>>                            "function":"reduce_count_inputs"}}]}'
>>
>> I hope this helps.
>>
>> Best regards,
>>
>> Christian
>>
>>
>> On 13 Feb 2013, at 18:12, Jeremiah Peschka <jeremiah.peschka at gmail.com>
>> wrote:
>>
>> Is this documented anywhere on the docs.basho.com site?
>>
>> Searching for $bucket produces search results just for "bucket" and
>> Google says "No results found for *site:docs.basho.com $bucket*."
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>>
>> On Wed, Feb 13, 2013 at 10:08 AM, Christian Dahlqvist <
>> christian at basho.com> wrote:
>>
>>> Hi,
>>>
>>> In addition to the $key index, there is also a $bucket index available
>>> by default. This contains the name of the bucket, and can be used to get
>>> all keys in a specific bucket.
>>>
>>> Best regards,
>>>
>>> Christian
>>>
>>>
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
> --
>
> OJ Reeves
> +61 431 952 586
> http://buffered.io/
>
>
>
>
>
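For anyone who would rather script this than type the curl commands, here is a minimal Python sketch of the two jobs quoted above. It assumes the same endpoint as the thread (http://localhost:8098/mapred) and uses only the standard library. The function names build_count_job and run_job are hypothetical helpers made up for illustration, not part of any Riak client, and the response handling assumes the server replies with a JSON body.

```python
import json
import urllib.request


def build_count_job(bucket, prereduce=False):
    """Build the job from the thread: count a bucket's keys via the $bucket index.

    build_count_job is a hypothetical helper, not part of any Riak client.
    """
    reduce_phase = {
        "language": "erlang",
        "module": "riak_kv_mapreduce",
        "function": "reduce_count_inputs",
    }
    if prereduce:
        # Same argument as in the second curl example in the thread.
        reduce_phase["arg"] = {"do_prereduce": True}
    return {
        # The $bucket index holds the bucket name, so the key repeats it.
        "inputs": {"bucket": bucket, "index": "$bucket", "key": bucket},
        "query": [{"reduce": reduce_phase}],
    }


def run_job(job, url="http://localhost:8098/mapred"):
    """POST the job as JSON and decode the reply (assumes a JSON response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request body only; call run_job(job) against a live node.
    print(json.dumps(build_count_job("goog", prereduce=True), indent=2))
```

Building the body as a dict and serialising it with json.dumps avoids the shell quoting and line-continuation pitfalls of a multi-line curl command, and keeps the bucket name and index key from drifting apart.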