Simple mapreduce with 2i returns different result

Mattias Sjölinder mattias at sjolinder.se
Tue Apr 16 10:52:17 EDT 2013


I have tested with a bucket holding as few as 10 documents, as well as
another with a couple of hundred. Both are in the same cluster, and both
return a varying number of documents in the result. For the bucket with 10
documents, the result varies from around 3 to all 10 documents. The
MapReduce job looks like this:

{
    "inputs":{
        "bucket":"som-bucket",
        "index":"userid_bin",
        "key":"18481123123"
    },
    "query":[
        {
            "map":{
                "language":"javascript",
                "name":"Riak.mapValuesJson",
                "keep":true
            }
        }
    ]
}
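
One way to quantify the variation is to submit the identical job many times and tally how often each result size comes back. The sketch below is a minimal example under stated assumptions, not a tested reproduction: it posts the job above to Riak's `/mapred` HTTP endpoint, and the base URL (a local node on port 8098) and the helper names `submit_over_http` and `result_size_histogram` are illustrative.

```python
import json
import urllib.request
from collections import Counter

# The job from the message above, unchanged.
JOB = {
    "inputs": {"bucket": "som-bucket", "index": "userid_bin", "key": "18481123123"},
    "query": [
        {"map": {"language": "javascript", "name": "Riak.mapValuesJson", "keep": True}}
    ],
}


def submit_over_http(job, base_url="http://127.0.0.1:8098"):
    """POST the job to /mapred and return the decoded JSON result list.

    base_url is an assumption: a local node listening on the default HTTP port.
    """
    request = urllib.request.Request(
        base_url + "/mapred",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def result_size_histogram(submit, job, runs=20):
    """Run the same job `runs` times; map each result size to how often it occurred."""
    sizes = Counter()
    for _ in range(runs):
        sizes[len(submit(job))] += 1
    return dict(sizes)
```

Calling `result_size_histogram(submit_over_http, JOB)` against a live node would return something like a mapping from result size to occurrence count; a single entry means every run returned the same number of documents. The `submit` parameter is injectable so that the same tally can be driven through a client library instead of raw HTTP.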

I have also tested without the index, using key filters instead, with the
same result.

We are using CorrugatedIron as the client for all Riak communication, and
one of my colleagues is discussing the same issue with the developers of
the client library.


Best regards,
Mattias




2013/4/12 Christian Dahlqvist <christian at basho.com>

> Hi Mattias,
>
> When Riak executes a MapReduce job, it will determine a covering set of
> vnodes/partitions that will handle the processing. The vnodes selected will
> vary between runs, and if the partitions do not all hold the same data,
> there may be differences in the results between consecutive runs.
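
The effect described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Riak's actual coverage planner: it assumes each key is read from exactly one of its replicas, chosen at random per run, whereas Riak chooses per partition. The function name and data layout are hypothetical.

```python
import random


def simulate_run(replicas, rng):
    """Return the keys seen in one run of the toy model.

    `replicas` maps each key to a list of booleans, one per replica, where
    True means that replica currently holds the object. Each key is read from
    a single randomly chosen replica, so it shows up only if that replica has it.
    """
    seen = []
    for key, copies in replicas.items():
        if rng.choice(copies):
            seen.append(key)
    return seen


# Hypothetical state: 10 keys with 3 replicas each; 7 keys are missing on two replicas.
replicas = {"key%d" % i: [True, True, True] for i in range(3)}
replicas.update({"key%d" % i: [True, False, False] for i in range(3, 10)})

rng = random.Random()
counts = [len(simulate_run(replicas, rng)) for _ in range(5)]
print(counts)  # varies between runs, always between 3 and 10
```

In this model the fully replicated keys appear in every run, while the others come and go depending on which replica happens to be read, which matches the shape of the symptom without claiming to be its cause.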
>
> - How many records do you have in the bucket?
>
> - How large a portion of the results typically differs between runs?
>
> - What do the MapReduce jobs look like?
>
> Best regards,
>
> Christian
>
>
>
>
> On 12 Apr 2013, at 11:05, Mattias Sjölinder <mattias at sjolinder.se> wrote:
>
> Thanks for your response, Christian!
>
> We have AAE enabled in our cluster. What bothers me most is that even if
> I get the expected dataset once from my MapReduce, a consecutive,
> identical request often returns only a subset of the result, even though
> no changes have been made anywhere in the bucket. The same thing seems to
> happen for all our MapReduce queries. The bucket in the example has both
> allow_mult and last_write_wins set to false.
>
> Regards
> Mattias
>
>
> 2013/4/12 Christian Dahlqvist <christian at basho.com>
>
>> Hi Mattias,
>>
>> MapReduce in Riak reads each object from a single partition and, for
>> efficiency reasons, does not perform a quorum read (which greatly
>> reduces the required amount of network traffic). As Riak is eventually
>> consistent, it is possible that not all partitions hold exactly the same
>> data or version of the data at any point in time. What you are seeing could
>> very well be a result of some replicas not being in sync across all
>> partitions holding a copy.
>>
>> This would however be corrected either through read-repair or AAE (Active
>> Anti-Entropy) if you have this enabled. If you were to perform a GET on a
>> key that is missing, triggering read-repair, I would expect it to
>> consistently show up in the results from that point on, at least until it
>> is updated again.
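
The GET-based check suggested above could be scripted roughly as follows. This is a minimal sketch under assumptions: it uses Riak's HTTP object URL (`/buckets/<bucket>/keys/<key>`), the base URL is a guess at a local node, the list of expected keys must come from the application itself, and the helper names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def object_url(base_url, bucket, key):
    """Build the HTTP URL for one object, percent-encoding bucket and key."""
    return "{}/buckets/{}/keys/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )


def http_get_status(url):
    """GET the URL and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def touch_keys(keys, bucket, base_url="http://127.0.0.1:8098",
               get_status=http_get_status):
    """GET every expected key once and return {key: status}.

    A normal GET consults several replicas, so it gives read-repair a chance
    to fix replicas that are missing the object.
    """
    return {key: get_status(object_url(base_url, bucket, key)) for key in keys}
```

After `touch_keys(expected_keys, "som-bucket")` has run, repeating the MapReduce job shows whether the previously missing documents now appear consistently. Keys that return 404 here are missing from the replicas consulted by the GET as well, which would point to a different problem than replica divergence.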
>>
>> Best regards,
>>
>> Christian
>>
>>
>>
>> On 12 Apr 2013, at 08:13, Mattias Sjölinder <mattias at sjolinder.se> wrote:
>>
>> Hi
>>
>> I am struggling to get a grip on MapReduce and why it sometimes returns
>> only a subset of what is expected. Do the nodes processing the map phase
>> return the matches found so far after a specific time? I would rather have
>> it return a timeout than a subset of the actual matches.
>>
>> An example is this simple MapReduce:
>>
>> {
>>     "inputs":{
>>         "bucket":"som-bucket",
>>         "index":"userid_bin",
>>         "key":"18481123123"
>>     },
>>     "query":[
>>         {
>>             "map":{
>>                 "language":"javascript",
>>                 "name":"Riak.mapValuesJson",
>>                 "keep":true
>>             }
>>         }
>>     ]
>> }
>>
>>
>> Any thoughts?
>>
>> Regards
>> Mattias
>>