Simple mapreduce with 2i returns different result
christian at basho.com
Fri Apr 12 12:03:53 EDT 2013
When Riak executes a MapReduce job, it will determine a covering set of vnodes/partitions that will handle the processing. The vnodes selected will vary between runs, and if the partitions do not all hold the same data, there may be differences in the results between consecutive runs.
- How many records do you have in the bucket?
- How large a portion of the results typically differs between runs?
- What do the MapReduce jobs look like?
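One way to quantify the second question is to run the same job several times and compare the sets of keys returned. The following is only a minimal sketch in Python: `run_job` is a hypothetical callable that you would supply yourself (it should submit your MapReduce job with whatever client you use and return the matching keys), and `compare_runs` is an illustrative helper name, not part of any Riak client.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def compare_runs(
    run_job: Callable[[], Iterable[str]], runs: int = 5
) -> Tuple[Set[str], Dict[int, List[str]]]:
    """Run the same job `runs` times and report which keys each run missed.

    `run_job` is an assumption: a caller-supplied function that executes
    the MapReduce job once and returns the keys it matched.
    """
    # Collect the keys returned by each run as a set.
    results = [set(run_job()) for _ in range(runs)]

    # The union approximates the full expected result: every key that
    # was returned by at least one run.
    union: Set[str] = set().union(*results)

    # For each run that returned fewer keys, list what was missing.
    missing = {
        index: sorted(union - keys)
        for index, keys in enumerate(results)
        if union - keys
    }
    return union, missing
```

The keys listed under `missing` are candidates for replicas that are out of sync. Performing a normal GET on each of them should trigger read-repair, after which they would be expected to show up consistently.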
On 12 Apr 2013, at 11:05, Mattias Sjölinder <mattias at sjolinder.se> wrote:
> Thanks for your response, Christian!
> We have AAE enabled in our cluster. The thing that bothers me most is that even if I get the expected dataset once from my MapReduce job, a subsequent identical request often returns a subset of the result, even though no changes have been made to the bucket. The same thing seems to happen for all our MapReduce queries. The bucket in the example has both allow_mult and last_write_wins set to false.
> 2013/4/12 Christian Dahlqvist <christian at basho.com>
> Hi Mattias,
> MapReduce in Riak executes based on the data in a single partition and, for efficiency reasons, does not perform a quorum read (skipping the quorum read greatly reduces the required amount of network traffic). As Riak is eventually consistent, it is possible that not all partitions hold exactly the same data or version of the data at any point in time. What you are seeing could very well be the result of some replicas not being in sync across all partitions holding a copy.
> This would however be corrected either through read-repair or AAE (Active Anti-Entropy) if you have this enabled. If you were to perform a GET on a key that is missing, triggering read-repair, I would expect it to consistently show up in the results from that point on, at least until it is updated again.
> Best regards,
> On 12 Apr 2013, at 08:13, Mattias Sjölinder <mattias at sjolinder.se> wrote:
>> I am struggling to get a grip on MapReduce and why it sometimes returns only a subset of what is expected. Is it that the nodes processing the map phase return the matches found so far after a specific time? I would rather have it return a timeout than a subset of the actual matches.
>> An example is this simple MapReduce:
>> Any thoughts?