Inconsistent map/reduce results

Dan Reverri dan at basho.com
Thu Mar 31 15:23:35 EDT 2011


Hi Keith,

I'm not able to reproduce this particular issue. I've attached the simple
script I've been using to test it. The script loads a set of keys into Riak
and runs a MapReduce job. I've tried running it against a single-node
cluster and a three-node cluster (devrel). Is there anything obvious I am
missing?

I'm running the script multiple times against different buckets:
for (( i=0; i<=10; i++ )); do ./test.sh $i; done
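
For readers without the attachment, a script of this shape might look
roughly like the sketch below. It is not the attached test.sh: the node
address (127.0.0.1:8098), the bucket prefix, the key count, and the record
fields are all assumptions made for illustration. It uses Riak's HTTP
interface, storing objects with PUT to /riak/<bucket>/<key> and submitting
the job with POST to /mapred.

```shell
#!/bin/sh
# Sketch only: NOT the attached test.sh. The node address, bucket prefix,
# key count and record fields are illustrative assumptions.

RIAK_URL="${RIAK_URL:-http://127.0.0.1:8098}"   # assumed local node
KEY_COUNT="${KEY_COUNT:-100}"                   # assumed number of keys

# Print the JSON body for record number $1; every 10th record has age "32".
record_json() {
  if [ $(( $1 % 10 )) -eq 0 ]; then
    printf '{"id":"%s","age":"32"}' "$1"
  else
    printf '{"id":"%s","age":"20"}' "$1"
  fi
}

# Print a MapReduce job over bucket $1, using the map function quoted
# later in this thread (double quotes escaped for embedding in JSON).
mapred_job() {
  map_src='function(value, keyData, arg) { var data = Riak.mapValuesJson(value)[0]; if (data.age == \"32\") return [value.key]; else return []; }'
  printf '{"inputs":"%s","query":[{"map":{"language":"javascript","source":"%s"}}]}' "$1" "$map_src"
}

# Load KEY_COUNT records into bucket $1, then run the job and print the result.
run_test() {
  n=1
  while [ "$n" -le "$KEY_COUNT" ]; do
    record_json "$n" | curl -s -X PUT -H 'Content-Type: application/json' \
      --data-binary @- "$RIAK_URL/riak/$1/key$n" || return 1
    n=$((n + 1))
  done
  mapred_job "$1" | curl -s -X POST -H 'Content-Type: application/json' \
    --data-binary @- "$RIAK_URL/mapred"
  echo
}

# Only talk to Riak when a bucket suffix is given, e.g. ./sketch.sh 3
if [ "$#" -ge 1 ]; then
  run_test "test_bucket_$1"
fi
```

With these defaults, 10 of the 100 loaded records have age "32", so a
correct run should return 10 keys for each bucket. An empty list would
match the failure Keith describes.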

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
dan at basho.com


On Thu, Mar 31, 2011 at 9:20 AM, Dan Reverri <dan at basho.com> wrote:

> Hi Keith,
>
> The cache parameter was renamed in 0.14, from "vnode_cache_entries" to
> "map_cache_size". Setting this parameter to 0 will disable the cache.
>
> Regarding the empty MapReduce results, I'll try to reproduce the issue
> locally and narrow down the cause.
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> dan at basho.com
>
>
>
> On Tue, Mar 29, 2011 at 6:16 PM, Keith Dreibelbis <kdreibel at gmail.com> wrote:
>
>> Followup to this (somewhat old) thread...
>>
>> I had resolved my problem by putting the vnode_cache_entries=0 thing in
>> app.config, doing what Grant said.  But sometime later it began failing
>> again.  I was getting misses of 25%-50% on records that should have been
>> found by map reduce but weren't.  At that point I tried Rohman's suggestion
>> of using a random seed, and that worked around the problem successfully.
>> But this isn't a very satisfying fix.
>>
>> So the vnode_cache_entries=0 thing doesn't really fix it after all?  Is
>> there something else to put in the config that would make this work
>> properly, without the random seed hack?  BTW since the original thread I
>> have upgraded from 0.13 to 0.14, and the bug is still there.
>>
>>
>> Keith
>>
>>
>> On Thu, Mar 10, 2011 at 6:56 PM, Antonio Rohman Fernandez <
>> rohman at mahalostudio.com> wrote:
>>
>>> If you want to avoid caching (without configuration), you can put a
>>> random variable in your map or reduce function, or both. That does the
>>> trick for me, since the query will always be different:
>>>
>>> $seed = randomStringHere;
>>>
>>> {"map":{"language":"javascript","source":"function(v,k,a) {
>>> seed='.$seed.'; x=Riak.mapValuesJson(v)[0]; return [v.values[0].data]; }"}}
>>>
>>> Rohman
>>>
>>> On Thu, 10 Mar 2011 17:47:49 -0800, Keith Dreibelbis <kdreibel at gmail.com>
>>> wrote:
>>>
>>> Thanks for the prompt response, Grant.  I made the configuration change
>>> you suggested, and it fixed my problem.
>>> Some followup questions:
>>> - is it possible to configure this dynamically on a per-bucket basis,
>>> or just per-server like it is now?
>>> - is this fixed in a newer version?
>>>
>>> On Thu, Mar 10, 2011 at 2:56 PM, Grant Schofield <grant at basho.com> wrote:
>>>
>>>> There are currently some bugs in the mapreduce caching system. The best
>>>> thing to do would be to disable the feature. On 0.13 you can do this by
>>>> editing or adding the vnode_cache_entries entry in the riak_kv section
>>>> of your app.config. The entry would look like:
>>>> {vnode_cache_entries, 0},
>>>>
>>>> Grant Schofield
>>>> Developer Advocate
>>>> Basho Technologies
>>>>
>>>> On Mar 10, 2011, at 4:16 PM, Keith Dreibelbis wrote:
>>>>
>>>> Hi riak-users,
>>>>
>>>> I'm trying to do a map/reduce query from Java on a 0.13 server, and I get
>>>> inconsistent results.  What I'm doing should be pretty simple.  I'm hoping
>>>> someone will notice an obvious error in here, or have some insight.
>>>>
>>>> This is an automated test.  I'm doing a simple query where I'm trying
>>>> to get the keys for records with a certain field value.  In SQL it would
>>>> look like "SELECT id FROM table WHERE age = '32'".  In Java I'm invoking it
>>>> like this:
>>>>
>>>>     MapReduceResponse r = riak.mapReduceOverBucket(getBucket())
>>>>         .map(JavascriptFunction.anon(func), true)
>>>>         .submit();
>>>>
>>>> where riak is a RiakClient, getBucket() returns the name of the
>>>> bucket, and func is a string that looks like:
>>>>
>>>>     function(value, keyData, arg) {
>>>>         var data = Riak.mapValuesJson(value)[0];
>>>>         if (data.age == "32")
>>>>             return [value.key];
>>>>         else
>>>>             return [];
>>>>     }
>>>>
>>>> No reduce phase.  All entries in the example bucket are JSON and have
>>>> an age field.  This initially works correctly: it gets back the matching
>>>> records as expected.  It also works in curl.  It's an automated test, so
>>>> each time I run this, it is using a different bucket.  After about a dozen
>>>> queries, this starts to fail.  It returns an empty result when it should
>>>> have found records.  It fails in curl at the same time.
>>>>
>>>> I initially suspected this might have something to do with doing map
>>>> reduce too soon after writing, and the write not being available on all
>>>> nodes.  However, I changed the bucket schema entries for w, r, rw, dw from
>>>> "quorum" to "all", and this still happens (is there another bucket setting
>>>> I missed?).  In addition, I only have 3 nodes (I'm using the dev123
>>>> example), and I'm running curl long enough afterwards.
>>>>
>>>> Here's the strange part that makes me suspicious.  If I make
>>>> insignificant changes to the query, for example change the double quotes to
>>>> single quotes, add whitespace or extra parentheses, etc., then it suddenly
>>>> works again.  It will work on an existing bucket, and on subsequent tests,
>>>> but again only about a dozen times before it starts failing again.  Same
>>>> behavior in curl.  This makes me suspect that the server is doing some
>>>> incorrect caching around this js function, based on the function string.
>>>>
>>>> Any explanation about what's going on?
>>>>
>>>> Keith
>>>>  _______________________________________________
>>>> riak-users mailing list
>>>> riak-users at lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>>>     --
>>>
>>>  *Antonio Rohman Fernandez*
>>> CEO, Founder & Lead Engineer
>>>
>>> rohman at mahalostudio.com
>>> *Projects*
>>> MaruBatsu.es <http://marubatsu.es>
>>>
>>> PupCloud.com <http://pupcloud.com>
>>> Wedding Album <http://wedding.mahalostudio.com>
>>>
>>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sh
Type: application/x-sh
Size: 737 bytes
Desc: not available
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20110331/d8eaf535/attachment.sh>

