map/reduce phases

Kevin Smith ksmith at basho.com
Mon Feb 8 13:21:13 EST 2010


On Feb 8, 2010, at 5:46 AM, francisco treacy wrote:

> Kevin,
> 
>> This is a difference between Riak's map/reduce and other document stores, namely Couch. Riak map functions expect to receive bucket/key pairs as inputs to the function. Riak examines the bucket/key pairs, determines where the data lives, and then invokes the function on the node hosting the data. That's how we get map parallelism and exploit data locality in one fell swoop.
> 
> Yes, I understand all this.
> 
>> You could restructure the job as a map phase invoking Riak.mapValuesJson, followed by a reduce phase using your anonymous function. Reduce functions can accept arbitrary input -- as long as the input is a list (or array, depending on the language used) -- and don't share map functions' input restrictions.
> 
> But it's still unclear to me why I should use a reduce function in that case.
> 
> I'll explain again what I want to achieve. The following JSON works
> exactly as I want:
> 
> {"inputs": "books",
>  "query":[
> {"map":{"language":"javascript","source":"function(v) { return
> [JSON.parse(v.values[0].data).object.chapters[4].title ]; }"}}
> ]}
> 
> It returns the title of the 5th chapter of every book in the bucket. Perfect.
> 
> I thought, however, that invoking a phase with Riak.mapValuesJson
> would help me avoid the "JSON.parse(v.values[0].data)" part... that's
> it.  Is this the intended purpose of mapValuesJson or am I confused?

Two things:

1) Riak.mapValuesJson expects to receive an entire riak_object as a set of nested hashes encoded in JSON. Sending it anything else is an error and will not produce the expected results. You should see an error in your m/r output if a detectable error was generated (i.e., SpiderMonkey throws an exception), or an empty result list if the function runs to completion but generates no results.
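
To make the shape mismatch concrete, here is a minimal sketch. The riakObject literal and mapValuesJsonSketch are hypothetical stand-ins written for illustration; only the values[0].data path comes from the query earlier in this thread, and the stand-in is an assumption about the built-in's behavior, not Riak's actual source.

```javascript
// Hypothetical riak_object as a JavaScript map function sees it. Only the
// values[0].data path is taken from this thread; the other fields are assumed.
var riakObject = {
  bucket: "test",
  key: "jsondoc",
  values: [
    { data: JSON.stringify({ object: { chapters: [{ title: "Intro" }] } }) }
  ]
};

// Local stand-in for Riak.mapValuesJson (illustration only): parse the data
// field of every value stored in the riak_object.
function mapValuesJsonSketch(value) {
  return value.values.map(function (v) { return JSON.parse(v.data); });
}

var firstPhaseOutput = mapValuesJsonSketch(riakObject);

// firstPhaseOutput[0] is the parsed document itself. It has no "values" field
// and is not a bucket/key pair, so it is not valid input for another map phase.
var looksLikeRiakObject = Array.isArray(firstPhaseOutput[0].values);
```

In this sketch the output of the first phase is already the plain document, which is why feeding it to a second map phase cannot work the way the identity function on its own does.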

2) The machinery invoking map functions expects bucket and key pairs as inputs, nothing else. It is an error to run a map phase function on input that is not a bucket/key pair. We take a fairly liberal approach to what reduce phases can produce so that you can do map-like things when your input is not bucket/key pairs. That is why I suggested you use a reduce phase in your code. Alternatively, the Riak.* functions are callable from within your own functions, so you can include calls to them in your own map and reduce functions.
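
Both options can be sketched roughly as follows. This is an illustration under assumptions, not tested against a running cluster: the Riak stub exists only so the snippet runs outside Riak, and fifthChapterTitles, singlePhaseJobJson and reduceJobJson are names made up for this example. The queries reuse the bucket, key, and chapter path from earlier in the thread.

```javascript
// Outside Riak there is no Riak global, so define a stand-in for local testing.
// The stub is an assumption about the built-in's behavior, not Riak's source.
if (typeof Riak === "undefined") {
  var Riak = {
    mapValuesJson: function (value) {
      return value.values.map(function (v) { return JSON.parse(v.data); });
    }
  };
}

// Option A: one map phase that calls Riak.mapValuesJson itself, then picks
// the title of the fifth chapter from each parsed document.
var fifthChapterTitles = function (v) {
  return Riak.mapValuesJson(v).map(function (doc) {
    return doc.object.chapters[4].title;
  });
};

var singlePhaseJobJson = JSON.stringify({
  inputs: "books",
  query: [
    { map: { language: "javascript", source: fifthChapterTitles.toString() } }
  ]
});

// Option B: keep Riak.mapValuesJson as the map phase and move the anonymous
// identity function into a reduce phase, which accepts a plain list.
var reduceJobJson = JSON.stringify({
  inputs: [["test", "jsondoc"]],
  query: [
    { map: { language: "javascript", name: "Riak.mapValuesJson" } },
    { reduce: { language: "javascript", source: "function(v){ return v; }" } }
  ]
});
```

Option A avoids writing JSON.parse(v.values[0].data) by hand while staying a single map phase. Option B is the two-phase query from the original post with the second phase changed from map to reduce.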

--Kevin

> 
> Thanks,
> 
> Francisco
> 
> 
> 
>> On Feb 7, 2010, at 5:17 PM, francisco treacy wrote:
>> 
>>> I am able to get a sensible result out of this query:
>>> 
>>> {"inputs": [["test", "jsondoc"]],
>>>  "query":[
>>> {"map":{"language":"javascript","source":"function(v){ return [v]; }"}}
>>> ]}
>>> 
>>> (just applying an identity function), however when I add a previous
>>> phase to map to my objects via Riak.mapValuesJson, like:
>>> 
>>> {"inputs": [["test", "jsondoc"]],
>>>  "query":[ {"map":{"language":"javascript", "name":"Riak.mapValuesJson"}},
>>> {"map":{"language":"javascript","source":"function(v){ return [v]; }"}}
>>> ]}
>>> 
>>> ...I always get back the value [].
>>> 
>>> If I *only* apply Riak.mapValuesJson, it works as advertised. Problem
>>> is with both... What could I be doing wrong?
>>> 
>>> Thanks!
>>> 
>>> Francisco
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 




