map function for link-walking

Nicolas Fouché nicolas at silentale.com
Sun Jul 11 09:19:17 EDT 2010


I was expecting some really tricky code, but it's just simple and
clean. Thanks a lot.
My links point to objects in the same bucket, so I'll store an array
of object keys and I'll add the bucket name directly in the map
function.
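
Something along these lines, I guess. This is an untested sketch: the "mykeys" field name, the "example" bucket, and the function name keysToInputs are only my assumptions.

```javascript
// Hypothetical map function: the object stores only keys in a "mykeys"
// field, and the bucket name is added here to build bucket-key pairs.
function keysToInputs(v) {
  // Skip inputs that could not be found.
  if (v.not_found) {
    return [];
  }
  var doc = JSON.parse(v.values[0].data);
  var keys = doc.mykeys || [];
  // "example" is an assumed bucket name; all links live in that bucket.
  return keys.map(function (k) {
    return ["example", k];
  });
}
```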

I did not find any docs about preloading JavaScript functions. Is it
the same as storing JS files in a bucket and loading them via the
"bucket" and "key" fields, as described in the "Map" paragraph of the
Fast Track? https://wiki.basho.com/display/RIAK/Loading+Data+and+Running+MapReduce+Queries

-Nicolas

On Sun, Jul 11, 2010 at 12:20 AM, Bryan Fink <bryan at basho.com> wrote:
> On Sat, Jul 10, 2010 at 4:45 PM, Nicolas Fouché <nicolas at silentale.com> wrote:
>> In the "one-to-very-many link associations" thread, Sean Cribbs talks
>> about a map function which does link-walking from links stored in
>> object contents. http://bit.ly/cKguqQ
>>
>> "Another way to cope with large numbers of links is to
>> encapsulate them in the object itself, rather than in the headers.  This removes
>> the header-length/count limitation, but would require you to have a map function
>> that understands the internals of the object.  Also, you would need to deal with
>> the larger size of the object, which could potentially slow down your request."
>>
>> Is there any chance someone could share the code of a map function doing
>> this (custom) link-walking?
>
> Hi, Nicolas.  Any function you have that returns a list of bucket-key
> pairs, in the same format as the "inputs" list for the map/reduce
> query, will work.  For example, if you stored your object's links in a
> "mylinks" field in its value, like so:
>
> $ curl -X PUT -H "content-type:application/json" \
>     http://localhost:8098/riak/example/foo --data @-
> {"mylinks":[["example","bar"],["example","baz"]],"myval":1}
> ^D
> $ curl -X PUT -H "content-type:application/json" \
>     http://localhost:8098/riak/example/bar --data @-
> {"mylinks":[["example","baz"]],"myval":2}
> ^D
> $ curl -X PUT -H "content-type:application/json" \
>     http://localhost:8098/riak/example/baz --data @-
> {"mylinks":[["example","foo"]],"myval":3}
> ^D
>
> Then you could use a very simple map function like:
>   function(v) {
>      return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
>   }
>
> And then the link-walking is simple:
>
> carboy:riak bryan$ curl -X POST -H "content-type:application/json" \
>     http://localhost:8098/mapred --data @-
> {"inputs":[["example","foo"]],
>  "query":[
>   {"map":{"language":"javascript",
>           "source":"function(v) { return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks; }"}},
>   {"map":{"language":"javascript",
>           "source":"function(v) { return [JSON.parse(v.values[0].data).myval]; }"}}]}
> ^D
> [2,3]
>
> That query uses two map phases: the first starts at the example/foo
> object I created above and follows its links to example/bar and
> example/baz, and the second extracts the "myval" field from the values
> of those objects.
>
> I'd recommend adding a little defensive programming to make sure
> that "mylinks" is defined, and that it's a list of the proper shape.
> It would also be a good idea to define these functions in a file that
> Riak would preload, instead of specifying them dynamically in the
> query (for performance).  But you could also take it in another
> direction: if you knew that all of your links were going to point to
> objects in a certain bucket, you could store just the keys in the
> object, and produce bucket-key pairs with a quick map function (e.g.
> mykeys.map(function(k) { return ["otherbucket", k]; })).
>
> Hope that helps.
>
> -Bryan
>
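
As a rough sketch of the defensive checks suggested above: the function names safeLinks and isArray are hypothetical, and the code assumes the "mylinks" layout from the example objects. Anything that is not a two-element [bucket, key] pair of strings is silently dropped, which may or may not be the behaviour you want.

```javascript
// Array check that does not rely on Array.isArray being available.
function isArray(x) {
  return Object.prototype.toString.call(x) === "[object Array]";
}

// Hypothetical defensive variant of the simple link-walking map function.
function safeLinks(v) {
  // Missing object or no stored value: nothing to follow.
  if (v.not_found || !v.values || v.values.length === 0) {
    return [];
  }
  var doc;
  try {
    doc = JSON.parse(v.values[0].data);
  } catch (e) {
    // Value is not valid JSON: treat it as having no links.
    return [];
  }
  if (!doc || !isArray(doc.mylinks)) {
    return [];
  }
  // Keep only well-formed [bucket, key] pairs.
  return doc.mylinks.filter(function (link) {
    return isArray(link) && link.length === 2 &&
      typeof link[0] === "string" && typeof link[1] === "string";
  });
}
```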



