map function for link-walking

Bryan Fink bryan at basho.com
Sat Jul 10 18:20:36 EDT 2010


On Sat, Jul 10, 2010 at 4:45 PM, Nicolas Fouché <nicolas at silentale.com> wrote:
> In the "one-to-very-many link associations" thread , Sean Cribbs talks
> about a map function which does link-walking from links stored in
> object contents. http://bit.ly/cKguqQ
>
> "Another way to cope with large numbers of links is to
> encapsulate them in the object itself, rather than in the headers.  This removes
> the header-length/count limitation, but would require you to have a map function
> that understands the internals of the object.  Also, you would need to deal with
> the larger size of the object, which could potentially slow down your request."
>
> Is there any chance someone shares the code of a map function doing
> this (custom-)link-walking ?

Hi, Nicolas.  Any function you have that returns a list of bucket-key
pairs, in the same format as the "inputs" list for the map/reduce
query, will work.  For example, if you stored your object's links in a
"mylinks" field in it's value, like so:

$ curl -X PUT -H "content-type:application/json"
http://localhost:8098/riak/example/foo --data @-
{"mylinks":[["example","bar"],["example","baz"]],"myval":1}
^D
$ curl -X PUT -H "content-type:application/json"
http://localhost:8098/riak/example/bar --data @-
{"mylinks":[["example","baz"]],"myval":2}
^D
$ curl -X PUT -H "content-type:application/json"
http://localhost:8098/riak/example/baz --data @-
{"mylinks":[["example","foo"]],"myval":3}
^D

Then you could use a very simple map function like:
   function(v) {
      return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
   }

And then the link-walking is simple:

carboy:riak bryan$ curl -X POST -H "content-type:application/json"
http://localhost:8098/mapred --data @-
{"inputs":[["example","foo"]],"query":[{"map":{"language":"javascript","source":"function(v)
{ return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
}"}},{"map":{"language":"javascript","source":"function(v) { return
[JSON.parse(v.values[0].data).myval]; }"}}]}
^D
[2,3]

That query uses two map phases to start at the example/foo object I
created above, and then follow the links it has to the example/bar and
example/baz, and extracting the "myval" field from the values of those
objects.

I'd recommend adding a little defensive programming in to make sure
that "mylinks" is defined, and that it's a list of the proper shape.
It would also be a good idea to define these function in a file that
Riak would preload, instead of specifying them dynamically in the
query (for performance).  But, you could also take it in another
direction: if you knew that all of your links were going to point to
objects in a certain bucket, you could store just the keys in the
object, and produce bucket-key pairs with a quick map function  (e.g.
mykeys.map(function(k) { return ["otherbucket", k]; })

Hope that helps.

-Bryan




More information about the riak-users mailing list