performance impact of many buckets?

Bryan Fink bryan at basho.com
Thu Oct 8 13:35:06 EDT 2009


On Thu, Oct 8, 2009 at 11:27 AM, Dan Reverri <reverri at gmail.com> wrote:
> "This way, you could sort tweet links in chronological
> order, and choose time-ranges of them before actually requesting the
> objects from Riak."
> Would you mind giving an example of the link ref that would support this? Is
> it possible to use the link walker to filter and sort a complex link tag?

Example links:

    [ ["tweet", "1", "2009-10-08 09:34:06"],
      ["tweet", "2", "2009-10-08 10:14:23"],
      ["tweet", "3", "2009-10-08 10:14:21"] ]

Three links, formatted [Bucket, Key, Tag].  The tag is a timestamp in
a format that sorts lexicographically as a string.  So, it would be
trivial to sort the list above in time order (keys ["1", "3", "2"] in
the example).
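As a minimal sketch of that sort, assuming the links are held as plain
Erlang lists of strings (the module and function names here are made up
for illustration and are not part of Riak):

```erlang
-module(link_sort).
-export([by_tag/1, keys/1, example/0]).

%% Sort [Bucket, Key, Tag] links by Tag.  Term comparison on strings is
%% lexicographic, which matches time order for this timestamp format.
by_tag(Links) ->
    lists:sort(fun([_, _, TagA], [_, _, TagB]) -> TagA =< TagB end, Links).

%% Extract just the keys, preserving the order of the input list.
keys(Links) ->
    [Key || [_Bucket, Key, _Tag] <- Links].

%% The three example links from this message.
example() ->
    Links = [ ["tweet", "1", "2009-10-08 09:34:06"],
              ["tweet", "2", "2009-10-08 10:14:23"],
              ["tweet", "3", "2009-10-08 10:14:21"] ],
    keys(by_tag(Links)).
```

Calling link_sort:example() gives ["1", "3", "2"].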

The sorting and filtering would have to be done outside of a walk,
unfortunately.  But, it would be pretty simple to write a map/reduce
query that:

1. took a single user-key (or multiple user-keys, really) as input
2. ran a map phase on the user, which would:
   2a. extract all tweet-bucket links from the user
   2b. sort those links by their tag (==time order here)
   2c. grab the first N links of that list
   2d. return a list of those links converted to bucket-key pairs:
      [ {Bucket, Key} || [Bucket, Key, _Tag] <- (first N links) ]
3. ran a second map phase over the tweets the first map phase
indicated, which would:
   3a. simply return the tweet (as a list [Tweet], to conform to the
map-function spec)
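
Steps 2a-2d can be sketched as a plain function over a list of links.
This is only the list handling, under the assumption that the user's
links have already been pulled out of the object as [Bucket, Key, Tag]
lists of strings; it is not a complete Riak map function, and the module
and function names are hypothetical:

```erlang
-module(recent_links).
-export([first_n/3]).

%% Links is a list of [Bucket, Key, Tag]; returns up to N {Bucket, Key}
%% pairs for links pointing into Bucket, ordered by Tag.
first_n(Links, Bucket, N) ->
    %% 2a. keep only the links that point into the wanted bucket
    InBucket = [L || [B, _Key, _Tag] = L <- Links, B =:= Bucket],
    %% 2b. sort those links by tag (== time order for timestamp tags)
    Sorted = lists:sort(fun([_, _, TagA], [_, _, TagB]) -> TagA =< TagB end,
                        InBucket),
    %% 2c. take the first N (or fewer, if the list is shorter)
    FirstN = lists:sublist(Sorted, N),
    %% 2d. convert to bucket-key pairs for the next map phase
    [{B, K} || [B, K, _Tag] <- FirstN].
```

Note that an ascending sort puts the oldest links first.  If "first N"
should mean the most recent N, flip the comparison to TagA >= TagB.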

Such a map/reduce query would take you from User to RecentTweets in
one request.  But, such code must be written in Erlang at the moment.
Full m/r specification is not exposed through HTTP yet.

Alternatively, you may consider a date-only tag, like:

    [ ["tweet", "1", "2009-10-07"],
      ["tweet", "2", "2009-10-08"],
      ["tweet", "3", "2009-10-08"] ]

Which would allow you to select "only tweets from a given day", either
with mapred/2 link syntax:

    Client:mapred([{<<"user">>, <<"1">>}],
                  [{link, <<"tweet">>, <<"2009-10-08">>, false},
                   {map, {modfun, jiak_object, mapreduce_identity},
                    ignored, true}]).

Or the URL syntax:

    http://host/jiak/user/1/tweet,2009-10-08,_

Either of which would give you tweets "2" and "3", but not tweet "1"
in this example.
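
To make the selection concrete, here is a simplified model of that
bucket/tag match over an in-memory list of links.  It assumes '_' acts
as a wildcard and everything else must match exactly; it is an
illustration only, not Riak's actual link-following code, and the names
are invented:

```erlang
-module(link_match).
-export([select/3, example/0]).

%% '_' matches anything; any other spec must equal the value exactly.
matches('_', _Value) -> true;
matches(Spec, Value) -> Spec =:= Value.

%% Keep the [Bucket, Key, Tag] links whose bucket and tag match the specs.
select(Links, BucketSpec, TagSpec) ->
    [L || [B, _Key, T] = L <- Links,
          matches(BucketSpec, B),
          matches(TagSpec, T)].

%% The date-only example links from this message.
example() ->
    Links = [ ["tweet", "1", "2009-10-07"],
              ["tweet", "2", "2009-10-08"],
              ["tweet", "3", "2009-10-08"] ],
    [Key || [_, Key, _] <- select(Links, "tweet", "2009-10-08")].
```

Calling link_match:example() gives ["2", "3"].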

-Bryan


