riak-ql

Kev Burns kevburnsjr at gmail.com
Tue Jul 26 23:38:39 EDT 2011


Well... you would think that a map/reduce across a smaller bucket would
take less time, but that isn't as true as you might expect.
If I remember correctly, Riak doesn't store bucket/key values in memory, it
just stores the hash.
So if you use {"inputs": "Messages_Rohman", ... }, it has to test every key
in memory to see whether it is in the bucket you specified.
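
To make that cost concrete, here is a toy model in JavaScript. It is not
Riak's storage code; it only assumes, as described above, that all keys sit
in one flat keyspace with no per-bucket index. The names listKeys and
keyspace are made up for the illustration.

```javascript
// Toy model only: this is NOT Riak's implementation. It assumes every key
// lives in one flat keyspace and nothing indexes keys by bucket.
function listKeys(keyspace, bucket) {
  let examined = 0;
  const keys = [];
  for (const entry of keyspace) {
    examined += 1; // every entry is tested, whichever bucket it belongs to
    if (entry.bucket === bucket) {
      keys.push(entry.key);
    }
  }
  return { keys, examined };
}

// Hypothetical data: 10 keys in Messages_Rohman, 10,000 in another bucket.
const keyspace = [];
for (let i = 0; i < 10; i++) {
  keyspace.push({ bucket: 'Messages_Rohman', key: 'msg' + i });
}
for (let i = 0; i < 10000; i++) {
  keyspace.push({ bucket: 'Messages_OtherUser', key: 'msg' + i });
}

const result = listKeys(keyspace, 'Messages_Rohman');
console.log(result.keys.length, result.examined); // prints: 10 10010
```

In this model the work tracks the total number of keys, not the size of the
bucket you asked for.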

See here
http://wiki.basho.com/MapReduce.html#Inputs

> You may also pass just the name of a bucket ({"inputs":"mybucket",...}),
> which is equivalent to passing all of the keys in that bucket as inputs
> (i.e. “a map/reduce across the whole bucket”). You should be aware that this
> triggers the somewhat expensive “list keys” operation, so you should use it
> sparingly.
>

Here "somewhat expensive" is an understatement.
If the cluster holds more than 10,000 keys, list keys could take several
minutes, even if the bucket you asked for only has 10 keys.
A better solution right now is to use key filters or a search input to the
map/reduce.

> A bucket input may also be combined with Key Filters to limit the number of
> objects processed by the first query phase.
> If you’re using Riak Search, the list of inputs can also reference a search
> query to be used as inputs.
>
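
A minimal sketch of what such a job could look like, written as the JSON
body you would send to the map/reduce endpoint. The shared "Messages"
bucket, the "Rohman_" key prefix and the buildFilteredJob helper are
assumptions for illustration, not something from this thread.

```javascript
// Sketch of a map/reduce job that combines a bucket input with a key filter.
// Assumptions (not from the thread): one shared "Messages" bucket whose keys
// are prefixed with the user name, e.g. "Rohman_0001".
function buildFilteredJob(bucket, keyPrefix) {
  return {
    inputs: {
      bucket: bucket,
      // Only keys that start with keyPrefix are passed to the first phase.
      key_filters: [['starts_with', keyPrefix]]
    },
    query: [
      // Built-in JavaScript map function that returns each object as JSON.
      { map: { language: 'javascript', name: 'Riak.mapValuesJson' } }
    ]
  };
}

const job = buildFilteredJob('Messages', 'Rohman_');
console.log(JSON.stringify(job, null, 2));
```

The resulting JSON would be POSTed to the node's /mapred resource. The
filter limits which objects reach the map phase; how much of the key
listing cost it saves is something I would measure rather than assume.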

Hopefully Secondary Indexes will come with a new map/reduce input type that
does something similar.

- Kev
c: +001 (650) 521-7791


On Tue, Jul 26, 2011 at 8:21 PM, Antonio Rohman Fernandez <
rohman at mahalostudio.com> wrote:

> "The problem I see with riak-ql and Antonio's thing is that they're
> invariably going to be slow.
> Javascript Map/Reduce over an entire bucket is just not suitable for inline
> requests."
>
> Yes, of course. I also think the MapReduce phases should run in the
> background with cron jobs or some other method. You don't want to
> execute this kind of query from your UI web app, but at least the
> development is done in case you need to (using your head, of course,
> about how you distribute data across the buckets). Take Facebook's
> "News feed" as an example: if we ran that MapReduce query every time the
> user clicked the "Home" tab, it would be terribly expensive, so a
> background process that generates the "News feed" and updates it every
> 5 minutes (for example) would be better. Still, it is good to have
> MapReduce available on demand in case you want to grab some special
> data, even if the transaction will be a bit costly.
>
> It also depends on how you store your data. If you have a single
> "Messages" bucket for everybody's messages, it will be very hard to
> query. If instead you split it into "Messages_Rohman",
> "Messages_OtherUser", etc., each bucket holds less data, queries could
> be faster, and MapReduce could be an option for on-demand data.
>
> Rohman
>
> On Tue, 26 Jul 2011 20:12:08 -0700, Kev Burns wrote:
>
> sorry i meant to post this to the list
>
> - Kev
> c: +001 (650) 521-7791
>
>
> On Tue, Jul 26, 2011 at 8:11 PM, Kev Burns <kevburnsjr at gmail.com> wrote:
>
>> Francisco - How's performance on riak-ql?
>>
>> The problem I see with riak-ql and Antonio's thing is that they're
>> invariably going to be slow.
>> Javascript Map/Reduce over an entire bucket is just not suitable for
>> inline requests.
>>
>> Take PodCrazy
>> http://podcrazy.net/
>>
>> It's backed entirely by RiakSearch and memcached.
>> That episode listing on the homepage is a map/reduce that calculates
>> popularity based on votes over time.
>> But right now this simple javascript map/reduce over less than a thousand
>> items takes about 2 seconds to run.
>> It totally makes sense as a map/reduce because it's calculating popularity
>> based on several decaying attributes.
>> But it has to happen in a background process.
>>
>> The site is powered by this port of Ripple to PHP
>> http://ripple-php.hackyhack.net/test/?test=document
>>
>> Right now ripple-php has remained pretty basic and for good reason.
>> I've put off creating something more full-featured until secondary indexes
>> make it into master.
>> The shape of the native secondary index mechanism will heavily influence
>> the design of any Riak ODM.
>>
>> Lastly, in my experience Riak Search is not very memory efficient as a
>> non-fulltext index mechanism.
>> It is also sort of useless without support for ranges on anything other
>> than keys.
>> You wind up selecting limit 0,999 and doing the slice yourself.
>>
>> I suppose I see the value of a tool like riak-ql for reporting.
>> And I imagine this sort of tool will continue to be useful to add the sort
>> of features that Secondary Indexes will not support.
>> And I also imagine a tool like this would be faster if implemented in
>> Erlang.
>>
>>
>> - Kev
>> c: +001 (650) 521-7791
>>
>>
>>  On Tue, Jul 26, 2011 at 7:35 PM, Antonio Rohman Fernandez <
>> rohman at mahalostudio.com> wrote:
>>
>>>  Thanks for porting it to Google Docs, even if the text seems to have
>>> got a little compressed in there, too cluttered.
>>> Hope it can help somebody.
>>>
>>> Rohman
>>>
>>> On Tue, 26 Jul 2011 19:21:36 -0700, Kev Burns wrote:
>>>
>>> Here's a virus-free version of Antonio's slide deck (Google Docs)
>>> https://docs.google.com/present/view?id=dhpxng6q_51gdj6r9wn
>>>
>>> - Kev
>>> c: +001 (650) 521-7791
>>>
>>>
>>> On Tue, Jul 26, 2011 at 6:23 PM, Antonio Rohman Fernandez <
>>> rohman at mahalostudio.com> wrote:
>>>
>>>>  For PHP, you can take a look at these slides I made about "phpCloud
>>>> Framework", a new PHP5 MVC framework I'm building with Riak integration
>>>> in place : ) It is based on CakePHP, which borrows heavily from Ruby on
>>>> Rails. You can download the slides at this address (the file seems to be
>>>> too big for the distribution list, as my last mail couldn't be sent):
>>>>
>>>> http://mahalostudio.com/Riak_phpCloud.pptx
>>>>
>>>> Rohman
>>>>
>>>> --
>>>> Antonio Rohman Fernandez
>>>> CEO, Founder & Lead Engineer
>>>> rohman at mahalostudio.com
>>>> Projects:
>>>> MaruBatsu.es <http://marubatsu.es/>
>>>> PupCloud.com <http://pupcloud.com/>
>>>> Wedding Album <http://wedding.mahalostudio.com/>
>>>>
>>>> On Tue, 26 Jul 2011 20:00:27 -0400, Jonathan Langevin wrote:
>>>>
>>>>  Looks interesting, but doesn't appear very intuitive (at least, to a
>>>> PHP dev)
>>>>
>>>> Jonathan Langevin
>>>> Systems Administrator, Loom Inc.
>>>> Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com -
>>>> www.loomlearning.com - Skype: intel352
>>>>
>>>>
>>>>
>>>>  On Mon, Jul 25, 2011 at 9:40 AM, francisco treacy <
>>>> francisco.treacy at gmail.com> wrote:
>>>>
>>>>> It's awesome for ad-hoc querying, at least. An example can better
>>>>> explain.
>>>>>
>>>>> Consider this:
>>>>>
>>>>> db.add('users').map('query', '.address .street where
>>>>> .weight:expr(x !.expired').run()
>>>>>
>>>>>
>>>>> as opposed to:
>>>>>
>>>>> db.add('users').map(function(v) {
>>>>>  v = Riak.mapValuesJson(v)[0];
>>>>>  var result = [];
>>>>>  if ((v.weight < 180 || v.exempt) && v.acl && v.acl.state === '1101'
>>>>> && !v.expired) {
>>>>>    if (v.address) {
>>>>>      result.push(v.address.street);
>>>>>    }
>>>>>  }
>>>>>  return result;
>>>>> }).run()
>>>>>
>>>>>
>>>>> riak-ql is basically adding some query sugar (where, &&) on top of
>>>>> JSONSelect... which you can try it out here:
>>>>> http://jsonselect.org/#tryit
>>>>>
>>>>>
>>>>> 2011/7/25 Mark Phillips <mark at basho.com>:
>>>>> > Hey Francisco,
>>>>> >
>>>>> > I for one would be interested in learning some more specifics on how
>>>>> > you're using it. I suspect others might be, too...
>>>>> >
>>>>> > Mark
>>>>> >
>>>>> > On Sat, Jul 23, 2011 at 4:40 PM, francisco treacy
>>>>> > <francisco.treacy at gmail.com> wrote:
>>>>> >> Hey all,
>>>>> >>
>>>>> >> Just wondering... is anyone using, or has anyone tried out, riak-ql?
>>>>> >> https://github.com/frank06/riak-ql
>>>>> >>
>>>>> >> Not because I developed it -- but I'm regularly making use of it and
>>>>> >> I think it kicks ass. Especially in the repl in combo with riak-js.
>>>>> >>
>>>>> >> What do you guys think?
>>>>> >>
>>>>> >> Francisco
>>>>> >>
>>>>> >> ps: really curious/excited about the upcoming Secondary Indices
>>>>> >> functionality
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> riak-users mailing list
>>>>> >> riak-users at lists.basho.com
>>>>> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>> >>
>>>>> >
>>>>>
>>>>