Really really basic questions

John Lynch john at rigelgroupllc.com
Mon Mar 1 22:54:44 EST 2010


I asked that same question recently and got the answer below from Basho:

--------------------------
Hi John,

In the near future, we are planning to add a pre-commit "hook", specified at
a bucket level. This would provide the building blocks necessary to keep an
index up to date when an object is stored in Riak. Eventually, I expect to
see frameworks and other tools use the hook to allow easy indexing.

Until that is in place, the best approach for querying depends on the shape
of your data:

If you are searching for data through relations in a hierarchy, and you know
the starting point of that hierarchy, then you should add tagged links to
your objects and use linkwalking. If you need to search in a hierarchy, but
need more flexibility than links and tags can provide, then you can use
map/reduce functionality. By "starting point", I mean that you know the
exact object or objects under which you would like to query.

If your queries are not relational/hierarchical and you don't know the
starting point in advance, then your best approach would be to mimic the
hook feature described above, and build up your index by hand in a separate
Riak object. You could do this in your application when an object is stored
(which requires extra hops to Riak), or you can use a background process to
do this using list-keys, which means there will be some lag between when
your data is stored and when the index is updated. (Keep in mind that
list-keys can be an expensive operation, which is why it should be a
background process.)
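
To make the hand-built index idea above concrete, here is a minimal sketch in Python. It does not use any Riak client API: a plain dict stands in for Riak, and every name in it (`store`, `INDEX_BUCKET`, `put_indexed`, `rebuild_index`, `find`) is hypothetical. `put_indexed` is the "update the index when the object is stored" variant, and `rebuild_index` is the background variant that walks all keys.

```python
# In-memory stand-in for Riak: maps (bucket, key) -> stored object (a dict).
# This is NOT a Riak client API; every name here is a hypothetical placeholder.
store = {}

INDEX_BUCKET = "indexes"  # hypothetical bucket that holds the index objects


def put(bucket, key, obj):
    store[(bucket, key)] = obj


def get(bucket, key, default=None):
    return store.get((bucket, key), default)


def put_indexed(bucket, key, obj, field):
    """Store obj, then update the index object for `field`.

    Against a real cluster, reading and writing the index object would be
    the "extra hops" mentioned above.
    """
    index_key = f"{bucket}_by_{field}"
    index = get(INDEX_BUCKET, index_key, {})

    # Drop the stale entry if this key was previously indexed under another value.
    old = get(bucket, key)
    if old is not None and field in old:
        keys = index.get(old[field], [])
        if key in keys:
            keys.remove(key)
        if not keys:
            index.pop(old[field], None)

    put(bucket, key, obj)
    if field in obj:
        keys = index.setdefault(obj[field], [])
        if key not in keys:
            keys.append(key)
    put(INDEX_BUCKET, index_key, index)


def rebuild_index(bucket, field):
    """Background variant: scan every key (the expensive step) and rebuild."""
    index = {}
    for (b, key), obj in list(store.items()):
        if b == bucket and field in obj:
            index.setdefault(obj[field], []).append(key)
    put(INDEX_BUCKET, f"{bucket}_by_{field}", index)
    return index


def find(bucket, field, value):
    """Look up objects by field value using only the index object."""
    index = get(INDEX_BUCKET, f"{bucket}_by_{field}", {})
    return [get(bucket, k) for k in index.get(value, [])]


put_indexed("users", "u1", {"name": "ann", "city": "San Diego"}, "city")
put_indexed("users", "u2", {"name": "bob", "city": "San Diego"}, "city")
matches = find("users", "city", "San Diego")
```

The sketch ignores concurrent writers; two clients updating the same index object at once would need some conflict handling that is not shown here.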

Best,
Rusty



On Fri, Feb 26, 2010 at 4:53 PM, John Lynch <john at rigelgroupllc.com> wrote:
I am preparing to give a talk on Riak next week to a local Ruby user group
here in San Diego, and wanted to get your thoughts on the future of Riak.
While in its current form it is awesome for loads of use cases, it falls
short in the whole querying department, at least as it relates to building
typical web applications. I get the map/reduce features, and the link
features, but are there any plans to build in indexed query capabilities, a
la MongoDB?  Mongo obviously has an easier time of this since it knows
exactly what format your data is in (BSON). Or, is this something you would
leave to third-parties to build as part of an ORM framework like Ripple, for
example, which would have a better idea of the shape of the data and could
create/maintain indexes accordingly...


Regards,

John Lynch, CTO
Rigel Group, LLC
john at rigelgroupllc.com



On Mon, Mar 1, 2010 at 5:22 PM, David King <dking at ketralnis.com> wrote:

> >> * I'm a little confused by the mapreduce query model. Do I specify the
> >> map/reduce pairs up-front before adding data (like an index), or every time
> >> I run a query (like a SQL SELECT)? Or in pairs (like doing a SELECT against
> >> columns that I know have an index)?
> > Map/Reduce runs at the time of a query. The map function runs on each
> > node, and the final reduce phase runs on a single node. You don't have to
> > query a whole bucket; you can specify a list of keys to start the search
> > from.
>
> So for instance, I work for reddit. To ask for the top-N scoring Links
> right now, we'd have to map against every Link object in the store, every
> time?
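
For reference, the query-time flow described in the quoted reply can be imitated in plain Python. This is only a local sketch, not Riak's map/reduce API: the `links` dict, the `score` and `id` fields, and the function names are all made up for illustration. The map step runs once per selected object, and a single reduce step keeps the top N.

```python
import heapq


def map_link(obj):
    # Map phase: emit one (score, id) pair per object.
    return [(obj["score"], obj["id"])]


def reduce_top_n(pairs, n):
    # Reduce phase: keep the n highest-scoring pairs, best first.
    return heapq.nlargest(n, pairs)


def top_n(objects, n, keys=None):
    """Run map over the selected objects, then one final reduce.

    keys=None means "the whole bucket"; passing a list of keys restricts
    the map phase to those starting points.
    """
    selected = objects.values() if keys is None else (objects[k] for k in keys)
    mapped = []
    for obj in selected:
        mapped.extend(map_link(obj))
    return reduce_top_n(mapped, n)


# Hypothetical sample data standing in for stored Link objects.
links = {
    "a": {"id": "a", "score": 10},
    "b": {"id": "b", "score": 42},
    "c": {"id": "c", "score": 7},
}

whole_bucket = top_n(links, 2)
subset_only = top_n(links, 1, keys=["a", "c"])
```

In this sketch, calling `top_n` without a key list touches every object on every query, which is the cost the question above is asking about.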