Riak n00b questions

Ishwar ishwarsridharan at yahoo.com
Tue Mar 15 04:28:42 EDT 2011


The use-case that we're looking at is a bit more complicated than that. Briefly, this is what we want to do.

1. We get a whole bunch of data, say, blog posts from various sources, which we index in Solr and store in Riak in JSON format.

2. Once the data is in Riak, we need to run a whole bunch of analysis on selected groups of records. The scripts to do this analysis are in PHP and Python. The idea is to run MapReduce on a batch of records, and update Solr with the results of the analysis. On Riak, the results of the analysis will be written to a different bucket, with links to the original record.

3. At the serving end, it's going to be just key-value pair retrievals, or simple MapReduce.

Pre-processing the data is not an option as we won't be running this analysis on all the records. It will be run only on a subset of data.
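
To make step 2 concrete, here is a rough sketch of how an analysis result could be written to a separate bucket with a link back to the original record. It assumes Riak's HTTP interface on a local node (default port 8098, /riak/<bucket>/<key> paths, Link headers with a riaktag). The bucket names "posts" and "analysis", the "source" tag, and the helper names are made up for illustration. It only builds the request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Riak node on the default HTTP port.
RIAK_URL = "http://localhost:8098"


def riak_path(bucket, key):
    """Return the /riak/<bucket>/<key> path with both parts URL-escaped."""
    return "/riak/{}/{}".format(
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )


def link_header(bucket, key, tag):
    """Format a Link header value that points at another Riak object."""
    return '<{}>; riaktag="{}"'.format(riak_path(bucket, key), tag)


def build_result_request(post_key, result,
                         posts_bucket="posts", results_bucket="analysis"):
    """Build (but do not send) a PUT that stores `result` under the same key
    in the results bucket, linked back to the original post.

    The bucket names and the "source" tag are hypothetical.
    """
    return urllib.request.Request(
        RIAK_URL + riak_path(results_bucket, post_key),
        data=json.dumps(result).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Link": link_header(posts_bucket, post_key, "source"),
        },
    )
```

Passing the returned request to urllib.request.urlopen() would perform the write. Reusing the post's key in the results bucket is just one possible convention, not a requirement.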

Given this use case, what do you suggest is the best way to use Riak?


----- Original Message -----
> From:Sean Cribbs <sean at basho.com>
> To:Ishwar <ishwarsridharan at yahoo.com>
> Cc:"riak-users at lists.basho.com" <riak-users at lists.basho.com>
> Sent:Monday, March 14, 2011 8:57 PM
> Subject:Re: Riak n00b questions
> >> It is not currently, but we are looking into the feasibility of
> >> supporting other languages.  However, I might say that if you're already
> >> doing Python and PHP, it would be worth your while (and not difficult) to
> >> learn JavaScript.
> >
> > We already have a whole bunch of processing on the data written in Python
> > and PHP, and porting them to Javascript is (1) very tedious, and (2)
> > Javascript does not support the required functionality. For example, we do
> > a bunch of NLP analysis on the data.
> >
> > Given these, is it advisable if I expose these processes as webservices and
> > call them from javascript/erlang?
> >
> The other option of course, is to pre-process your data and just insert
> multiple copies in different formats, which is a pretty common pattern.  The
> tradeoff is whether you want to pay the cost at query time or at write time.
> If you can pay that cost up-front, reads will likely be key-value or very
> simple MapReduce and thus very fast.
>
> Sean Cribbs <sean at basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
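
For reference, the web-service idea from the quoted thread could look roughly like the sketch below: an existing Python analysis routine wrapped in a small HTTP endpoint, using only the standard library. The analyze() function is a stand-in (it just counts words) for the real NLP code. The JSON request shape with a "text" field, the port, and all names are assumptions for illustration. Whether a Riak MapReduce phase can usefully call such a service is a separate question that this sketch does not answer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def analyze(text):
    """Stand-in for the real NLP analysis; here it only counts words."""
    return {"word_count": len(text.split())}


def handle_body(body):
    """Turn a raw JSON request body into (status code, JSON response bytes)."""
    try:
        doc = json.loads(body.decode("utf-8"))
        text = doc["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a string 'text' field"}
        return 400, json.dumps(error).encode("utf-8")
    return 200, json.dumps(analyze(text)).encode("utf-8")


class AnalysisHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON documents and replies with the analysis result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Create the server bound to localhost; the caller starts it."""
    return HTTPServer(("127.0.0.1", port), AnalysisHandler)
```

Calling make_server().serve_forever() starts the service. Keeping handle_body() separate from the handler class means the analysis path can be exercised without opening a socket.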
