Riak n00b questions
sean at basho.com
Tue Mar 15 08:31:17 EDT 2011
Ok, that's fair (you're basically describing a CQRS-style architecture). But I don't see anything in that chain of processing that requires your individual Map and Reduce functions to be written in PHP or Python. The munging you describe would probably be best done outside Riak, either via a custom system, or via something like Hadoop's streaming interface. One thing that might be of interest to you is Disco, which is written primarily in Python with some bits of Erlang, and is an alternative to Hadoop.
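A minimal sketch of what "outside Riak" could look like, in Python since that is what the analysis scripts already use. Everything named here is an assumption for illustration: `store` is a plain dict standing in for the buckets, and `analyze_subset` and `word_count` are hypothetical helpers, not Riak client calls. A real version would swap in a Riak client for the dict and push each result to Solr.

```python
import json


def word_count(post):
    """Placeholder analysis; stands in for the real NLP scripts."""
    return {"word_count": len(post["body"].split())}


def analyze_subset(store, source_bucket, keys, analyze, result_bucket):
    """Run `analyze` on the selected keys only and store each result in a
    separate bucket, with a link back to the original record."""
    written = []
    for key in keys:
        raw = store[source_bucket].get(key)
        if raw is None:
            continue  # selected key is not in the source bucket; skip it
        result = analyze(json.loads(raw))
        # Hypothetical link format pointing at the original record.
        result["links"] = [{"bucket": source_bucket, "key": key, "tag": "source"}]
        store.setdefault(result_bucket, {})[key] = json.dumps(result)
        written.append(key)
    return written


# In-memory stand-in for two buckets of JSON documents (assumed layout).
store = {
    "posts": {
        "p1": json.dumps({"body": "riak stores blog posts"}),
        "p2": json.dumps({"body": "not selected"}),
    }
}
done = analyze_subset(store, "posts", ["p1", "missing"], word_count, "analysis")
```

Only the keys you select are touched, so the cost of the analysis is paid per batch rather than for every record at write time.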
Sean Cribbs <sean at basho.com>
Basho Technologies, Inc.
On Mar 15, 2011, at 4:28 AM, Ishwar wrote:
> The use-case that we're looking at is a bit more complicated than that. Briefly, this is what we want to do.
> 1. We get a whole bunch of data, say, blog posts from various sources, which we index in Solr and store in Riak in JSON format.
> 2. Once the data is in Riak, we need to run a whole bunch of analysis on selected groups of records. The scripts to do this analysis are in PHP and Python. The idea is to run MapReduce on a batch of records, and update Solr with the results of the analysis. In Riak, the results of the analysis will be written to a different bucket, with links to the original record.
> 3. At the serving end, it's going to be just key-value pair retrievals, or simple MapReduce.
> Pre-processing the data is not an option as we won't be running this analysis on all the records. It will be run only on a subset of data.
> Given this use case, what do you suggest is the best way to use Riak?
> ----- Original Message -----
>> From: Sean Cribbs <sean at basho.com>
>> To: Ishwar <ishwarsridharan at yahoo.com>
>> Cc: "riak-users at lists.basho.com" <riak-users at lists.basho.com>
>> Sent: Monday, March 14, 2011 8:57 PM
>> Subject: Re: Riak n00b questions
>>>> It is not currently, but we are looking into the feasibility of
>> supporting other languages. However, I might say that if you're already
>> doing Python and PHP, it would be worth your while (and not difficult) to learn
>>> We already have a whole bunch of processing on the data written in Python
>> does not support the required functionality. For example, we do a bunch of NLP
>> analysis on the data.
>>> Given these, is it advisable if I expose these processes as webservices and
>> The other option of course, is to pre-process your data and just insert multiple
>> copies in different formats, which is a pretty common pattern. The tradeoff is
>> whether you want to pay the cost at query time or at write time. If you can pay
>> that cost up-front, reads will likely be key-value or very simple MapReduce and
>> thus very fast.
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.