Map Reduce Requirements

Jeremiah Peschka jeremiah.peschka at gmail.com
Mon Aug 22 15:27:13 EDT 2011


Good questions! I've CC'd the list back in :)

Responses inline.
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Aug 22, 2011, at 12:18 PM, bill robertson wrote:

> That makes sense. 
> 
> Suppose I have a query called Q1. I would like to specify Q1 in JavaScript. Assume that I can write an Erlang function called F that will translate the raw GPB bytes into the appropriate JSON for use by Q1. How would I hook F into the processing of Q1?

You could write an initial map phase in Erlang that takes your Protocol Buffers data and passes it to the next phase as JSON. As far as I know, you can mix languages across the phases of a single job.
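A minimal sketch of such a job, written in TypeScript only to build the JSON body for Riak's /mapred endpoint. The Erlang module `pb_codec` and its `map_pb_to_json` function are hypothetical placeholders for code you would write and deploy on every node, and the `age` field is made up. The second phase is a reduce on the assumption that a map phase following another map phase receives bucket/key pairs rather than the values the first phase produced.

```typescript
// Sketch: the JSON body of a Riak MapReduce job that chains an Erlang phase
// (PB -> JSON-friendly term) into a JavaScript phase (the actual query, "Q1").
// ASSUMPTION: "pb_codec" / "map_pb_to_json" are hypothetical names for an Erlang
// module and function deployed on every Riak node.

interface ErlangPhase {
  language: "erlang";
  module: string;
  function: string;
  arg?: unknown;
  keep?: boolean;
}

interface JavaScriptPhase {
  language: "javascript";
  source: string;
  arg?: unknown;
  keep?: boolean;
}

type PhaseSpec = ErlangPhase | JavaScriptPhase;
type Phase = { map: PhaseSpec } | { reduce: PhaseSpec };

interface MapReduceJob {
  inputs: string; // a bucket name; Riak also accepts lists of bucket/key pairs
  query: Phase[];
}

function buildJob(bucket: string, minAge: number): MapReduceJob {
  return {
    inputs: bucket,
    query: [
      // Phase 1 (Erlang): decode each stored protobuf into a JSON-encodable term.
      { map: { language: "erlang", module: "pb_codec", function: "map_pb_to_json" } },
      // Phase 2 (JavaScript): filter the decoded records on a hypothetical `age` field.
      // Filtering can safely be re-run on its own output, which reduce phases require.
      {
        reduce: {
          language: "javascript",
          source:
            "function(values, arg) {" +
            " return values.filter(function (v) { return v.age >= arg; }); }",
          arg: minAge,
          keep: true,
        },
      },
    ],
  };
}
```

`JSON.stringify(buildJob("people", 21))` would then be POSTed to /mapred with a Content-Type of application/json.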

> 
> I guess that the JavaScript function would be passed the GPB bytes in the reduce phase, at which point I could call my translation function and operate on the JSON, and possibly pass on a structure containing both the JSON and the GPB to the next phase.
> 
> Does that make sense? Is it possible to invoke arbitrary Erlang functions from within JavaScript like this? If so, are there examples?
> 

I don't think Erlang can talk to JavaScript inside a single phase/function/pile of source code. I could be wrong, but it seems to me that marshaling data across the JavaScript/Erlang boundary would be hella expensive and would cause a lot of problems, so it probably doesn't exist.

The best bet would be either an initial Erlang PB -> JSON phase or a JavaScript PB parser that returns JavaScript objects.
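For the JavaScript-parser route, a decoder for a simple schema can be small. The TypeScript sketch below handles only the two most common protobuf wire types (varints and length-delimited fields) and maps them onto a hypothetical `Person { name = 1; age = 2 }` message. Strings are treated as ASCII to keep it dependency-free, and how Riak hands raw bytes to a JavaScript phase is not covered here; both are assumptions to check before relying on it.

```typescript
// Minimal protobuf wire-format decoder: a sketch, not a complete implementation.
// Handles varint (wire type 0) and length-delimited (wire type 2) fields only.

type FieldValue = number | Uint8Array;
type Fields = Record<number, FieldValue[]>;

// Reads a base-128 varint at `pos`; returns [value, nextPos].
// Multiplying instead of shifting keeps values above 2^31 correct (up to 2^53).
function readVarint(buf: Uint8Array, pos: number): [number, number] {
  let result = 0;
  let scale = 1;
  for (;;) {
    if (pos >= buf.length) throw new Error("truncated varint");
    const byte = buf[pos++];
    result += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return [result, pos];
    scale *= 128;
  }
}

// Collects every field occurrence, keyed by field number.
function decodeFields(buf: Uint8Array): Fields {
  const fields: Fields = {};
  let pos = 0;
  while (pos < buf.length) {
    const [tag, afterTag] = readVarint(buf, pos);
    const fieldNumber = Math.floor(tag / 8);
    const wireType = tag % 8;
    let value: FieldValue;
    if (wireType === 0) {
      [value, pos] = readVarint(buf, afterTag);
    } else if (wireType === 2) {
      const [length, start] = readVarint(buf, afterTag);
      if (start + length > buf.length) throw new Error("truncated field");
      value = buf.subarray(start, start + length);
      pos = start + length;
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
    const list: FieldValue[] = fields[fieldNumber] || [];
    list.push(value);
    fields[fieldNumber] = list;
  }
  return fields;
}

// Hypothetical schema: message Person { string name = 1; uint32 age = 2; }
interface Person {
  name: string;
  age: number;
}

// Protobuf keeps the last occurrence of a non-repeated field.
function lastValue(fields: Fields, fieldNumber: number): FieldValue | undefined {
  const list = fields[fieldNumber];
  return list && list.length > 0 ? list[list.length - 1] : undefined;
}

// ASCII-only, to stay dependency-free; real code should decode UTF-8.
function asciiFromBytes(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i]);
  return text;
}

// Missing fields fall back to proto3-style defaults ("" and 0).
function toPerson(fields: Fields): Person {
  const name = lastValue(fields, 1);
  const age = lastValue(fields, 2);
  return {
    name: name instanceof Uint8Array ? asciiFromBytes(name) : "",
    age: typeof age === "number" ? age : 0,
  };
}
```

`toPerson(decodeFields(bytes))` yields a plain object that a map or reduce phase could filter or return directly, with no intermediate JSON string.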

I would benchmark both approaches, because with a PB -> JSON conversion you still need to convert the JSON into an object, though that step on its own shouldn't take long.

Someone smarter than me may figure out how to get around this, but the biggest problem you will run into is that a PB -> JSON conversion will still have to convert the PB into a string representing JSON. Then that JSON string will have to be turned into a JavaScript object. You may pay a significant penalty for having to transform that data twice.
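A crude way to put a number on that second conversion is to time the JSON-text-to-object step on its own with a representative record. The harness and the sample record below are illustrative only. `Date.now()` has millisecond resolution, so the iteration count has to be large enough to give a stable figure.

```typescript
// Crude timing harness: runs `work` `iterations` times and reports elapsed milliseconds.
// Returning the last result helps keep the loop from being optimized away.
interface Timing {
  elapsedMs: number;
  last: unknown;
}

function timeIt(work: () => unknown, iterations: number): Timing {
  let last: unknown = undefined;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    last = work();
  }
  return { elapsedMs: Date.now() - start, last };
}

// The extra step on the Erlang PB -> JSON route: JSON text back into an object.
// The record is made up; substitute something shaped like your real data.
const sampleJson: string = JSON.stringify({ name: "Bob", age: 300, tags: ["a", "b", "c"] });
const jsonParseTiming = timeIt(() => JSON.parse(sampleJson), 10000);
```

Dividing `elapsedMs` by the iteration count gives a rough per-record cost; timing a direct PB decode the same way gives the other side of the comparison.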

> Additionally, are secondary indexes metadata? I.e., if I built some secondary indexes, they would be stored in some form internal to Riak, and would therefore be available for query regardless of the type of data they're associated with. Is this correct?

Secondary indexes are a separate physical structure, or so I gather. (Rusty could be full of lies.) They're stored separately from the initial data and not as metadata in the object headers. So, yes, you can store whatever you want in secondary indexes and query it however you want, provided there's an API that supports what you're doing.

If you're stretching for an analog, you can think of them as similar to secondary indexes in an RDBMS, but that's a stretch for many reasons ;)
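To make the "separate from the stored value" point concrete, here is a TypeScript sketch of how index entries are attached and queried over HTTP, assuming the Riak 1.0-era 2i conventions: entries travel as `x-riak-index-<field>_bin` or `_int` headers when an object is stored, and are queried by index name without reading the object body, so the body can be JSON, protobufs, or anything else. The bucket, field names, and values are made up.

```typescript
// Sketch of Riak secondary indexes (2i) over HTTP, ASSUMING the Riak 1.0-era conventions:
// entries are sent as "x-riak-index-<field>_bin" / "_int" headers when an object is stored,
// and queried with GET /buckets/<bucket>/index/<index>/<value> (or /<min>/<max>).
// The bucket, fields, and values used below are made up.

type IndexValue = string | number;

// "_int" for integer indexes, "_bin" for binary (string) indexes.
function indexName(field: string, value: IndexValue): string {
  return field + (typeof value === "number" ? "_int" : "_bin");
}

// Headers to send with the PUT/POST that stores the object, one per index entry.
function indexHeaders(entries: Array<[string, IndexValue]>): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const [field, value] of entries) {
    headers["x-riak-index-" + indexName(field, value)] = String(value);
  }
  return headers;
}

// Exact-match query path for one index value.
function exactMatchPath(bucket: string, field: string, value: IndexValue): string {
  return (
    "/buckets/" + encodeURIComponent(bucket) +
    "/index/" + indexName(field, value) +
    "/" + encodeURIComponent(String(value))
  );
}

// Range query path for an integer index.
function intRangePath(bucket: string, field: string, min: number, max: number): string {
  return "/buckets/" + encodeURIComponent(bucket) + "/index/" + field + "_int/" + min + "/" + max;
}
```

The object body never appears in these calls, which is the sense in which the index is its own structure rather than something parsed out of the data.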

> 
> Thanks,
> Bill Robertson
> 
> On Mon, Aug 22, 2011 at 2:57 PM, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
> You can MR across whatever kind of data you'd like. JSON is typically used because it's very easy to show people how to query JSON and the structure makes sense to many programmers.
> 
> To MR across anything else, you'll want a library that will translate your Protocol Buffers-encoded data into objects that can be parsed in either JavaScript or Erlang. That is to say, you'll need a serialization/deserialization function to translate from data at rest (protobufs) to data that the MR program can understand.
> 
> Since there are protocol buffer libraries for many languages, this should be doable in either JavaScript or Erlang. I don't know of any examples, but it shouldn't be much more difficult than Riak.mapValuesJson - provided that you can find some easy magic to translate objects for you ;)
> 
> On Aug 22, 2011, at 11:51 AM, bill robertson wrote:
> 
> > In order to run a map reduce query against Riak, does the data need to be stored in JSON? If this isn't a requirement, then how would I run a query against data stored in Google Protocol Buffers format? Is there an example of this somewhere?
> >
> > Thanks!
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
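On the earlier suggestion of a serialization/deserialization function that works like Riak.mapValuesJson: the TypeScript sketch below expresses the idea as a higher-order function that takes any decoder and returns a map-phase function. `mapValuesWith` is a hypothetical name, the `RiakObject` shape is an assumption trimmed down to the fields used, and whether binary protobuf bytes arrive intact in `data` as a string is something to verify first.

```typescript
// Sketch of a pluggable deserialization step for a JavaScript map phase.
// ASSUMPTION: RiakObject mirrors, in trimmed form, the object Riak passes to
// JavaScript map functions (the stored value lives in values[n].data).

interface RiakValue {
  data: string;
}

interface RiakObject {
  bucket: string;
  key: string;
  values: RiakValue[];
}

type MapFunction = (value: RiakObject, keyData: unknown, arg: unknown) => unknown[];

// Returns a map function that runs `decode` over the first stored value.
// With JSON.parse this behaves roughly like Riak.mapValuesJson; a protobuf
// decoder (hypothetical here) would be passed in the same way.
function mapValuesWith(decode: (raw: string) => unknown): MapFunction {
  return function (value: RiakObject): unknown[] {
    const first = value.values[0];
    // Objects with no stored values contribute nothing to the phase output.
    return first ? [decode(first.data)] : [];
  };
}
```

Keeping the decoder as a parameter means the query logic stays the same whether the data at rest is JSON or protobufs; only the function passed to `mapValuesWith` changes.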
