Secondary Indices and Storing Binary Data

Rusty Klophaus rusty at
Tue May 3 15:48:39 EDT 2011

Hi Runar,

Our current prototype of Secondary Indices works like this:

- You tell the system how to index an object by "tagging" it with
field/value pairs. The tags are passed to Riak via object metadata,
currently sent via HTTP headers. I believe this answers your main question.

- Those who would rather have implicit indexing can set up a pre-commit hook
to extract field/value pairs and tag the object on the fly at write time.
This would also be the recommended way to handle corner cases like
field/value pairs that can't fit in an HTTP header.

- We're initially targeting a SQL-ish type language for querying, with
support for exact match and range queries.
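
As a rough sketch of the tagging step described above (not the prototype's
actual interface, which is still in flux): the object body stays raw binary
and the field/value pairs travel in a header. The header name is borrowed
from Eric's suggestion quoted below, and the URL, size limit, and function
name are assumptions made up for illustration.

```python
import json
from urllib.request import Request

# Assumption: header name taken from Eric's suggestion in the quoted thread;
# the prototype's real header names are not stated anywhere in this message.
INDEX_HEADER = "X-Riak-Index"
# Assumption: illustrative limit only; real HTTP servers and proxies differ.
MAX_HEADER_BYTES = 8192


def build_tagged_put(url, payload, tags):
    """Build (but do not send) a PUT carrying raw bytes plus index tags."""
    # json.dumps escapes newlines inside string values and emits ASCII by
    # default, so the header value always fits on a single line.
    header_value = json.dumps(tags, sort_keys=True)
    if len(header_value.encode("utf-8")) > MAX_HEADER_BYTES:
        # The corner case from the second bullet: tag in a pre-commit hook.
        raise ValueError("tags too large for a header")
    return Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            INDEX_HEADER: header_value,
        },
    )


req = build_tagged_put(
    "http://riak.example:8098/riak/contracts/c-1001",  # hypothetical URL
    b"\x00\x01serialized-contract-bytes",
    {"customer": "Jordahl", "note": "line one\nline two"},
)
```

Nothing is sent over the network here; the request object only shows that
the binary body and the index tags can be kept apart, which is the point of
Runar's question.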

We're still in the process of designing how to expose the functionality, so
other parts of the interface are in flux. As one of my favorite quotes goes:
"API design is like sex: make one mistake and support it for the rest of
your life." (@joshbloch)


On Tue, May 3, 2011 at 9:28 AM, Eric Moritz <eric at> wrote:

> If what Alexander is saying is true, what you would need to do is
> create your own commit hook that would look for a custom HTTP header:
> X-RIAK-INDEX: {"field1": "value1", "field2": "value2"}
> That hook would use that header to index on instead of the Riak
> Object's value.  There are limitations to this approach such as a size
> limit on HTTP headers and the header value would have to be on one
> line.  I think that given that secondary indexes are usually small
> simple values, the size limitation probably won't come up.  In
> addition, producing JSON on a single line is easy because new lines in
> string values are automatically escaped.
> The worst (or best) part about this is you would have to write the
> commit hook in Erlang if the current search hook is any indicator of
> what the secondary indexes hook will look like.
> Eric Moritz.
> On Tue, May 3, 2011 at 11:18 AM, Alexander Sicular <siculars at>
> wrote:
> > I think all that is yet to be determined, from a public standpoint (I
> > don't know any more than anyone not working at Basho). If secondary
> > indexes are implemented (interface wise) in a similar fashion as Riak
> > Search then I would imagine that the index would be enabled as a hook
> > at write time. Whether through a hook at write time (which I would
> > imagine will be the method employed) or some other mechanism, Riak
> > would need to be able to understand the value you are writing to disk
> > in order to index it properly. That means that however you encode your
> > object, Riak would need to be able to decode it in one of the two
> > languages that it speaks: JavaScript or Erlang.
> >
> > Cheers,
> > Alexander
> >
> > On Tue, May 3, 2011 at 08:55, Runar Jordahl <runar.jordahl at> wrote:
> >> I am working on a solution where an object graph (for example a full
> >> insurance contract) is serialized to binary data and stored as a single
> >> Riak object. Upon saving the binary data to Riak, I will extract the
> >> fields needed for indexing and (hopefully) use the upcoming “secondary
> >> indices” (
> >> ) to index the object based on the extracted data.
> >>
> >> How will this work with the upcoming secondary indices feature? Will I
> >> need to store all data (index data and binary data) as a single JSON
> >> object, using base-encoding for the binary part? Or, will I be able to
> >> use custom headers for the indices, storing only raw binary data as
> >> the main content?
> >>
> >> Kind regards
> >> Runar
> >>
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users at