Geospatial advice?

Mark Rose markrose at
Tue May 1 15:19:20 EDT 2012

Trade offs. I dislike rewriting stuff that doesn't scale. I love the idea
of just throwing another box into a cluster and having it "just work"
without rebalancing issues, etc. I'm tired of dealing with shards, complex
replication setups, etc.

I will also be using it heavily for logging, URL shortening, social share
registering, etc., and I'd rather stick to one datastore.

I agree it's a bit of a round peg in an ovoid hole when it comes to
geospatial + mapreduce queries, but I imagine it won't be long until
geospatial indexes arrive in Riak as well, and then I can switch to those.


On Tue, May 1, 2012 at 2:25 PM, Alexander Sicular <siculars at> wrote:

> Hey, I'm as up for a good and clever hack as anybody. But the question is
> just because you can, should you? Who will maintain your hack after you're
> dead? I'm still maintaining crap I wrote years ago. Even though I'm paid,
> sometimes I would rather not have the headache. Why would you use a product
> that specifically does not support such hackery? Scaling postgres or mongo
> are known and solvable problems, especially concerning bounded data sets
> like, say, all points on a globe. Now if you were storing checkins, that
> would be a different problem. One suitable for, say, Riak.
> On Tue, May 1, 2012 at 14:09, Mark Rose <markrose at> wrote:
>> Well, I'd be indexing items over the entire globe. I'd be looking at
>> resolutions from an entire world view down to city block. I'm thinking of
>> using geohashes as an index to restrict the result set, then further
>> filtering and sorting by mapreducing the remaining items. So I only need
>> enough granularity to reduce the number of items to a reasonable amount. At
>> the world view level, I'd filter out most results using mapreduce, but the
>> local-level queries would be far more common so an index would be highly
>> advantageous. The geometry I'd want to query would be a window that
>> arbitrarily overlaps one or more geohash regions. Basically, think plotting
>> items in, say, Google Maps.
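For concreteness, the windowing scheme described above might look like this. This is a sketch, assuming the standard geohash encoding; `cover_window` is an illustrative helper (not part of Riak or any library) that turns a map window into the set of geohash cells whose index entries you would need to fetch:

```python
# Sketch: encode points as geohashes, then compute the set of geohash
# cells (index prefixes) that an arbitrary lat/lon window overlaps.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Standard geohash: interleave lon/lat bisections, emit base-32 chars."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, ch, bit, even = [], 0, 0, True
    while len(out) < precision:
        rng = lon_rng if even else lat_rng
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = ch * 2 + 1
            rng[0] = mid
        else:
            ch = ch * 2
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:
            out.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(out)

def cover_window(min_lat, min_lon, max_lat, max_lon, precision=4):
    """Return every geohash cell (at the given precision) the window touches."""
    lon_bits = (5 * precision + 1) // 2   # bits alternate lon, lat, lon, ...
    lat_bits = 5 * precision // 2
    d_lon, d_lat = 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits

    def samples(lo, hi, step):
        # Sample one point per cell-width, plus the far edge of the window.
        vals, v = [], lo
        while v < hi:
            vals.append(v)
            v += step
        vals.append(hi)
        return vals

    return {geohash_encode(la, lo, precision)
            for la in samples(min_lat, max_lat, d_lat)
            for lo in samples(min_lon, max_lon, d_lon)}
```

Each returned cell name is a candidate index lookup; the mapreduce phase then does the exact filtering within those cells.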
>> Can you use a secondary index inside mapreduce? I haven't seen any
>> examples of it. I have only seen a secondary index being used to feed a
>> mapreduce. I am new to Riak.
>> I imagine my number of points would be at most 100 items per square km,
>> but typically less than 1 per square km. A 1 km resolution would be
>> sufficient. A 32 bit geohash would cover that fine. Vast regions of the
>> Earth would contain no points at all.
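A quick back-of-the-envelope check of the 32-bit claim above, assuming standard interleaved geohash bits and measuring cell width at the equator (the worst case):

```python
def geohash_cell_size_km(bits):
    """Approximate (width, height) in km of a geohash cell of `bits` bits.
    Width is taken at the equator, where cells are widest."""
    lon_bits = (bits + 1) // 2            # bits alternate lon, lat, lon, ...
    lat_bits = bits // 2
    km_per_degree = 40075.0 / 360.0       # Earth's equatorial circumference
    width = (360.0 / 2 ** lon_bits) * km_per_degree
    height = (180.0 / 2 ** lat_bits) * km_per_degree
    return width, height

# A 32-bit geohash yields cells of roughly 0.61 km x 0.31 km at the
# equator, comfortably inside the 1 km resolution mentioned above.
```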
>> -Mark
>> On Tue, May 1, 2012 at 1:16 PM, Sean Cribbs <sean at> wrote:
>>> In contrast to Alexander's assessment, I'd say "it depends". I have
>>> built some geospatial indexes on top of Riak using a geohashing scheme
>>> based on the Hilbert space-filling curve. However, I had to choose specific
>>> levels of "zoom" and precompute them. Now that we have secondary indexes,
>>> you could perhaps bypass the precomputation step. In general, if you know
>>> the geometry of the space you want to query, you can fairly trivially
>>> compute the names of the geohashes you need to look up and then either
>>> fetch individual keys for those (if you precompute them), or use MapReduce
>>> to fetch a range of them. It's not automatic, for sure, but the greatest
>>> complexity will be in deciding which granularities of index to support.
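The "compute the geohash names, then fetch a range" step works because a geohash prefix maps to one contiguous range in a sorted string index. In Riak this would be a secondary-index range query; the sketch below simulates that range scan over a plain sorted key list so the mapping from prefix to range is explicit (the key values are made up for illustration):

```python
import bisect

def range_for_prefix(prefix):
    """A geohash prefix p covers the contiguous key range [p, p + U+FFFF]."""
    return prefix, prefix + "\uffff"

def scan_prefix(sorted_keys, prefix):
    """Simulate a secondary-index range scan over an already-sorted key list."""
    lo, hi = range_for_prefix(prefix)
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[i:j]
```

A coarser prefix (fewer characters) widens the range, which is exactly the granularity trade-off Sean describes.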
>>> On Tue, May 1, 2012 at 12:44 PM, Alexander Sicular <siculars at> wrote:
>>>> My advice is to not use Riak. Check mongo or Postgres.
>>>> @siculars on twitter
>>>> Sent from my iRotaryPhone
>>>> On May 1, 2012, at 9:18, Mark Rose <markrose at> wrote:
>>>> > Hello everyone!
>>>> >
>>>> > I'm going to be implementing Riak as a storage engine for geographic
>>>> data. Research has led me to using geohashing as a useful way to filter
>>>> out results outside of a region of interest. However, I've run into some
>>>> stumbling blocks and I'm looking for advice on the best way to proceed.
>>>> >
>>>> > Querying efficiently by geohash involves querying several regions
>>>> around a point. From what I can tell, Riak offers no way to query a
>>>> secondary index with multiple ranges. Having to query several ranges,
>>>> merge them in the application layer, then pass them off to mapreduce seems
>>>> rather silly (and could mean passing GBs of data). Alternatively, I could
>>>> start straight with mapreduce, but key filtering seems to work only with
>>>> the primary key, which would force me into using the geohashed location as
>>>> the primary key (which would lead to collisions if two things existed at
>>>> the same point). I'd also like to avoid using the geohash as the primary
>>>> key, since if the item moves I'd have to change all the references to it.
>>>> Lastly, I could do a less efficient mapreduce over a less precise geohash,
>>>> but this doesn't solve the issue of the equator (anything near the equator
>>>> would require mapreducing the entire dataset).
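One workaround for the multiple-range limitation described above is to issue one index range query per geohash region and union the results client-side before handing the key list to mapreduce. A minimal sketch under stated assumptions: `SortedIndex` is a toy stand-in for the secondary index, and `ez`/`kp` are hypothetical prefixes for cells on either side of a geohash boundary such as the equator:

```python
import bisect

class SortedIndex:
    """Toy stand-in for a secondary index supporting range scans."""
    def __init__(self, keys):
        self.keys = sorted(keys)

    def range(self, start, end):
        i = bisect.bisect_left(self.keys, start)
        j = bisect.bisect_right(self.keys, end)
        return self.keys[i:j]

def merge_range_queries(index, ranges):
    """Run each (start, end) range scan and union the results,
    deduplicating keys that fall into more than one range."""
    seen, out = set(), []
    for start, end in ranges:
        for key in index.range(start, end):
            if key not in seen:
                seen.add(key)
                out.append(key)
    return out
```

The merged key list stays small as long as the regions are chosen at a precision that bounds each range, which sidesteps shipping GBs of intermediate data.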
>>>> >
>>>> > Is there any way to query multiple ranges with a secondary index and
>>>> pass that off to mapreduce? Or should I just stick with the less efficient
>>>> mapreduce, and when near the equator, run two queries and later merge them?
>>>> Or am I going about this the wrong way?
>>>> >
>>>> > In any case, the final stage of my queries will involve mapreduce as
>>>> I'll need to further filter the items found in a region.
>>>> >
>>>> > Thank you,
>>>> > Mark
>>>> > _______________________________________________
>>>> > riak-users mailing list
>>>> > riak-users at
>>>> >
>>> --
>>> Sean Cribbs <sean at>
>>> Software Engineer
>>> Basho Technologies, Inc.
