Geospatial advice?

Will Moss wmoss at
Tue May 1 13:32:57 EDT 2012

I remember someone once going on a rant about how there's no silver bullet
database (If you have not read
do so), so I'm, of course, going to agree with Sean.

If you're going to need to run this on more than one machine, then going
with something like Riak makes more sense. Postgres has no built-in sharding
functionality, and it's not clear to me how
MongoDB's 2d indexes work in a sharded configuration.
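
To make the geohash lookup discussed in the quoted thread below a bit more
concrete, here is a minimal Python sketch of computing the cell names that
cover a bounding box at one fixed precision. It uses the standard base-32
geohash (a Z-order curve), not the Hilbert-curve scheme Sean describes, and
the function names geohash_encode and covering_cells are made up for
illustration. It only produces the cell names; how you look them up in Riak
(individual keys, secondary index, or MapReduce inputs) is left open.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a point as a standard base-32 geohash of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def covering_cells(min_lat, min_lon, max_lat, max_lon, precision):
    """Return the sorted geohash cells at `precision` that cover a box.

    Assumes the box does not wrap across the antimeridian.
    """
    lon_bits = (5 * precision + 1) // 2  # longitude gets the extra bit
    lat_bits = (5 * precision) // 2
    cell_w = 360.0 / (1 << lon_bits)
    cell_h = 180.0 / (1 << lat_bits)
    col_lo = int((min_lon + 180.0) / cell_w)
    col_hi = min(int((max_lon + 180.0) / cell_w), (1 << lon_bits) - 1)
    row_lo = int((min_lat + 90.0) / cell_h)
    row_hi = min(int((max_lat + 90.0) / cell_h), (1 << lat_bits) - 1)
    cells = set()
    for row in range(row_lo, row_hi + 1):
        for col in range(col_lo, col_hi + 1):
            # Encode each cell's centre point to get that cell's name.
            lat = -90.0 + (row + 0.5) * cell_h
            lon = -180.0 + (col + 0.5) * cell_w
            cells.add(geohash_encode(lat, lon, precision))
    return sorted(cells)
```

For a small box straddling the equator and the prime meridian,
covering_cells(-0.1, -0.1, 0.1, 0.1, 1) returns four cells that share no
common prefix, which is the equator problem Mark mentions: you end up
looking up a handful of cells rather than widening one prefix until it
covers the whole region.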


On Tue, May 1, 2012 at 10:16 AM, Sean Cribbs <sean at> wrote:

> In contrast to Alexander's assessment, I'd say "it depends". I have built
> some geospatial indexes on top of Riak using a geohashing scheme based on
> the Hilbert space-filling curve. However, I had to choose specific levels
> of "zoom" and precompute them. Now that we have secondary indexes, you
> could perhaps bypass the precomputation step. In general, if you know the
> geometry of the space you want to query, you can fairly trivially compute
> the names of the geohashes you need to look up and then either fetch
> individual keys for those (if you precompute them), or use MapReduce to
> fetch a range of them. It's not automatic, for sure, but the greatest
> complexity will be in deciding which granularities of index to support.
> On Tue, May 1, 2012 at 12:44 PM, Alexander Sicular <siculars at> wrote:
>> My advice is to not use Riak. Check mongo or Postgres.
>> @siculars on twitter
>> Sent from my iRotaryPhone
>> On May 1, 2012, at 9:18, Mark Rose <markrose at> wrote:
>> > Hello everyone!
>> >
>> > I'm going to be implementing Riak as a storage engine for geographic
>> data. Research has led me to using geohashing as a useful way to filter
>> out results outside of a region of interest. However, I've run into some
>> stumbling blocks and I'm looking for advice on the best way to proceed.
>> >
>> > Querying efficiently by geohash involves querying several regions
>> around a point. From what I can tell, Riak offers no way to query a
>> secondary index with multiple ranges. Having to query several ranges,
>> merge them in the application layer, then pass them off to mapreduce seems
>> rather silly (and could mean passing GBs of data). Alternatively, I could
>> start straight with mapreduce, but key filtering seems to work only with
>> the primary key, which would force me into using the geohashed location as
>> the primary key (which would lead to collisions if two things existed at
>> the same point). I'd also like to avoid using the primary key as the
>> geohash as if the item moves I'd have to change all the references to it.
>> Lastly, I could do a less efficient mapreduce over a less precise geohash,
>> but this doesn't solve the issue of the equator (anything near the equator
>> would require mapreducing the entire dataset).
>> >
>> > Is there any way to query multiple ranges with a secondary index and
>> pass that off to mapreduce? Or should I just stick with the less efficient
>> mapreduce, and when near the equator, run two queries and later merge them?
>> Or am I going about this the wrong way?
>> >
>> > In any case, the final stage of my queries will involve mapreduce as
>> I'll need to further filter the items found in a region.
>> >
>> > Thank you,
>> > Mark
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users at
>> >
> --
> Sean Cribbs <sean at>
> Software Engineer
> Basho Technologies, Inc.