Geospatial advice?

Alexander Sicular siculars at
Tue May 1 14:25:07 EDT 2012

Hey, I'm as up for a good and clever hack as anybody. But the question is:
just because you can, should you? Who will maintain your hack after you're
dead? I'm still maintaining crap I wrote years ago. Even though I'm paid,
sometimes I would rather not have the headache. Why would you use a product
that specifically does not support such hackery? Scaling Postgres or Mongo
is a known and solvable problem, especially for bounded data sets
like, say, all points on a globe. Now if you were storing check-ins, that
would be a different problem. One suitable for, say, Riak.

On Tue, May 1, 2012 at 14:09, Mark Rose <markrose at> wrote:

> Well, I'd be indexing items over the entire globe. I'd be looking at
> resolutions from an entire world view down to city block. I'm thinking of
> using geohashes as an index to restrict the result set, then further
> filtering and sorting by mapreducing the remaining items. So I only need
> enough granularity to reduce the number of items to a reasonable amount. At
> the world view level, I'd filter out most results using mapreduce, but the
> local-level queries would be far more common so an index would be highly
> advantageous. The geometry I'd want to query would be a window that
> arbitrarily overlaps one or more geohash regions. Basically, think plotting
> items in, say, Google Maps.
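Here is a minimal Python sketch of the approach Mark describes. It encodes points with the standard base32 geohash, then works out which cells at a chosen precision a query window overlaps. The names `encode` and `cover_bbox` are hypothetical. The covering routine assumes the window does not cross the antimeridian, and it simply samples the window at cell-sized steps.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def encode(lat: float, lon: float, precision: int) -> str:
    """Standard geohash: interleave bits, starting with longitude, 5 bits per char."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    chars, bits, nbits, even = [], 0, 0, True  # even bit positions encode longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        even = not even
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def cover_bbox(south: float, west: float, north: float, east: float, precision: int) -> list:
    """Geohash cells at `precision` overlapping the box (assumes west <= east, no wrap)."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    dlat = 180.0 / (1 << lat_bits)  # cell height in degrees
    dlon = 360.0 / (1 << lon_bits)  # cell width in degrees
    # Stepping by exactly one cell size never skips a row/column; add the far edges.
    lats = [south + i * dlat for i in range(int((north - south) / dlat) + 1)] + [north]
    lons = [west + j * dlon for j in range(int((east - west) / dlon) + 1)] + [east]
    return sorted({encode(lat, lon, precision) for lat in lats for lon in lons})
```

For example, `cover_bbox(-1, 10, 1, 11, 1)` returns `['k', 's']`. Even a small window that straddles the equator spans two top-level cells with no shared prefix. This is the boundary issue raised in the original question.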
> Can you use a secondary index inside mapreduce? I haven't seen any
> examples of it. I have only seen a secondary index being used to feed a
> mapreduce. I am new to Riak.
> I imagine my number of points would be at most 100 items per square km,
> but typically less than 1 per square km. A 1 km resolution would be
> sufficient. A 32-bit geohash would cover that fine. Vast regions of the
> Earth would contain no points at all.
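This is a quick check of the claim that 32 bits give roughly 1 km resolution. A binary geohash interleaves longitude and latitude bits, so 32 bits split into 16 per axis. The sketch below assumes a spherical Earth, with an equatorial circumference of 40,075.017 km and a pole-to-pole meridian of 20,003.93 km, so the figures are approximate. `cell_size_km` is a hypothetical helper name.

```python
import math

EQUATORIAL_CIRCUMFERENCE_KM = 40075.017
MERIDIAN_LENGTH_KM = 20003.93  # pole to pole; spherical approximation


def cell_size_km(total_bits: int, lat_deg: float = 0.0) -> tuple:
    """Approximate (width, height) in km of a geohash cell with `total_bits` bits."""
    lon_bits = (total_bits + 1) // 2  # longitude takes the first (and any odd) bit
    lat_bits = total_bits // 2
    height = MERIDIAN_LENGTH_KM / 2 ** lat_bits
    # East-west extent shrinks with the cosine of latitude.
    width = EQUATORIAL_CIRCUMFERENCE_KM * math.cos(math.radians(lat_deg)) / 2 ** lon_bits
    return width, height
```

`cell_size_km(32)` gives a cell of about 0.61 km by 0.31 km at the equator. Cells only get narrower toward the poles, so 1 km resolution holds everywhere.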
> -Mark
> On Tue, May 1, 2012 at 1:16 PM, Sean Cribbs <sean at> wrote:
>> In contrast to Alexander's assessment, I'd say "it depends". I have built
>> some geospatial indexes on top of Riak using a geohashing scheme based on
>> the Hilbert space-filling curve. However, I had to choose specific levels
>> of "zoom" and precompute them. Now that we have secondary indexes, you
>> could perhaps bypass the precomputation step. In general, if you know the
>> geometry of the space you want to query, you can fairly trivially compute
>> the names of the geohashes you need to look up and then either fetch
>> individual keys for those (if you precompute them), or use MapReduce to
>> fetch a range of them. It's not automatic, for sure, but the greatest
>> complexity will be in deciding which granularities of index to support.
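Sean's own Hilbert-curve scheme is not shown in the thread. As a rough illustration only, here is the standard iterative mapping from grid coordinates to a position along a Hilbert curve (the well-known `xy2d` algorithm). It comes with a hypothetical `hilbert_key` helper that snaps a latitude/longitude onto a 2^order by 2^order grid. The helper zero-pads the result as hex so that lexicographic key order matches curve order, which is what a range lookup needs.

```python
def _rot(n: int, x: int, y: int, rx: int, ry: int) -> tuple:
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve for cell (x, y) in an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_key(lat: float, lon: float, order: int) -> str:
    """Hypothetical index term: Hilbert position of (lat, lon) as fixed-width hex."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))  # clamp lon == 180 into last column
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))   # clamp lat == 90 into last row
    width = (2 * order + 3) // 4  # hex digits needed for 2*order bits
    return f"{xy2d(n, x, y):0{width}x}"
```

Nearby points tend to get nearby curve positions, so a query window breaks down into a small number of contiguous key ranges. The Hilbert curve makes fewer large jumps than the Z-order curve that plain geohashes follow.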
>> On Tue, May 1, 2012 at 12:44 PM, Alexander Sicular <siculars at> wrote:
>>> My advice is to not use Riak. Check mongo or Postgres.
>>> @siculars on twitter
>>> Sent from my iRotaryPhone
>>> On May 1, 2012, at 9:18, Mark Rose <markrose at> wrote:
>>> > Hello everyone!
>>> >
>>> > I'm going to be implementing Riak as a storage engine for geographic
>>> > data. Research has led me to using geohashing as a useful way to filter
>>> > out results outside of a region of interest. However, I've run into some
>>> > stumbling blocks and I'm looking for advice on the best way to proceed.
>>> >
>>> > Querying efficiently by geohash involves querying several regions
>>> > around a point. From what I can tell, Riak offers no way to query a
>>> > secondary index with multiple ranges. Having to query several ranges,
>>> > merge them in the application layer, then pass them off to mapreduce seems
>>> > rather silly (and could mean passing GBs of data). Alternatively, I could
>>> > start straight with mapreduce, but key filtering seems to work only with
>>> > the primary key, which would force me into using the geohashed location as
>>> > the primary key (which would lead to collisions if two things existed at
>>> > the same point). I'd also like to avoid using the geohash as the primary
>>> > key, because if the item moves I'd have to change all the references to it.
>>> > Lastly, I could do a less efficient mapreduce over a less precise geohash,
>>> > but this doesn't solve the issue of the equator (anything near the equator
>>> > would require mapreducing the entire dataset).
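The "query several ranges, merge them in the application layer" step can at least avoid building one big intermediate list. This is a minimal sketch under a few assumptions: index terms are geohash strings compared byte-wise, and each range is fetched separately. `range_query` scans an in-memory list. It is a hypothetical stand-in for a real secondary-index range query, not a Riak client API.

```python
import heapq
from typing import Iterable, List, Tuple

Entry = Tuple[str, str]  # (geohash index term, object key)


def prefix_range(prefix: str) -> Tuple[str, str]:
    # '~' sorts after every base32 geohash character, so this inclusive range
    # covers every term that starts with `prefix`.
    return prefix, prefix + "~"


def range_query(index: Iterable[Entry], lo: str, hi: str) -> List[Entry]:
    # Hypothetical stand-in for one secondary-index range query (inclusive bounds).
    return sorted(e for e in index if lo <= e[0] <= hi)


def multi_range(index: List[Entry], ranges: List[Tuple[str, str]]) -> List[str]:
    per_range = [range_query(index, lo, hi) for lo, hi in ranges]
    seen = set()
    keys = []
    # heapq.merge lazily interleaves the already-sorted per-range results.
    for _term, key in heapq.merge(*per_range):
        if key not in seen:  # overlapping ranges can return the same object twice
            seen.add(key)
            keys.append(key)
    return keys
```

With a straddling window like the equator case, you would call `multi_range(index, [prefix_range("k"), prefix_range("s")])`. The merged key list could then serve as the input set for a MapReduce job.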
>>> >
>>> > Is there any way to query multiple ranges with a secondary index and
>>> > pass that off to mapreduce? Or should I just stick with the less efficient
>>> > mapreduce, and when near the equator, run two queries and later merge them?
>>> > Or am I going about this the wrong way?
>>> >
>>> > In any case, the final stage of my queries will involve mapreduce as
>>> > I'll need to further filter the items found in a region.
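The final refinement stage amounts to a per-item predicate followed by a sort. Riak map phases at the time were written in JavaScript or Erlang, so this Python version only illustrates the logic. `Item`, `in_window` and `refine` are hypothetical names. Ranking by distance from the window center, measured in raw degrees, is an assumed criterion that the thread does not state.

```python
from typing import Iterable, List, NamedTuple


class Item(NamedTuple):
    key: str
    lat: float
    lon: float


def in_window(item: Item, south: float, west: float, north: float, east: float) -> bool:
    # Exact containment test: candidates from a coarse geohash cell may fall outside.
    return south <= item.lat <= north and west <= item.lon <= east


def refine(candidates: Iterable[Item], south: float, west: float,
           north: float, east: float) -> List[Item]:
    clat, clon = (south + north) / 2, (west + east) / 2
    hits = [it for it in candidates if in_window(it, south, west, north, east)]
    # Assumed ranking: squared distance in degrees from the window center.
    return sorted(hits, key=lambda it: (it.lat - clat) ** 2 + (it.lon - clon) ** 2)
```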
>>> >
>>> > Thank you,
>>> > Mark
>>> > _______________________________________________
>>> > riak-users mailing list
>>> > riak-users at
>>> >
>> --
>> Sean Cribbs <sean at>
>> Software Engineer
>> Basho Technologies, Inc.