Geospatial advice?

Mark Rose markrose at markrose.ca
Tue May 1 15:02:28 EDT 2012


In general I've been shying away from datastores that aren't
highly available. In a world of zero-downtime expectations, single-box
solutions are out. Galera is nice on the SQL side but isn't scalable beyond
a few boxes. I am also looking for a tool that offers mapreduce, which
eliminates any SQL tool I know of. MongoDB might have sharding and
mapreduce, but suffers from a global insert write lock and doesn't
guarantee data persistence. The best comparison I've found of the different
datastores is at http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis .
Riak appeals to me for its high scalability, plus the ability to add new
nodes/CPU easily.

-Mark

On Tue, May 1, 2012 at 1:32 PM, Will Moss <wmoss at bu.mp> wrote:

> I remember someone once going on a rant about how there's no silver bullet
> database (if you have not read this <http://basho.com/blog/technical/2011/05/11/Lies-Damn-Lies-And-NoSQL/>,
> do so), so I'm, of course, going to agree with Sean.
>
> If you're going to need to run this on more than one machine, then going
> with something like Riak makes more sense. Postgres has no built-in sharding
> functionality, and it's not clear to me <http://groups.google.com/group/mongodb-user/browse_thread/thread/3934b7d49c07c3fd> that MongoDB's 2d indexes work in a sharded configuration.
>
> Will
>
>
> On Tue, May 1, 2012 at 10:16 AM, Sean Cribbs <sean at basho.com> wrote:
>
>> In contrast to Alexander's assessment, I'd say "it depends". I have built
>> some geospatial indexes on top of Riak using a geohashing scheme based on
>> the Hilbert space-filling curve. However, I had to choose specific levels
>> of "zoom" and precompute them. Now that we have secondary indexes, you
>> could perhaps bypass the precomputation step. In general, if you know the
>> geometry of the space you want to query, you can fairly trivially compute
>> the names of the geohashes you need to look up and then either fetch
>> individual keys for those (if you precompute them), or use MapReduce to
>> fetch a range of them. It's not automatic, for sure, but the greatest
>> complexity will be in deciding which granularities of index to support.
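
A minimal sketch of the lookup Sean describes, in Python, assuming a standard
Z-order geohash (rather than his Hilbert-curve variant), objects tagged with a
full-precision geohash under a binary secondary index called geohash_bin, and
Riak's HTTP 2i interface on localhost:8098. The helper names (geohash_encode,
covering_cells, keys_in_cell) are illustrative, not Riak APIs:

import requests  # assumption: talking to Riak's HTTP/REST interface

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=6):
    """Standard Z-order geohash; each character adds 5 bits of precision."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, lon_turn = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if lon_turn else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        lon_turn, nbits = not lon_turn, nbits + 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def covering_cells(min_lat, min_lon, max_lat, max_lon, precision=6):
    """Geohash cells, at one chosen 'zoom' level, that cover a bounding box."""
    lon_bits = (5 * precision + 1) // 2      # longitude gets the odd extra bit
    lat_bits = 5 * precision // 2
    dlat, dlon = 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits
    cells, lat = set(), min_lat
    while lat < max_lat + dlat:
        lon = min_lon
        while lon < max_lon + dlon:
            cells.add(geohash_encode(min(lat, max_lat), min(lon, max_lon), precision))
            lon += dlon
        lat += dlat
    return sorted(cells)

def keys_in_cell(cell, bucket="points", index="geohash_bin"):
    """One 2i range query per cell: every key whose full-precision geohash
    starts with `cell`. '~' sorts after every base32 character, so the
    inclusive range [cell, cell + '~'] spans the whole prefix."""
    url = "http://localhost:8098/buckets/%s/index/%s/%s/%s" % (
        bucket, index, cell, cell + "~")
    return requests.get(url).json().get("keys", [])

# Keys for everything indexed inside a small bounding box (illustrative coords):
keys = set()
for cell in covering_cells(43.63, -79.42, 43.68, -79.36, precision=5):
    keys.update(keys_in_cell(cell))

At a coarser precision the cell count stays small but each range query returns
more keys; that trade-off is the "which granularities of index to support"
decision Sean mentions.
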
>>
>>
>> On Tue, May 1, 2012 at 12:44 PM, Alexander Sicular <siculars at gmail.com> wrote:
>>
>>> My advice is to not use Riak. Check mongo or Postgres.
>>>
>>>
>>> @siculars on twitter
>>> http://siculars.posterous.com
>>>
>>> Sent from my iRotaryPhone
>>>
>>> On May 1, 2012, at 9:18, Mark Rose <markrose at markrose.ca> wrote:
>>>
>>> > Hello everyone!
>>> >
>>> > I'm going to be implementing Riak as a storage engine for geographic
>>> data. Research has led me to geohashing as a useful way to filter
>>> out results outside of a region of interest. However, I've run into some
>>> stumbling blocks and I'm looking for advice on the best way to proceed.
>>> >
>>> > Querying efficiently by geohash involves querying several regions
>>> around a point. From what I can tell, Riak offers no way to query a
>>> secondary index with multiple ranges. Having to query several ranges,
>>> merge them in the application layer, then pass them off to mapreduce seems
>>> rather silly (and could mean passing GBs of data). Alternatively, I could
>>> start straight with mapreduce, but key filtering seems to work only with
>>> the primary key, which would force me into using the geohashed location as
>>> the primary key (which would lead to collisions if two things existed at
>>> the same point). I'd also like to avoid using the geohash as the primary
>>> key, since if the item moves I'd have to change all the references to it.
>>> Lastly, I could do a less efficient mapreduce over a less precise geohash,
>>> but this doesn't solve the issue of the equator (anything near the equator
>>> would require mapreducing the entire dataset).
>>> >
>>> > Is there any way to query multiple ranges with a secondary index and
>>> pass that off to mapreduce? Or should I just stick with the less efficient
>>> mapreduce, and when near the equator, run two queries and later merge them?
>>> Or am I going about this the wrong way?
>>> >
>>> > In any case, the final stage of my queries will involve mapreduce as
>>> I'll need to further filter the items found in a region.
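
The merge-then-mapreduce workflow Mark describes doesn't have to move GBs of
data: a 2i range query returns only keys, so the application layer merges key
lists, and Riak's /mapred endpoint then accepts an explicit list of
[bucket, key] inputs for the final filtering pass. A rough sketch, assuming
the HTTP interface on localhost:8098 and the built-in Riak.mapValuesJson map
function; mapreduce_over_keys and the sample keys are hypothetical:

import json
import requests  # assumption: Riak's HTTP interface on localhost:8098

def mapreduce_over_keys(keys, bucket="points"):
    """POST a merged key list to /mapred; the map phase does the final,
    finer-grained filtering (here just the built-in JSON value extractor)."""
    job = {
        "inputs": [[bucket, k] for k in sorted(keys)],
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",
                     "keep": True}}
        ],
    }
    resp = requests.post("http://localhost:8098/mapred",
                         data=json.dumps(job),
                         headers={"Content-Type": "application/json"})
    return resp.json()

# Keys merged client-side from the per-range 2i queries (hypothetical values):
keys = {"point:1001", "point:1002", "point:1047"}
results = mapreduce_over_keys(keys)

Swapping Riak.mapValuesJson for a custom JavaScript map function is where the
extra per-object filtering (the "final stage" Mark mentions) would go.
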
>>> >
>>> > Thank you,
>>> > Mark
>>>
>>
>>
>>
>> --
>> Sean Cribbs <sean at basho.com>
>> Software Engineer
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>>
>>
>>
>
>
>