Ryan Zezeski rzezeski at basho.com
Wed Jan 25 11:50:55 EST 2012

I think this is a reasonable question to ask, re the 1/N question during
query time.  This is currently an implementation detail of secondary
indices in Riak.  The document vs. term-based index partitioning seems to
be a common debate.  It almost feels like the emacs/vim flamewar of the
distributed index field.  A couple issues to take with the document-based
approach is that since you must query 1/N of your nodes you a) reduce the
throughput of concurrent queries across the cluster (because queries must
contend for resources) and b) as N grows you increase your chance of
hitting the tcp incast problem [1] which can cause major problems.  A
couple of issues with term-based are a)  your data and it's index are no
longer co-located which can be nice b) indexing a single document will
often cause a write to all nodes.

To fight 1/N in document-based perhaps you could do a chained query
approach where the query is sent around the ring to avoid the incast
problem, but at the cost of higher latencies (although you could
parallelize it to  some set number X and skip nodes in the ring at the end
having X nodes converging to the coordinator).  To fight term-based's write
to all nodes on index problem perhaps you could hash on something less
granular like index/field or even just index but replicate to more nodes
and play some tricks at the index level for better concurrent access.  If
this all seems hand wavy that's because it is.

I also think it's perfectly reasonable to keep your index completely
separate from the data itself.  Look no further than a library for a real
life example of this, the card catalog.  Yes, you have potential
consistency problems, you end up having special nodes in the system (e.g.
special nodes dedicated to indexing), multiple hops to get from index
lookup to object retrieval, but at the end of the day these are the
tradeoffs you must weigh in light of your system.

These are questions I continue to ponder myself and it's my belief that
Riak will continue to get stronger at querying your data.  However,
sometimes you may also need to use another solution to store your index
that works alongside Riak and I also think that is a perfectly reasonable
choice as long as you understand the system you're building and the
tradeoffs you are making.  I think in the future it would even be good for
Riak to have integration points with other solutions to make stuff like
this easier.  Please voice your opinions on the mailing list if you have
them.  I would love to hear them.

[1]: http://www.snookles.com/slf-blog/2012/01/05/tcp-incast-what-is-it/


On Wed, Jan 25, 2012 at 10:12 AM, Roberto Calero <roberto_calero at hotmail.com
> wrote:

> ------------------------------
> From: jeremiah.peschka at gmail.com
> Date: Wed, 25 Jan 2012 06:48:45 -0800
> Subject: Re: Should Riak have used dedicated nodes for secondary indices?
> To: runar.jordahl at gmail.com
> CC: riak-users at lists.basho.com
> Good news! Riak doesn't use sharding.
> Data locality is critical in a distributed system. When you create an
> index, your structure looks something like:
> indexed_value:record_id
> Reading from an index requires locating indexed_value, finding all
> matching values, and then retrieving all matching record_ids. By keeping
> index data on the same node as the source data, Riak avoids having to
> remote the query to retrieve object data. This is a Good Thing. The network
> is slow and unreliable. Just ask an Australian.
> Riak's approach is intended to provide a uniform system where you can
> treat any node equally. The idea that there should be an unsharded index
> node is a bit ludicrous. Let's say you have 1TB of raw data. Your indexes
> are pretty light and are only about 20% of your data size. This means that
> you need 200GB of good storage (not some cheap $150 SATA HDD you found on
> NewEgg). 200GB of RAID 10 SAS storage isn't that pricey to put in a single
> unsharded machine. Over time as your data grows and your indexing changes,
> you may have 10TB and your index size is ~40% of your data. Your unsharded
> index server now has to have 4TB of fast, reliable storage. And, since this
> is an unsharded system, you'll want multiple replicas of your unsharded
> index server to make sure that a hardware hiccup doesn't take down your
> ability to perform fast lookups. Besides - a single indexing server becomes
> a single bottleneck and a single point of failure in your system.
> Most people using Lucene as their indexing store are sharding Lucene. From
> an anecdotal standpoint, about 70% of the people I've talked to using
> Lucene are getting to the point of sharding their replicated Lucene indexes.
> I'm not saying that either approach is good or bad; just remember that
> every solution has drawbacks.
> ---
> Jeremiah Peschka, SQL Server MVP
> Managing Director, Brent Ozar PLF, LLC
> On Wed, Jan 25, 2012 at 5:15 AM, Runar Jordahl <runar.jordahl at gmail.com>wrote:
> Siddharth Anand, says that secondary indices (for a key-value store)
> best is placed on a separate node, avoiding the need to look up 1 / N
> nodes during a query:
> "Systems that shard data based on a primary key will do well when
> routed by that key. When routed by a secondary key, the system will
> need to “spray” a query across all shards. If one of the shards is
> experiencing high latency, the system will return either no results or
> incomplete (i.e. inconsistent) results. For this reason, it would make
> sense to store the secondary index on an unsharded (but replicated)
> system."
> http://highscalability.com/blog/2012/1/24/the-state-of-nosql-in-2012.html
> If I understand Riak correctly, it takes the opposite approach,
> storing secondary indices together with the data.
> To me at appears like Riak’s approach gives a more uniform system,
> with all nodes having the same responsibilities. Does anyone else have
> any thoughts on this?
> Kind regards
> Runar Jordahl
> blog.epigent.com
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> _______________________________________________ riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20120125/1c673a29/attachment-0001.html>

More information about the riak-users mailing list