Interesting problem with riaksearch indexes

francisco treacy francisco.treacy at gmail.com
Tue Aug 23 06:53:00 EDT 2011


Hmmm, definitely not great... but thanks, Ryan, for the explanation.

Francisco


2011/8/23 Ryan Zezeski <rzezeski at basho.com>

> Gordon,
>
> The reason for the nondeterministic behavior is two-fold.
>
> 1. For performance reasons Search only ever reads from 1 node (R=1)
>
> 2. As an attempt to balance load and reduce vnode contention this node is
> selected randomly
>
> This is why it works 50% of the time: for each index entry, 2 partitions
> have the data and 1 does not, so depending on which one you hit you'll get
> the data or not.  Furthermore, this behavior will continue until you
> reindex, because the index in Search has no form of anti-entropy such as
> read repair or Merkle trees.
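
[Inline note: a minimal sketch of the two points above, not Search's actual
code.  It assumes a hypothetical index entry with three replica partitions,
one of which lost its copy, and models an R=1 read as picking one replica
uniformly at random with no repair afterwards.  The names search_read and
hit_rate are made up for illustration.  The exact fraction seen on a real
cluster depends on how Search chooses among candidate partitions, which this
does not model.]

```python
import random


def search_read(replicas, rng):
    """Read from one randomly chosen replica (R=1); nothing is repaired."""
    return rng.choice(replicas)


def hit_rate(replicas, trials=10_000, seed=0):
    """Fraction of repeated identical queries that find the entry."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if search_read(replicas, rng) is not None)
    return hits / trials


# Hypothetical entry: two partitions hold it, one partition lost it (None).
replicas = ["doc-1", "doc-1", None]
rate = hit_rate(replicas)
print(f"fraction of queries returning the entry: {rate:.2f}")
```

Because nothing ever writes the missing copy back, the fraction stays the
same no matter how many times the query is repeated, which is why reindexing
is the only fix.
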
>
> In the future the easiest thing is to replace that lost node as quickly as
> possible.  While it's down the other nodes will keep track of the new index
> entries and will transfer them during data handoff when the node comes alive
> again.  By removing the node you've changed the ring and your only option is
> to reindex as you are already doing.  I realize that bringing that node up
> or replacing it may not have been an option but this is the only way to
> avoid this problem with Search as it stands today.
>
> I realize this sucks and isn't in line with Riak's more fault tolerant
> behavior.  It does suck.  I hate the fact that I have to write this email
> basically telling you this part of Search is broken, IMO.  I want to see it
> addressed and I'm sure I'm not the only one.  Right now our internal ticket
> board is buzzing in anticipation for the new release.  After that there is a
> lot of love I want to give Search, this particular issue included.  I'd say
> it's only a matter of time.
>
>
> -Ryan
>
> On Fri, Aug 19, 2011 at 2:46 PM, Gordon Tillman <gtillman at mezeo.com> wrote:
>
>> Greetings all,
>>
>> After an extended datacenter power outage, a 3-node Riak cluster shut
>> down.  When the power was restored, two of the three nodes came back up.
>> I don't know what is going on with the third node.  In the meantime, I have
>> removed the dead node from the ring.  The two remaining nodes show a good
>> ringready status.
>>
>> The problem is that the search indexes appear to be in an inconsistent
>> state.  For example, I can issue the same Solr query on one of the nodes and
>> 50% of the time it returns correct results.  The rest of the time it returns
>> an empty result set.
>>
>> I'm in the process of re-indexing the bucket in question (a very
>> time-consuming affair).  But I wonder if anyone could shed some light on
>> this situation as to why it occurred in the first place and if there is
>> anything that can be done to keep this from happening again in the future.
>>
>> Many thanks,
>>
>> --gordon
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>