Dangling keys/objects after a batch of sequential inserts (for going on 3 days)

Siraaj Khandkar siraaj at khandkar.net
Sun Jul 21 19:15:55 EDT 2013


On 07/21/2013 04:54 PM, Russell Brown wrote:
>
> On 21 Jul 2013, at 14:20, Siraaj Khandkar <siraaj at khandkar.net> wrote:
>
>> On 07/21/2013 07:24 AM, Russell Brown wrote:
>>> Hi,
>>>
>>> On 21 Jul 2013, at 02:09, Siraaj Khandkar <siraaj at khandkar.net> wrote:
>>>
>>>> I (sequentially) made 146204 inserts of unique objects to a single
>>>> bucket.  Several secondary indices (most with unique values) were set
>>>> for each object, one of which was "bucket" = BucketName (to use 2i
>>>> for listing all keys).
>>>
>>> There is a special $bucket index for this already, please see the docs
>>> here http://docs.basho.com/riak/latest/dev/using/2i/
>>>
>>
>> Yeah... I stumbled on that piece of info in another doc about two days
>> ago - made me feel both stupid and validated :)
>>
>> However, it doesn't seem to work for me - I always get: {ok,{keys,[]}}
>
> Curious. How do you make the 2i query to the $bucket index?

Just as below, but with "$bucket" instead of "bucket":

Index = {binary_index, "$bucket"},
riakc_pb_socket:get_index(PID, Bucket, Index, Bucket).



>>>>
>>>> 6 of the objects appear to have been lost - they're consistently not
>>>> found by GETs (by key) and are not found by 2i queries to the indices
>>>> with unique values.
>>>
>>> Oh. Erm. Have you deleted some keys? 2i is essentially an r=1 query.
>>>
>>
>> Sort-of. This was a second instance of this batch insertion (a slightly
>> extended set of keys), the first one was deleted ~6 hours prior to
>> executing the second one.
>>
>> At the end of the deletion there _were_ some tombstones left. Frankly I
>> do not remember with certainty if there are overlaps between tombstones
>> from the previous delete and the keys in question. In retrospect - it was
>> a big failure on my part not to take note of those.
>>
>> After the second instance of the set insertion - there were _no_
>> more deletions.
>>
>> So, in summary:
>>
>> 1) Inserted the set
>> 2) Deleted the set
>> 3) 6 hours passed
>> 4) Inserted the set
>> 5) Observed the problem
>
> What is your delete_mode setting, please (http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/)?
>

It is not configured explicitly, so I am assuming the default 3-second
delay.
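
One way to double-check that the setting really is absent is to grep the
node's config file for it. A minimal sketch, assuming the config is a
plain Erlang-terms file such as /etc/riak/app.config (the path and the
helper name show_delete_mode are assumptions, adjust for the install):

```shell
# Hypothetical helper: report whether delete_mode is set explicitly
# in the config file passed as the first argument.
show_delete_mode() {
    conf="$1"
    # Drop lines commented out with %, then look for delete_mode and
    # strip leading indentation from whatever matched.
    match=$(grep -v '^[[:space:]]*%' "$conf" \
        | grep 'delete_mode' \
        | sed 's/^[[:space:]]*//')
    if [ -n "$match" ]; then
        printf 'explicit: %s\n' "$match"
    else
        printf 'not set explicitly\n'
    fi
}
```

Usage would be something like: show_delete_mode /etc/riak/app.config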


>
> Did the second insert do a fetch to get a tombstone vclock before trying to overwrite the key, or a PUT with an empty vclock?
>

PUT with an empty vclock.


>>>>
>>>> Now, I understand there may be a replication lag, but this state has
>>>> remained for over 3 days now.
>>>>
>>>> "What is fucked, and why?" :)
>>>
>>> Good question.
>>>
>>
>> I was hoping this list would appreciate the reference :)
>>
>>
>>> Could you provide some more details to help me figure it out: How many
>>> nodes are you running?
>>
>> 5
>>
>>
>>> Can you provide an example of the 2i queries you're running?
>>
>> This is how I am testing it:
>>
>>     Compare = fun(PID, Bucket) ->
>>         B = Bucket,
>>         % Run the same query twice; get_index returns {ok, {keys, Keys}}
>>         {ok, {keys, L1}} = riakc_pb_socket:get_index(PID, B, {binary_index, "bucket"}, B),
>>         {ok, {keys, L2}} = riakc_pb_socket:get_index(PID, B, {binary_index, "bucket"}, B),
>>         io:format("L1: ~b, L2: ~b~n",[length(L1), length(L2)]),
>>         Diff_L1_L2 = L1 -- L2,
>>         Diff_L2_L1 = L2 -- L1,
>>         io:format("=== L1 -- L2 ===~n~p~n~n", [Diff_L1_L2]),
>>         io:format("=== L2 -- L1 ===~n~p~n~n", [Diff_L2_L1]),
>>         Fetch = fun(Key) ->
>>             case riakc_pb_socket:get(PID, B, Key) of
>>                 {ok, _}    -> io:format("FOUND: ~p~n", [Key]);
>>                 {error, _} -> io:format("NOT FOUND: ~p~n", [Key])
>>             end
>>         end,
>>         io:format("=== L1 -- L2 ===~n"),
>>         lists:foreach(Fetch, Diff_L1_L2),
>>         io:format("=== L2 -- L1 ===~n"),
>>         lists:foreach(Fetch, Diff_L2_L1)
>>     end.
>>
>> Which results in differences _sometimes_, but _always_ fails on get.
>>
>>
>>> If this is just a dev cluster, can you verify the keys are present /
>>> absent using either a range 2i $keys query, or a key list, please?
>>>
>>
>> Unfortunately this is prod, so brute-force key list is out of the
>> question.
>>
>> Running:
>>     curl "http://127.0.0.1:8098/buckets/$bucket/index/\$keys_bin/0/z"
>>
>> Returns:
>>     {"keys":[]}
>>
>
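
A note on the quoting in that curl command: inside double quotes the
shell expands $bucket as a shell variable, while \$keys_bin is passed
through literally. If the variable is unset, the URL ends up with an
empty bucket name and no error from the shell. A minimal sketch that
refuses to build the URL in that case (build_index_url is a hypothetical
helper name; host, port and path are the ones from the command above):

```shell
# Hypothetical helper: build the index-query URL for a bucket name,
# failing loudly instead of silently producing an empty path segment.
build_index_url() {
    bucket="$1"
    if [ -z "$bucket" ]; then
        printf 'bucket name is empty\n' >&2
        return 1
    fi
    # Single quotes keep $keys_bin literal; %s is filled by printf.
    printf 'http://127.0.0.1:8098/buckets/%s/index/$keys_bin/0/z\n' "$bucket"
}
```

Usage would be something like: curl "$(build_index_url "$bucket")"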
