key garbage collection

Justin Karneges justin at affinix.com
Thu Nov 3 02:39:20 EDT 2011


Hi,

In the "atomically updating multiple keys" thread I mentioned the issue of 
dirtying the database with orphan keys and thought that topic was worth 
starting a new thread.

Say you have an operation that requires creating two keys, A and B, and you 
succeed in creating A but fail in creating B.  How do you delete A after the 
fact?  I have two ideas:

1) Run periodic MapReduce operations that do full db scans looking for garbage 
keys and deleting them (this seems really horrible, but I'll admit I'm new to 
distributed DBs and MapReduce).

2) Maintain cleanup logs that explicitly identify possibly offending keys, for 
optimized cleanup processing.

For the 2) case, here is a pattern I have considered:

cleanup_id = new_uuid() + "_" + timestamp_string
A_value_id = new_uuid()
B_value_id = new_uuid()
set /cleanup/{cleanup_id} key with data [A, A_value_id], [B, B_value_id]
set A key with metadata:
  value_id = A_value_id
  cleanup_id = cleanup_id
set B key with metadata:
  value_id = B_value_id
  cleanup_id = cleanup_id
delete /cleanup/{cleanup_id} key <--- commit succeeds here
set A key to remove cleanup_id metadata
set B key to remove cleanup_id metadata
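To make the sequence concrete, here is a minimal sketch of the write pattern, using a plain in-memory dict to stand in for the store (each value is a (data, metadata) pair). None of this is Riak API; the store and helpers are hypothetical.

```python
import time
import uuid

# In-memory stand-in for the key/value store: key -> (data, metadata dict).
store = {}

def write_pair(a_key, b_key, a_data, b_data):
    """Write two keys under a cleanup log, per the pattern above."""
    cleanup_id = "%s_%d" % (uuid.uuid4(), int(time.time()))
    a_value_id = str(uuid.uuid4())
    b_value_id = str(uuid.uuid4())

    # 1. Record intent first: the cleanup key lists both targets.
    store["/cleanup/" + cleanup_id] = (
        [(a_key, a_value_id), (b_key, b_value_id)], {})

    # 2. Write A and B, each tagged with its value_id and the cleanup_id.
    store[a_key] = (a_data, {"value_id": a_value_id,
                             "cleanup_id": cleanup_id})
    store[b_key] = (b_data, {"value_id": b_value_id,
                             "cleanup_id": cleanup_id})

    # 3. Deleting the cleanup key is the commit point.
    del store["/cleanup/" + cleanup_id]

    # 4. Optional optimization: strip cleanup_id so later reads skip
    #    the follow-up check against the cleanup bucket.
    for key in (a_key, b_key):
        data, meta = store[key]
        meta.pop("cleanup_id", None)
        store[key] = (data, meta)
```

If the process dies between steps 2 and 3, the cleanup key survives and points the processor at the possibly orphaned A and B.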

The rule is that when getting keys A or B, the read is only considered 
successful if either:

1) There is no cleanup_id metadata on the key.
2) There *is* cleanup_id metadata but no key exists with that name.

It is not strictly necessary to remove the cleanup_id metadata from the keys, 
but doing so optimizes reads (there's no need for a followup check against the 
cleanup key each time).  So it's okay if we fail to remove this metadata; it 
can be removed by some other client eventually.
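The two-rule read check can be sketched like this, against the same hypothetical dict-backed store (key -> (data, metadata)); this is illustration only, not Riak client code.

```python
def read_committed(store, key):
    """Return a key's data only if its write is known to be committed."""
    if key not in store:
        return None
    data, meta = store[key]
    cleanup_id = meta.get("cleanup_id")
    # Rule 1: no cleanup_id metadata on the key -> committed.
    if cleanup_id is None:
        return data
    # Rule 2: cleanup_id present but no cleanup key with that name
    # exists -> the commit succeeded, only the metadata strip failed.
    if "/cleanup/" + cleanup_id not in store:
        return data
    # The cleanup key still exists: this may be an orphaned write.
    return None
```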

So far so good.  Now for handling cleanup.  Periodically, we scan the 
"cleanup" bucket for keys to process.  Since keys only exist in this bucket at 
the moment of a write (they are deleted immediately afterwards), in practice 
there should hardly be any keys in here at any given point in time.  We're 
talking single digits here -- much better than a full db scan to find garbage 
keys.  Also, the keys to process can be narrowed down by time (e.g. older 
than 5 minutes) based on the key name.

For each old cleanup log, the processor would delete the keys listed in the 
cleanup information, but *only if* the value_id for each key matches what is in 
the value_id metadata of the actual key.  This way, if somebody else writes to 
the key in the meantime, we don't delete it.  Provided all deletes succeed (or 
are skipped), the cleanup key itself is deleted.
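A sketch of that processor, continuing the hypothetical dict-backed store from above (key -> (data, metadata)); again an illustration of the idea, not an actual Riak MapReduce or client call.

```python
def process_cleanup(store, cleanup_key):
    """Delete the keys listed in one old cleanup log, skipping any key
    whose value_id no longer matches (it was rewritten since)."""
    entries, _meta = store[cleanup_key]
    for key, value_id in entries:
        if key in store:
            _data, meta = store[key]
            if meta.get("value_id") == value_id:
                # Still the orphaned write from the failed commit: delete.
                del store[key]
            # Otherwise skip: somebody else wrote the key in the meantime.
    # All listed keys deleted or skipped; drop the cleanup log itself.
    del store[cleanup_key]
```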

I wonder if anyone else is doing something like this.  I also wonder if there 
is a flaw in this design.  Since the *absence* of the cleanup key is what 
indicates a successful commit, there might be a problem under eventual 
consistency: some other node could see A and B but not the cleanup key, simply 
because the cleanup key hasn't propagated yet, and therefore think the keys 
are valid when in reality they aren't.  Maybe causal consistency ensures this 
isn't a problem, since the cleanup key is written before A and B?

Justin




More information about the riak-users mailing list