Missing SST File

Shane McEwan shane at mcewan.id.au
Mon Jan 14 06:00:55 EST 2013


I figured out two ways to fix the missing SST file problem. They're
described here for future generations.

Solution 1:

1. Shut down Riak on the node with the missing file.
2. Delete (or move sideways) the LevelDB partition with the missing file.
3. Start Riak.
4. Repair the KV indexes[1], which forces a partition handoff from the
   replicas. (I don't know whether this step is needed or whether Riak
   will notice the empty partition and fix itself automatically.)
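The "move sideways" step in Solution 1 could be sketched in Python as below. This is only an illustration: the function name and the backup location are hypothetical, the directory layout is assumed from the path in the LOG excerpt quoted further down (/data/riak/leveldb/<partition>), and Riak is assumed to be stopped on the node before it runs.

```python
import os
import shutil


def move_partition_sideways(leveldb_root, partition, backup_root):
    """Move one LevelDB partition directory out of the Riak data root.

    Hypothetical helper: run it only while Riak is stopped on this node.
    Returns the new location of the partition directory.
    """
    src = os.path.join(leveldb_root, partition)
    if not os.path.isdir(src):
        raise FileNotFoundError(f"no such partition directory: {src}")

    # Keep the copy outside the data root so Riak cannot mistake it
    # for a live partition (an assumption made here for safety).
    os.makedirs(backup_root, exist_ok=True)
    dst = os.path.join(backup_root, partition)
    if os.path.exists(dst):
        raise FileExistsError(f"backup already exists: {dst}")

    shutil.move(src, dst)
    return dst
```

For example, move_partition_sideways("/data/riak/leveldb", "902020541790166644828836732692080926193895866368", "/data/riak/leveldb-moved") would set the damaged partition aside rather than deleting it, so it can still be inspected later.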

Solution 2:

1. Shut down Riak on the node with the missing file.
2. Follow the instructions[2] to initiate a LevelDB repair. (This seems
   to rebuild the MANIFEST file based on the SST files.)
3. Start Riak.
4. Because the data that was in the non-existent SST file is still
   missing, repair the KV indexes[1], which forces a partition handoff
   from the replicas. (I don't know whether this step is needed or
   whether Riak will fix itself automatically.)
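Either solution needs to know which partitions are affected, and afterwards it is worth checking whether the error comes back. A minimal Python sketch of that check follows. It assumes each partition directory holds a LOG file containing lines like the "Compaction error: IO error: ...sst: No such file or directory" message quoted below; the function name and regular expression are my own and not part of Riak or LevelDB.

```python
import os
import re

# Matches the error text seen in the LOG excerpt from the original report.
MISSING_SST = re.compile(
    r"Compaction error: IO error: (\S+\.sst): No such file or directory"
)


def find_missing_sst(leveldb_root):
    """Map partition name -> set of SST paths reported missing in its LOG."""
    missing = {}
    for partition in sorted(os.listdir(leveldb_root)):
        log_path = os.path.join(leveldb_root, partition, "LOG")
        if not os.path.isfile(log_path):
            continue  # not a LevelDB partition directory, or no LOG yet
        with open(log_path, errors="replace") as log:
            for line in log:
                match = MISSING_SST.search(line)
                if match:
                    missing.setdefault(partition, set()).add(match.group(1))
    return missing
```

An empty result from find_missing_sst("/data/riak/leveldb") after the node has been running for a while would suggest that compaction is no longer tripping over the missing file, though it says nothing about whether the lost data has been restored from the replicas.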

[1] http://docs.basho.com/riak/1.2.1/cookbooks/Repairing-KV-Indexes/
[2] https://gist.github.com/2834473

Hope this helps!

Shane.

On 11/01/13 13:47, Shane McEwan wrote:
> Thanks Matthew.
>
> We're running version 1.2.1.
>
> I was actually following the Repair KV Indexes[1] instructions which
> triggered the problem. I was doing the repair mostly out of curiosity to
> see what it did. I was thinking of using it as a sanity check for Riak
> backups.
>
> I assume there's a different sort of repair I can run?
>
> [1] http://docs.basho.com/riak/1.2.1/cookbooks/Repairing-KV-Indexes/
>
> On 11/01/13 12:48, Matthew Von-Maszewski wrote:
>> What version of Riak?
>>
>> Likely you need to take the node offline and run repair.
>>
>> Matthew
>>
>>
>> On Jan 11, 2013, at 4:50 AM, Shane McEwan <shane at mcewan.id.au> wrote:
>>
>>> G'day!
>>>
>>> I posted this to the LevelDB mailing list with little success.
>>> Apologies if you've already seen this from there.
>>>
>>> We've started getting errors in a LevelDB LOG file about a missing
>>> SST file:
>>>
>>> 2013/01/10-15:08:12.714525 7fb0767fa700 Compacting 14@0 + 7@1 files
>>> 2013/01/10-15:08:25.121147 7fb0767fa700 compacted to: files[ 14 7 50 105 0 0 0 ]
>>> 2013/01/10-15:08:25.121488 7fb0767fa700 Delete type=2 #111404
>>> 2013/01/10-15:08:25.147976 7fb0767fa700 Compaction error: IO error: /data/riak/leveldb/902020541790166644828836732692080926193895866368/006558.sst: No such file or directory
>>>
>>> I assume it means we've got an SST file listed in the MANIFEST file
>>> that doesn't exist anymore. The SST in question doesn't exist in any
>>> of the snapshots I have around the time it was likely to have been
>>> created.
>>>
>>> I saw mention of a bug fixed in the latest LevelDB[1] that could
>>> cause what we're seeing except that we haven't run out of disk space
>>> so I'm not sure we've hit that. I'm less interested in HOW it
>>> happened since we've moved to different hardware since then.
>>>
>>> We haven't noticed any missing data in our database (perhaps Riak
>>> replicas are helping there?) and even if there is something missing
>>> the nature of our data means that we can probably live without it.
>>>
>>> My question is, can we remove the offending file's entry from the
>>> MANIFEST file somehow? Or will it sort itself out? Currently our idle
>>> test database is spinning at 100% CPU trying to compact a file that
>>> doesn't exist.
>>>
>>> [1]
>>> https://groups.google.com/forum/#!msg/leveldb/Kc9JxuIUu5A/9P0N9RL4ar8J
>>>
>>> Any advice would be greatly appreciated. Thanks!
>>>
>>> Shane.
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>



