"Dead" files in bitcask or something

Alexander Sicular siculars at gmail.com
Tue Aug 17 15:52:26 EDT 2010

I'm discussing this with dizzyd on IRC. Compaction is not triggered by time or by any particular number of overwritten records, but rather by the total number of dead bytes and the fragmentation percentage, as listed in the default config here: http://github.com/basho/bitcask/blob/master/ebin/bitcask.app
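
To make the threshold idea concrete, here is a rough Python sketch. It is a toy model, not Bitcask's code: the class `ToyLogFile` is made up for illustration, and the two constants only mirror what I understand the `frag_merge_trigger` and `dead_bytes_merge_trigger` settings to be, so treat both the names and the values as assumptions and check them against the bitcask.app file linked above. Real Bitcask tracks these numbers per data file and has more settings than this; none of that is modeled here.

```python
# Toy model of an append-only data file. NOT Bitcask's actual code.
# Both constants are assumptions; verify them against bitcask.app.
FRAG_MERGE_TRIGGER = 60                        # percent of dead records
DEAD_BYTES_MERGE_TRIGGER = 512 * 1024 * 1024   # bytes of dead data


class ToyLogFile:
    """Hypothetical stand-in for one append-only data file."""

    def __init__(self):
        self.total_bytes = 0     # everything ever appended
        self.total_entries = 0   # number of records appended
        self.live = {}           # key -> size of the newest record for that key

    def put(self, key, value):
        """Append a record; any older record for the same key becomes dead."""
        size = len(key) + len(value)
        self.total_bytes += size
        self.total_entries += 1
        self.live[key] = size

    @property
    def dead_bytes(self):
        """Bytes held by records that have since been overwritten."""
        return self.total_bytes - sum(self.live.values())

    @property
    def fragmentation(self):
        """Percentage of records in the file that are dead."""
        if self.total_entries == 0:
            return 0.0
        dead = self.total_entries - len(self.live)
        return 100.0 * dead / self.total_entries

    def needs_merge(self):
        """True once either threshold is reached; time plays no role."""
        return (self.fragmentation >= FRAG_MERGE_TRIGGER
                or self.dead_bytes >= DEAD_BYTES_MERGE_TRIGGER)


log = ToyLogFile()
value = b"v" * 100
for round_no in range(1, 4):
    # Rewrite the same 10 keys with an identical value each round.
    for i in range(10):
        log.put(b"k%d" % i, value)
    print(round_no, log.total_bytes, log.dead_bytes,
          round(log.fragmentation, 1), log.needs_merge())
```

In this toy model the second round of identical writes doubles the file without tripping either trigger; only the third round pushes the share of dead records past 60%.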

See http://irclogger.com/riak/2010-08-17 for the latest.


On Aug 17, 2010, at 11:59 AM, Dmitry Demeshchuk wrote:

> We've been running Riak in production for 3 weeks and the database just
> kept growing. Our test server has been running even longer. Well, that was an
> older Riak, 0.12.0.
> I've been running 0.12.1 for several hours and still no compaction, though...
> On Tue, Aug 17, 2010 at 7:54 PM, Alexander Sicular <siculars at gmail.com> wrote:
>> Bitcask is a write only log (wol) that eats disk (by keeping all updates)
>> until a compaction phase that reclaims disk at some defined interval.
>> -Alexander
>> @siculars on twitter
>> http://siculars.posterous.com
>> Sent from my iPhone
>> On Aug 17, 2010, at 11:27, Dmitry Demeshchuk <demeshchuk at gmail.com> wrote:
>>> Greetings.
>>> This problem has already been discussed in IRC a bit.
>>> I use Riak 0.12.1 (have been using 0.12.0 but then updated to the
>>> latest version and got the same problem) with bitcask storage.
>>> All Riak settings are default, i.e., all buckets are
>>> default-configured (allow_mult=false), replication is 3x. Currently
>>> Riak runs on a single machine. The problem reproduces on
>>> different machines with different Riak clusters brought up.
>>> Though the total size of the database records doesn't grow, update operations
>>> (I'll describe them in detail below) make the total size of the
>>> "data/bitcask" folder grow. For example, I made a database backup on our
>>> test server and the backup size was 2.5MB. But the size of the
>>> "data/bitcask" folder was 17GB!
>>> Careful investigation showed that the database size on disk
>>> increases whenever a Riak update operation is performed, even when the
>>> value written by the update is exactly the same.
>>> The update operation is like this:
>>> {ok, RiakObject} = RiakClient:get(Bucket, Key, 1),
>>> OldValue = riak_object:get_value(RiakObject),
>>> NewValue = do_something(),
>>> NewRiakObject = riak_object:update_value(RiakObject, NewValue),
>>> RiakClient:put(NewRiakObject, 1).
>>> And it appeared that even if I make NewValue exactly the same as
>>> OldValue, this update operation increases the database size on
>>> disk. Still, the total size of this Riak object stays the same.
>>> I thought that maybe I was doing something wrong when handling the data,
>>> and there's some data I'm missing. But, again, the backup file is very small,
>>> much smaller than the disk space occupied by the database.
>>> If I do list_buckets or list_keys, these operations are painfully
>>> slow, but they eventually return the right values, without any garbage.
>>> Values of the Riak objects are okay as well.
>>> When I had a look at data files, it appeared that *.bitcask.data are
>>> the files that keep growing.
>>> That's all I found for now.
>>> Any clues?
>>> --
>>> Best regards,
>>> Dmitry Demeshchuk
> -- 
> Best regards,
> Dmitry Demeshchuk
