Issues with garbage collection on RiakCS

Alex Berghage aberghage at
Fri Aug 22 13:09:33 EDT 2014

Hi David & Raina,

I took a look over your configuration files — depending on the level of
disk use you're at, GC may in fact be working correctly. The areas you can
look to in order to tune the speed at which disk space is reclaimed are the
garbage collector's leeway_seconds, and Riak's Bitcask backend merge

Riak CS stores objects spread across multiple parts in the backing Riak
cluster, with the mapping between CS objects and Riak object blocks being
maintained by a "Manifest" object — effectively a list of blocks + some
metadata. When you issue a delete for an object via the S3 API to CS, that
object's manifest is written to a separate Riak bucket to wait for garbage
collection, where it waits for a preset leeway period. This leeway period
allows any reads currently in progress from that object at the time of
deletion to complete before its data is removed from the backing store. On
this cluster, that leeway is currently set to 24 hours. Once the leeway
period for the object has expired, it becomes eligible for deletion by the
periodic garbage collection process in Riak CS.

The Garbage Collector is a periodic process that runs on one Riak CS node,
making queries to the Riak cluster and issuing deletions for data blocks
and manifests that correspond to CS objects which have been deleted. It
does this by running a secondary index query over the garbage collection
bucket to collect manifests which have already made it through the above
process, and whose leeway time has elapsed. GC then sequentially issues
deletions for each of the data blocks listed in each of the manifests
collected for deletion. This process happens on a configurable period, set
to 15 minutes in this cluster. When large amounts of data are deleted at
once it can take some time for this garbage collection process to work
through the backlog of data blocks that require deletion — when this is the
case it can be difficult to gauge the garbage collector's progress in
issuing deletions. Once this process is completed, for a given CS object,
the manifest is deleted as well and the deletion process is finished from
Riak CS' perspective.
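To make the timing concrete, here is a minimal sketch of when a deleted object could first be picked up by GC, given the 24-hour leeway and 15-minute GC period mentioned above. The function name and the idea that GC runs fall on a fixed schedule counted from some known first run are assumptions for illustration, not a description of how Riak CS schedules its runs internally.

```python
from datetime import datetime, timedelta


def first_eligible_gc_run(deleted_at, leeway, gc_interval, first_gc_run):
    """Return the start of the earliest GC run that could collect the object.

    Assumes (hypothetically) that GC runs start at first_gc_run and then
    repeat every gc_interval, and that an object is eligible as soon as
    its leeway period has fully elapsed.
    """
    eligible_at = deleted_at + leeway
    if eligible_at <= first_gc_run:
        return first_gc_run
    elapsed = eligible_at - first_gc_run
    # Ceiling division: number of whole intervals until a run at or after
    # the moment the object becomes eligible.
    runs = -(-elapsed // gc_interval)
    return first_gc_run + runs * gc_interval


# Example: object deleted Friday 17:05, 24 h leeway, GC every 15 minutes.
run = first_eligible_gc_run(
    deleted_at=datetime(2014, 8, 15, 17, 5),
    leeway=timedelta(hours=24),
    gc_interval=timedelta(minutes=15),
    first_gc_run=datetime(2014, 8, 15, 0, 0),
)
print(run)
```

This only gives the earliest point at which deletions for the object's blocks can start being issued; with a large backlog the actual block deletions, and the disk space reclaimed by later merges, can lag well behind it.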

These deletions, as they arrive to the backing Riak cluster, are recorded
as tombstone objects. From here we'll focus on the path that Riak CS data
blocks take to become free space, as they typically consume orders of
magnitude more space than the metadata. Data blocks are stored in a backend
called Bitcask, which uses a periodic process known as "Merging" to
consolidate data files and collapse deleted objects. Which data files to
merge, and when, is dictated by thresholds for the total amount of dead
data that's acceptable per file, and the ratio of dead data to live. In a
typical installation, with deletions spread out over time, merging is not a
particularly lengthy process and is done regularly. When deletions are
batched up, creating a lot of work per merge, it can take in the
neighborhood of 50-90 minutes to complete a merge, at the outside. More
importantly, however, these merges require an amount of space on disk
that's roughly equal to the size of the partition being merged to safely
construct new datafiles. For this reason, merging is done one partition at
a time (per node). You can check how large each partition is in your
deployment with `du -sh /var/vcap/store/riak/rel/lib/bitcask/*` (the path
is taken from your configuration files), and you can configure these merge
thresholds using the tuning options described here
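As a rough illustration of the two merge conditions and the disk headroom requirement described above, here is a minimal sketch. The function names, parameter names, and threshold values are hypothetical and are not Bitcask's actual settings or code; it only mirrors the logic as stated in this message.

```python
def file_needs_merge(dead_bytes, live_bytes, max_dead_bytes, max_dead_to_live):
    """Decide whether a data file qualifies for merging.

    A file qualifies if its dead data exceeds an absolute limit, or if the
    ratio of dead data to live data exceeds a configured ratio.
    """
    if dead_bytes >= max_dead_bytes:
        return True
    if live_bytes == 0:
        # No live data left: any dead data at all means the file is all waste.
        return dead_bytes > 0
    return dead_bytes / live_bytes >= max_dead_to_live


def has_merge_headroom(partition_bytes, free_bytes):
    """A merge needs free space roughly equal to the partition being merged."""
    return free_bytes >= partition_bytes


MB = 1024 * 1024
GB = 1024 * MB

# Illustrative numbers only.
print(file_needs_merge(600 * MB, 400 * MB, max_dead_bytes=512 * MB, max_dead_to_live=0.6))
print(has_merge_headroom(partition_bytes=40 * GB, free_bytes=5 * GB))
```

The second check is the one to watch on a nearly full disk: if free space is smaller than the partition being merged, the merge may not be able to build its new data files safely, regardless of how much dead data is waiting to be collapsed.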

Beyond this tuning, if you believe garbage collection is itself failing to
keep up with the deletion load on your cluster I recommend upgrading to
Riak 1.4.10 and CS 1.5. Riak CS 1.5 allows for greater parallelism in the
GC process, which should accelerate its rate of progress, and there have
been a number of performance improvements and bugfixes in the Riak releases
leading up to 1.4.10 from which you may benefit. Please take a look at the
Riak and CS
release notes for a full list. I'd also suggest monitoring your CS logs for
"Garbage collection completed" events, which indicate that GC has finished
a run, or errors/crashes including the string `riak_cs_gc_d` which indicate
the garbage collection daemon is being prevented from completing a batch.
You may also have some success adjusting the following timeouts in your
Riak CS app.config:

%% Timeouts are in milliseconds (300000 ms = 5 minutes).
{riakc, [
    {get_index_call_timeout, 300000},
    {get_index_timeout, 300000},
    {mapred_timeout, 300000}
]}

I've included a quick summary of these timeouts' function below:

get_index_*: Sets a 5-minute timeout on the index queries that the Riak CS
garbage collection process uses to query Riak for deleted data. This is a
precautionary measure meant to prevent timeout problems should a user with
high usage begin rapidly deleting data.
mapred_timeout: Sets a 5-minute timeout on the map/reduce jobs Riak CS runs
against Riak, which should ease the process for users with large amounts of
usage.
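For the log monitoring suggested above, a small script along these lines could count completed GC runs and collect lines that point at trouble in the GC daemon. This is only a sketch: the two search strings come from this message, but the function name is hypothetical, and treating any `riak_cs_gc_d` line containing "error" or "crash" as a problem is an assumption about the log format.

```python
import re

GC_COMPLETED = "Garbage collection completed"
GC_DAEMON = "riak_cs_gc_d"


def summarize_gc_log(lines):
    """Return (completed_runs, problem_lines) for an iterable of log lines."""
    completed = 0
    problems = []
    for line in lines:
        if GC_COMPLETED in line:
            completed += 1
        elif GC_DAEMON in line and re.search(r"error|crash", line, re.IGNORECASE):
            # Keep the raw line so it can be inspected by hand afterwards.
            problems.append(line.rstrip("\n"))
    return completed, problems
```

You would call it with an open file handle for one of your CS log files, for example `summarize_gc_log(open(path))`, where `path` is wherever your deployment writes its Riak CS logs. A completed count of zero over a period much longer than the GC interval, or a growing list of problem lines, would suggest the daemon is not finishing its batches.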

Alex Berghage

> On Mon, Aug 18, 2014 at 10:34 AM, David Sabeti <dsabeti at> wrote:
> > Hi all,
> >
> > Our team at Cloud Foundry is building a RiakCS service for CF users and
> > one of our deployments is seeing an issue with deleting objects from the
> > blobstore.
> >
> > We were seeing that our disk usage was approaching 100%, so we deleted
> > some of the stale objects in the blobstore using s3cmd. If we run
> > `s3cmd du`, it seems that we successfully freed up space, but when we ran
> > `df` inside the RiakCS host, we still saw that our disk usage was close
> > to 100%.
> >
> > We understand now that Riak will remove deleted keys asynchronously, but
> > we haven't succeeded in configuring GC so that it is more responsive to
> > deletions, despite having tried tweaking several parameters. On Friday,
> > we uploaded several files and deleted them, hoping to see that they were
> > gone from the disk on Monday. When we came back after the weekend, we saw
> > that garbage collection still had not occurred. If it helps, you can look
> > at our configuration for Riak and RiakCS.
> >
> > Has anyone else encountered this issue, where garbage collection appears
> > to never occur? Would be great to get help configuring RiakCS so that GC
> > happens more often. Maybe there is a way to run GC manually when disk is
> > filling up?
> >
> > Thanks,
> > David & Raina
> > CF Services Team