bitcask merges & deletions

Johnny Tan johnnydtan at gmail.com
Thu Jun 16 20:07:45 EDT 2016


Hm, definitely not a cronjob. I'll look at our app and see if there's
anything that does something like that there.

On Wed, Jun 15, 2016 at 9:10 PM, Luke Bakken <lbakken at basho.com> wrote:

> Hi Johnny,
>
> Since this seems to happen regularly on one node on your cluster (not
> necessarily the same node), do you have a repetitive process that
> performs a *lot* of updates or deletes on a single key that could be
> correlated to these merges?
> --
> Luke Bakken
> Engineer
> lbakken at basho.com
>
>
> On Wed, Jun 15, 2016 at 10:22 AM, Johnny Tan <johnnydtan at gmail.com> wrote:
> > We're running riak-1.4.2
> >
> > Every few weeks, we have a riak node that starts to slowly fill up on
> disk
> > space for several days, and then suddenly gain that space back again.
> >
> > In looking into this more today, I think I see what's going on.
> >
> > Per the console.log on a node that it's happening to right now, there
> are an
> > unusually large amount of merges happening right now. There are 6 total
> > nodes in our cluster, it's only happening to this node today. (In
> previous
> > weeks, it's been other nodes, but it's always been one node at a time.)
> >
> > Normally, we get 50-70 merges per day per node (according to various
> nodes'
> > console.log, including the node in question). Yesterday and today, the
> node
> > in question has several hundred merges happening.
> >
> > When I look inside the bitcask directory, I see a lot of files with this
> set
> > of permissions:
> > -rwSrw-r--
> >
> > My understanding is that those are files marked for deletions after
> bitcask
> > merging.
> >
> > The number of those files is currently growing, and from a spot-check,
> they
> > indeed match up as the files that have been merged.
> >
> > So it seems the two are related: a lot of merges are happening, which
> then
> > causes a large number of files to be marked for deletion, and those
> marked
> > files are piling up and not getting deleted for some reason.
> >
> > If I don't do anything, those files eventually get deleted, and
> everything
> > is good again for another couple weeks until it happens to another node.
> But
> > the disk usage does get high enough to alert us, and obviously we don't
> want
> > it to get anywhere near 100%.
> >
> >
> > I'm trying to figure out why there are times when this happens. One
> thing I
> > noticed is a difference in the merge log entries.
> >
> > Here's one from a "normal" day, nearly all the entries for that day are
> > roughly this same length and same amount of time merging:
> > 2016-06-10 05:27:39.426 UTC [info] <0.15230.160> Merged
> >
> {["/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/84000.bitcask.data","/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/83999.bitcask.data"],[]}
> > in 11.902028 seconds.
> >
> > But here's one from today on the problematic node:
> > 2016-06-15 17:13:40.626 UTC [info] <0.17903.500> Merged
> >
> {["/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83633.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83632.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83631.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83630.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83629.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83628.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83627.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83626.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83625.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83624.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83623.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83622.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83621.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83620.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83619.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83618.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83617.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83616.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83615.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83614.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83613.bitcask.data","/var/lib/riak/bitcask/12331420064979493372343590776043637978346930...",...],...}
> > in 220.186043 seconds.
> >
> > It's not just that it takes 20x longer to merge, but it seems to be
> doing a
> > lot more files at once.
> >
> > What is going on?
> >
> > I'm not sure how much of the app.config is relevant, but I'll at least
> paste
> > just the bitcask and merge sections for now:
> >         {bitcask, [
> >                 {data_root, "/var/lib/riak/bitcask"},
> >                 {dead_bytes_merge_trigger, 268435456},
> >                 {dead_bytes_threshold, 67108864},
> >                 {frag_merge_trigger, 60},
> >                 {frag_threshold, 40},
> >                 {io_mode, erlang},
> >                 {max_file_size, 1073741824},
> >                 {small_file_threshold, 134217728}
> >         ]},
> >         {merge_index, [
> >                 {buffer_rollover_size, 1048576},
> >                 {data_root, "/var/lib/riak/merge_index"},
> >                 {max_compact_segments, 20}
> >         ]},
> >
> >
> > Thanks for any insight,
> > johnny
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20160616/58a2bc9c/attachment.html>


More information about the riak-users mailing list