RIAK 1.4.6 - Mass key deletion

Simon Effenberg seffenberg at team.mobile.de
Sun Jul 20 09:24:34 EDT 2014


Hi Matthew,

so is there a way to improve the compaction rate in Riak < 2.0, or do I
have to upgrade to 2.0 to get this?

Cheers
Simon

On Sun, Apr 06, 2014 at 06:30:30PM -0400, Matthew Von-Maszewski wrote:
>    Edgar,
>    This is indirectly related to you key deletion discussion.  I made changes
>    recently to the aggressive delete code.  The second section of the
>    following (updated) web page discusses the adjustments:
>        https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>    Matthew
>    On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmveiga at gmail.com> wrote:
> 
>      Matthew, thanks again for the response!
>      That said, I'll wait again for the 2.0 (and maybe buy some bigger disks
>      :)
>      Best regards
> 
>      On 6 April 2014 15:02, Matthew Von-Maszewski <matthewv at basho.com> wrote:
> 
>        Edgar,
>        In Riak 1.4, there is no advantage to using empty values versus
>        deleting.
>        leveldb is a "write once" data store.  New data for a given key never
>        physically overwrites old data for the same key.  New data "hides" the
>        old data by being in a lower level, and is therefore picked first.
>        leveldb's compaction operation will remove older key/value pairs only
>        when the newer key/value pair is part of a compaction involving
>        both new and old.  The new and the old key/value pairs must have
>        migrated to adjacent levels through normal compaction operations
>        before leveldb will see them in the same compaction.  The migration
>        could take days, weeks, or even months depending upon the size of your
>        entire dataset and the rate of incoming write operations.
>        leveldb's "delete" object is exactly the same as your empty JSON
>        object.  The delete object simply has one more flag set that allows it
>        to also be removed if and only if there is no chance for an identical
>        key to exist on a higher level.
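Matthew's shadowing and tombstone rules can be illustrated with a toy model (purely illustrative Python; leveldb's actual implementation is C++ and far more involved). Lower-numbered levels hold newer records, a read returns the first hit, and a tombstone survives compaction until it reaches the last level, where no identical key can still exist above it:

```python
# Toy model of leveldb's "write once" shadowing and tombstone rules.
# Purely illustrative -- not leveldb's actual code or on-disk format.

def get(levels, key):
    """Lower-numbered levels are newer; the first hit wins."""
    for level in levels:
        if key in level:
            record = level[key]
            return None if record == "<tombstone>" else record
    return None

def compact(levels, i):
    """Merge level i into level i+1. A newer record replaces the older
    one for the same key; a tombstone is dropped only when i+1 is the
    last level, i.e. no identical key can exist in any higher level."""
    merged = dict(levels[i + 1])
    last = (i + 1 == len(levels) - 1)
    for key, record in levels[i].items():
        if record == "<tombstone>" and last:
            merged.pop(key, None)      # safe to drop the key for good
        else:
            merged[key] = record       # newer record shadows the older one
    levels[i] = {}
    levels[i + 1] = merged

levels = [{"k": "<tombstone>"}, {"k": "old"}, {}]
print(get(levels, "k"))        # the tombstone hides the old value -> None
compact(levels, 0)             # old value gone, tombstone still retained
compact(levels, 1)             # tombstone reaches the last level, dropped
print(sum(len(l) for l in levels))   # -> 0
```

The point of the model is Matthew's timing caveat: the tombstone frees nothing until compaction has carried it all the way down, which can take days or weeks on a large dataset.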
>        I apologize that I cannot give you a more useful answer.  2.0 is on
>        the horizon.
>        Matthew
>        On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmveiga at gmail.com> wrote:
> 
>          Hi again!
>          Sorry to reopen this discussion, but I have another question
>          regarding the former post.
>          What if, instead of doing a mass deletion (we've already seen
>          that it won't pay off in disk space), I update all the values
>          with an empty JSON object "{}"? Do you see any problem with
>          this? I no longer need those millions of values that are living in
>          the cluster... 
>          When the version 2.0 of riak runs stable I'll do the update and only
>          then delete those keys!
>          Best regards
> 
>          On 18 February 2014 16:32, Edgar Veiga <edgarmveiga at gmail.com>
>          wrote:
> 
>            Ok, thanks a lot Matthew.
> 
>            On 18 February 2014 16:18, Matthew Von-Maszewski
>            <matthewv at basho.com> wrote:
> 
>              Riak 2.0 is coming.  Hold your mass delete until then.  The
>              "bug" is within Google's original leveldb architecture.  Riak
>              2.0 sneaks around to get the disk space freed.
>              Matthew
>              On Feb 18, 2014, at 11:10 AM, Edgar Veiga
>              <edgarmveiga at gmail.com> wrote:
> 
>                The only/main purpose is to free disk space..
>                I was a little bit concerned regarding this operation, but now
>                with your feedback I'm inclined not to do anything; I
>                can't risk the space growing...
>                Regarding the overhead I think that with a tight throttling
>                system I could control and avoid overloading the cluster.
>                Mixed feelings :S
> 
>                On 18 February 2014 15:45, Matthew Von-Maszewski
>                <matthewv at basho.com> wrote:
> 
>                  Edgar,
>                  The first "concern" I have is that leveldb's delete does not
>                  free disk space.  Others have executed mass delete
>                  operations only to discover they are now using more disk
>                  space instead of less.  Here is a discussion of the problem:
>                  https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>                  The link also describes Riak's database operation overhead.
>                   This is a second "concern".  You will need to carefully
>                  throttle your delete rate or the overhead will likely impact
>                  your production throughput.
>                  We have new code to help quicken the actual purge of deleted
>                  data in Riak 2.0.  But that release is not quite ready for
>                  production usage.
>                  What do you hope to achieve by the mass delete?
>                  Matthew
>                  On Feb 18, 2014, at 10:29 AM, Edgar Veiga
>                  <edgarmveiga at gmail.com> wrote:
> 
>                    Sorry, forgot that info!
>                    It's leveldb.
>                    Best regards
> 
>                    On 18 February 2014 15:27, Matthew Von-Maszewski
>                    <matthewv at basho.com> wrote:
> 
>                      Which Riak backend are you using:  bitcask, leveldb,
>                      multi?
> 
>                      Matthew
> 
>                      On Feb 18, 2014, at 10:17 AM, Edgar Veiga
>                      <edgarmveiga at gmail.com> wrote:
> 
>                      > Hi all!
>                      >
>                      > I have a fairly trivial question regarding mass
>                      deletion on a riak cluster, but firstly let me give you
>                      just some context. My cluster is running with riak 1.4.6
>                      on 6 machines with a ring size of 256 and 1TB SSD
>                      disks.
>                      >
>                      > I need to execute a massive object deletion on a
>                      bucket, I'm talking of ~1 billion keys (The object
>                      average size is ~1KB). I will not retrieve the keys
>                      from Riak because I have a file with all of them.
>                      I'll just
>                      start a script that reads them from the file and
>                      triggers an HTTP DELETE for each one.
>                      > The cluster will continue running on production with a
>                      quite high load serving all other applications, while
>                      running this deletion.
>                      >
>                      > My question is simple: are there any
>                      extra concerns I should have regarding this action?
>                      Do you advise paying special attention to any
>                      metrics for Riak or even the servers it runs on?
>                      >
>                      > Best regards!
>                      > _______________________________________________
>                      > riak-users mailing list
>                      > riak-users at lists.basho.com
>                      >
>                      http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
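Edgar's plan above (stream keys from a file, issue one HTTP DELETE per key) can be sketched as follows. The host, port, bucket, and file name are placeholder assumptions; the rate limit is the throttle Matthew recommends tuning against production load:

```python
# Sketch of a throttled mass-delete driver. Host/bucket/file names are
# hypothetical placeholders. Riak 1.4's HTTP API deletes an object with:
#   DELETE /buckets/<bucket>/keys/<key>
import time
import urllib.request

def delete_key(key, host="riak.example.com", port=8098, bucket="mybucket"):
    """Issue one HTTP DELETE against Riak's HTTP API."""
    url = "http://%s:%d/buckets/%s/keys/%s" % (host, port, bucket, key)
    req = urllib.request.Request(url, method="DELETE")
    urllib.request.urlopen(req)

def throttled_delete(keys, delete_fn, ops_per_sec=100):
    """Delete keys at a bounded rate so compaction overhead does not
    swamp production traffic. Returns the number of deletes issued."""
    interval = 1.0 / ops_per_sec
    count = 0
    for key in keys:
        start = time.monotonic()
        delete_fn(key)
        count += 1
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)   # pad each op up to the budget
    return count

if __name__ == "__main__":
    with open("keys.txt") as f:               # one key per line
        keys = (line.strip() for line in f if line.strip())
        n = throttled_delete(keys, delete_key, ops_per_sec=100)
        print("deleted", n, "keys")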



-- 
Simon Effenberg | Site Op | mobile.international GmbH

Phone:    + 49. 30. 8109. 7173
M-Phone:  + 49. 151. 5266. 1558
Mail:     seffenberg at team.mobile.de
Web:      www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany

______________________________________________________
Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow
______________________________________________________




