Object not found after successful PUT on S3 API

Daniel Miller dmiller at dimagi.com
Mon Mar 6 12:21:18 EST 2017


> Would be good to know the riak version

Riak 2.1.1
Riak CS 2.1.0
Stanchion 2.1.0

> why the dvv_enabled bucket property is set to false, please?

Looks like that's the default
<http://docs.basho.com/riak/kv/2.2.0/learn/concepts/buckets/#dvv-enabled>.
I haven't changed it.
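
For reference, the props can be pulled over Riak's HTTP API; a minimal
sketch (host and port assume the stock HTTP listener):

    import requests

    # Fetch the bucket properties Riak reports for the CS data bucket.
    resp = requests.get("http://localhost:8098/buckets/blobdb/props")
    print(resp.json()["props"]["dvv_enabled"])  # prints: False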

> Also, is there multi-datacentre replication involved?

No.

> Do you re-use your keys, for example, have the keys in question been
> created, deleted, and then re-created?

No.

Thank you for the prompt follow-up.

Daniel


On Mon, Mar 6, 2017 at 10:38 AM, Russell Brown <russell.brown at icloud.com> wrote:

> Hi,
> Would be good to know the riak version, and why the dvv_enabled bucket
> property is set to false, please? Also, is there multi-datacentre
> replication involved? Do you re-use your keys, for example, have the keys
> in question been created, deleted, and then re-created?
>
> Cheers
>
> Russell
>
> On 6 Mar 2017, at 15:07, Daniel Miller <dmiller at dimagi.com> wrote:
>
> > I recently had another case of a disappearing object. This time the
> > object was successfully PUT, and (unlike the previous cases reported
> > in this thread) GETs were also successful for a period of time. Then
> > GETs started 404ing for no apparent reason. There are no errors in
> > the logs to indicate that anything unusual happened. This is quite
> > disconcerting. Is it normal for Riak CS to simply lose track of
> > objects? At this point we are using CS as primary object storage,
> > meaning we do not have the data stored in another database, so it is
> > critical that data is not randomly lost.
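> >
> > (For concreteness, a boto3 probe along these lines reproduces the
> > 404; the endpoint and credentials here are placeholders, not our
> > real values:)
> >
> >     import boto3
> >     from botocore.exceptions import ClientError
> >
> >     # Point an S3 client at the Riak CS endpoint (placeholder values).
> >     s3 = boto3.client(
> >         "s3",
> >         endpoint_url="http://cs-node.example.com:8080",
> >         aws_access_key_id="PLACEHOLDER",
> >         aws_secret_access_key="PLACEHOLDER",
> >     )
> >     try:
> >         # HEAD the object; a missing key raises ClientError with a 404.
> >         s3.head_object(Bucket="blobdb", Key="commcarehq__apps/3d2b...")
> >     except ClientError as err:
> >         print(err.response["ResponseMetadata"]["HTTPStatusCode"])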
> >
> > In the CS access logs I see:
> >
> > # All prior GET requests for this object succeeded like this one
> > # (this is the last successful GET request):
> > [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> > ...
> > # All GET requests for this object now fail like this one (the first 404):
> > [02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> >
> > The object name has been elided for readability. I do not know when
> > this object was PUT into the cluster because I only have logs for the
> > past month. Is there any way to dig further into the Riak or Riak CS
> > data to determine whether the object content is completely lost, or
> > to find any other details that might explain why it is now missing?
> > Could I increase some logging parameters to get more information
> > about what is going wrong when something like this happens?
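> >
> > (For example, the kind of change I have in mind is raising the
> > console log level in riak.conf; a sketch, assuming the stock logging
> > setup:)
> >
> >     ## riak.conf: raise console log verbosity from the default "info"
> >     log.console.level = debug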
> >
> > I have searched the logs for other 404 responses but found none
> > (other than the two reported earlier), so this is the third known
> > missing object in the cluster. We retain logs for one month only
> > (I'm increasing this now because of this issue), so it is possible
> > that other objects have also gone missing, but I cannot see them
> > since the logs have been truncated.
> >
> > The cluster now has 7 nodes instead of 9 (see earlier emails in this
> > thread), and the Riak storage backend is now leveldb instead of
> > multi. I have attached config file templates for riak, riak-cs, and
> > stanchion (these are deployed with Ansible).
> >
> > Bucket properties:
> > {
> >   "props": {
> >     "notfound_ok": true,
> >     "n_val": 3,
> >     "last_write_wins": false,
> >     "allow_mult": true,
> >     "dvv_enabled": false,
> >     "name": "blobdb",
> >     "r": "quorum",
> >     "precommit": [],
> >     "old_vclock": 86400,
> >     "dw": "quorum",
> >     "rw": "quorum",
> >     "small_vclock": 50,
> >     "write_once": false,
> >     "basic_quorum": false,
> >     "big_vclock": 50,
> >     "chash_keyfun": {
> >       "fun": "chash_std_keyfun",
> >       "mod": "riak_core_util"
> >     },
> >     "postcommit": [],
> >     "pw": 0,
> >     "w": "quorum",
> >     "young_vclock": 20,
> >     "pr": 0,
> >     "linkfun": {
> >       "fun": "mapreduce_linkfun",
> >       "mod": "riak_kv_wm_link_walker"
> >     }
> >   }
> > }
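> >
> > (One check I can think of, given notfound_ok above, is a read with an
> > explicit quorum through Riak's HTTP API. This is only a sketch;
> > BUCKET and KEY are placeholders, since CS keeps its data under
> > internal key names:)
> >
> >     import requests
> >
> >     # Ask all three replicas; don't let a notfound count as a vote.
> >     resp = requests.get(
> >         "http://localhost:8098/buckets/BUCKET/keys/KEY",
> >         params={"r": 3, "notfound_ok": "false"},
> >     )
> >     print(resp.status_code)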
> >
> > I'll be happy to provide more context to help troubleshoot this issue.
> >
> > Thanks in advance for any help you can provide.
> >
> > Daniel
> >
> >
> > On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <dmiller at dimagi.com> wrote:
> > Hi Luke,
> >
> > Sorry for the late response, and thanks for following up. I haven't
> > seen it happen since. At this point I'm going to wait and see if it
> > happens again, and hopefully get more details about what might be
> > causing it.
> >
> > Daniel
> >
> > On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <lbakken at basho.com> wrote:
> > Hi Daniel -
> >
> > I don't have any ideas at this point. Has this scenario happened again?
> >
> > --
> > Luke Bakken
> > Engineer
> > lbakken at basho.com
> >
> >
> > On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <dmiller at dimagi.com> wrote:
> > > Thanks for the quick response, Luke.
> > >
> > > There is nothing unusual about the keys. The format is a name +
> > > UUID + some other random URL-encoded characters, like most other
> > > keys in our cluster.
> > >
> > > There are no errors near the time of the incident in any of the
> > > logs (the last [error] is from over a month before). I see lots of
> > > messages like this in console.log:
> > >
> > > /var/log/riak/console.log
> > > 2017-01-20 15:38:10.184 [info] <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'riak-fake3 at fake3.fake.com'} and {822094670998632891489572718402909198556462055424,'riak-fake9 at fake9.fake.com'}
> > > 2017-01-20 15:40:39.640 [info] <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during active anti-entropy exchange of {936274486415109681974235595958868809467081785344,3} between {959110449498405040071168171470060731649205731328,'riak-fake3 at fake3.fake.com'} and {981946412581700398168100746981252653831329677312,'riak-fake5 at fake5.fake.com'}
> > > 2017-01-20 15:46:40.918 [info] <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {662242929415565384811044689824565743281594433536,3} between {685078892498860742907977265335757665463718379520,'riak-fake3 at fake3.fake.com'} and {707914855582156101004909840846949587645842325504,'riak-fake6 at fake6.fake.com'}
> > > 2017-01-20 15:48:25.597 [info] <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'riak-fake3 at fake3.fake.com'} and {799258707915337533392640142891717276374338109440,'riak-fake0 at fake0.fake.com'}
> > >
> > > Thanks!
> > > Daniel
> > >
> > >
> > >
> > > On Wed, Jan 25, 2017 at 9:45 AM, Luke Bakken <lbakken at basho.com> wrote:
> > >>
> > >> Hi Daniel -
> > >>
> > >> This is a strange scenario. I recommend looking at all of the log
> > >> files for "[error]" or other entries at about the same time as these
> > >> PUTs or 404 responses.
> > >>
> > >> Is there anything unusual about the key being used?
> > >> --
> > >> Luke Bakken
> > >> Engineer
> > >> lbakken at basho.com
> > >>
> > >>
> > >> On Wed, Jan 25, 2017 at 6:40 AM, Daniel Miller <dmiller at dimagi.com> wrote:
> > >> > I have a 9-node Riak CS cluster that has been working flawlessly
> > >> > for about 3 months. The cluster configuration, including backend
> > >> > and bucket parameters such as the N value, uses default settings.
> > >> > I'm using the S3 API to communicate with the cluster.
> > >> >
> > >> > Within the past week I had an issue where two objects were PUT,
> > >> > resulting in a 200 (success) response, but all subsequent GET
> > >> > requests for those two keys return a 404 (not found) status.
> > >> > Other than the fact that they are now missing, there was nothing
> > >> > out of the ordinary about these two particular PUTs. Maybe I'm
> > >> > missing something, but this seems like a scenario that should
> > >> > never happen. All information included here about PUTs and GETs
> > >> > comes from reviewing the CS access logs. Both objects were PUT
> > >> > on the same node; however, GET requests returning 404 have been
> > >> > observed on all nodes. There is plenty of other traffic on the
> > >> > cluster involving GETs and PUTs that are not failing. I'm unsure
> > >> > how to troubleshoot further to find out what may have happened
> > >> > to those objects and why they are now missing. What is the best
> > >> > approach to figure out why an object that was successfully PUT
> > >> > seems to be missing?
> > >> >
> > >> > Thanks!
> > >> > Daniel Miller
> > >> >
> >
> > <config-files.zip>
>
>

