riak-cs fails to start after reimporting Docker container

Toby Corkindale toby at dryft.net
Sun Mar 5 22:49:18 EST 2017


I tried quite hard to get Riak to work reliably in a Docker container, in a
long-term-use kind of way.
Riak would never shut down cleanly, though, so at startup there were
always lots of lock files left around that had to be deleted first.

Riak is not well-behaved after a rough shutdown, whether in a Docker
container or running on bare metal. It tends to require sysadmin
intervention to clean things up.

If you're running it in a Docker container, you need to figure out a way to
capture the incoming SIGTERM and then use that to shut down Riak cleanly. I
never got that far.
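
A minimal sketch of what such a SIGTERM-catching entrypoint could look
like, untested against a real Riak image. It assumes the wrapper runs as
PID 1 in the container (so it is the process that receives the SIGTERM
from "docker stop"), that "riak start" daemonises the node, and that
"riak stop" does not return until the node has exited. The function names
are made up for illustration.

```python
import signal
import subprocess
import threading


def install_stop_handlers(stop_event):
    """Set stop_event when SIGTERM (docker stop) or SIGINT (Ctrl-C) arrives."""
    def handle(signum, frame):
        stop_event.set()

    signal.signal(signal.SIGTERM, handle)
    signal.signal(signal.SIGINT, handle)


def supervise(start_cmd, stop_cmd, stop_event, run=subprocess.call):
    """Run start_cmd, block until stop_event is set, then run stop_cmd.

    Returns start_cmd's exit status if it is non-zero (stop_cmd is then
    skipped), otherwise stop_cmd's exit status.
    """
    status = run(start_cmd)
    if status != 0:
        return status
    stop_event.wait()
    return run(stop_cmd)


def main():
    # Assumption: these are the right commands for the image in use.
    stop_event = threading.Event()
    install_stop_handlers(stop_event)
    return supervise(["riak", "start"], ["riak", "stop"], stop_event)
```

To use it, end the script with "raise SystemExit(main())" and make it the
container's entrypoint. Docker only waits a grace period (10 seconds by
default) after SIGTERM before sending SIGKILL, so the timeout passed to
"docker stop -t" probably needs raising to give Riak time to finish.
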
I had a start-up script that cleaned out lock files, hash trees and the
like, but even after all that, the Dockerised Riak proved problematic. (And
getting all the Erlang/OTP cluster networking to work was also painful.)
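
A rough sketch of that kind of start-up cleanup follows. It is not the
original script. The lock file name "LOCK" and the "anti_entropy" directory
are taken from the quoted messages below; treating them as the complete set
of things to remove is an assumption, and the function name is hypothetical.
It must only run while Riak is stopped.

```python
import shutil
from pathlib import Path


def clean_stale_state(data_dir, lock_names=("LOCK",), tree_dir="anti_entropy"):
    """Delete leftover lock files and the hash-tree directory under data_dir.

    Only call this while Riak is NOT running. Returns the removed paths.
    """
    root = Path(data_dir)
    removed = []
    for name in lock_names:
        # sorted() materialises the matches before anything is deleted.
        for path in sorted(root.rglob(name)):
            if path.is_file():
                path.unlink()
                removed.append(path)
    trees = root / tree_dir
    if trees.is_dir():
        # Assumption: Riak rebuilds the AAE hash trees when they are missing.
        shutil.rmtree(trees)
        removed.append(trees)
    return removed
```

It would be called as clean_stale_state("/var/lib/riak") before starting
the node. As noted above, this alone did not make the container reliable.
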

Good luck,
Toby

On Thu, 16 Feb 2017 at 10:03 Jon Brisbin <jbrisbin at basho.com> wrote:

> I haven't tried CS in a container yet. Could you provide the Dockerfiles
> and compose files or the commands you use to start the services?
>
> jb
>
> On Wed, Feb 15, 2017 at 4:49 PM Jean-Marc Le Roux <
> jeanmarc.leroux at aerys.in> wrote:
>
> Hi,
>
> inspecting the logs further, I get this in /etc/riak/console.log even
> before running riak-admin repair-2i :
>
> 2017-02-15 23:41:12.441 [warning] <0.714.0> Hintfile
> '/var/lib/riak/bitcask/205523667749658222872393179600727299639115513856/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.441 [warning] <0.702.0> Hintfile
> '/var/lib/riak/bitcask/22835963083295358096932575511191922182123945984/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.441 [warning] <0.716.0> Hintfile
> '/var/lib/riak/bitcask/251195593916248939066258330623111144003363405824/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.441 [warning] <0.717.0> Hintfile
> '/var/lib/riak/bitcask/296867520082839655260123481645494988367611297792/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.441 [warning] <0.700.0> Hintfile
> '/var/lib/riak/bitcask/91343852333181432387730302044767688728495783936/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.441 [warning] <0.715.0> Hintfile
> '/var/lib/riak/bitcask/228359630832953580969325755111919221821239459840/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.442 [warning] <0.697.0> Hintfile
> '/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.442 [warning] <0.712.0> Hintfile
> '/var/lib/riak/bitcask/159851741583067506678528028578343455274867621888/2.bitcask.hint'
> invalid
> 2017-02-15 23:41:12.442 [warning] <0.719.0> Hintfile
> '/var/lib/riak/bitcask/342539446249430371453988632667878832731859189760/2.bitcask.hint'
> invalid
>
> All of this is very surprising since I started riak-cs and riak properly.
>
> Then at the end of console.log :
>
> 2017-02-15 23:41:13.651 [info] <0.481.0>@riak_core:wait_for_service:498
> Wait complete for service riak_kv (10 seconds)
> 2017-02-15 23:41:13.652 [info] <0.678.0>@riak_core:wait_for_service:498
> Wait complete for service riak_kv (10 seconds)
> 2017-02-15 23:41:13.668 [info] <0.7.0> Application yokozuna started on
> node 'riak@127.0.0.1'
> 2017-02-15 23:41:13.672 [info] <0.7.0> Application cluster_info started on
> node 'riak@127.0.0.1'
> 2017-02-15 23:41:13.678 [info]
> <0.201.0>@riak_core_capability:process_capability_changes:555 New
> capability: {riak_control,member_info_version} = v1
> 2017-02-15 23:41:13.680 [info] <0.7.0> Application riak_control started on
> node 'riak@127.0.0.1'
> 2017-02-15 23:41:13.680 [info] <0.7.0> Application erlydtl started on node
> 'riak@127.0.0.1'
> 2017-02-15 23:41:13.687 [info] <0.7.0> Application riak_auth_mods started
> on node 'riak@127.0.0.1'
> 2017-02-15 23:41:17.714 [info]
> <0.474.0>@riak_core_throttle:maybe_log_throttle_change:372 Changing
> throttle for riak_kv/aae_throttle from undefined to 0 based on load factor 0
> 2017-02-15 23:41:32.719 [info]
> <0.2388.0>@riak_kv_index_hashtree:build_or_rehash:1055 Starting AAE tree
> build: 159851741583067506678528028578343455274867621888
> 2017-02-15 23:42:02.186 [info]
> <0.2388.0>@riak_kv_index_hashtree:handle_fold_keys_result:629 Finished AAE
> tree build: 159851741583067506678528028578343455274867621888
>
> I assume it means riak is properly started.
> So I start stanchion, then riak-cs. But I still have the exact same
> error...
>
> Regards,
>
> 2017-02-15 22:16 GMT+01:00 Jean-Marc Le Roux <jeanmarc.leroux at aerys.in>:
>
> Forgot to mention ACLs are alright AFAIK :
>
> root@b4394bf1de78:/var/lib/riak# ls -la
> total 52
> drwxr-xr-x. 10 riak riak  179 Feb  9 23:43 .
> drwxr-xr-x.  1 root root   95 Feb 15 20:48 ..
> -r--------.  1 riak riak   20 Feb  9 01:00 .erlang.cookie
> drwxrwxr-x. 67 riak riak 8192 Feb 15 21:31 anti_entropy
> drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 bitcask
> drwxrwxr-x.  3 riak riak   40 Feb  9 23:42 cluster_meta
> drwxrwxr-x.  2 riak riak  225 Feb 15 22:09 generated.configs
> drwxrwxr-x.  2 riak riak 8192 Feb 15 22:09 kv_vnode
> drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 leveldb
> drwxrwxr-x.  2 riak riak    6 Feb 15 22:14 riak_kv_exchange_fsm
> drwxr-xr-x.  2 riak riak  186 Feb 15 22:09 ring
>
> 2017-02-15 22:13 GMT+01:00 Jean-Marc Le Roux <jeanmarc.leroux at aerys.in>:
>
> Hi,
>
> I'll try to send the log archive ASAP.
> Here is what I get in /var/log/riak/error.log after running riak-admin
> repair-2i :
>
> 2017-02-15 22:09:06.535 [error]
> <0.3287.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree
> lock on partition 1255977969581244695331291653115555720016817029120
> 2017-02-15 22:09:06.535 [error]
> <0.3288.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree
> lock on partition 1278813932664540053428224228626747642198940975104
> 2017-02-15 22:09:06.535 [error]
> <0.3289.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree
> lock on partition 479555224749202520035584085735030365824602865664
> 2017-02-15 22:09:06.535 [error]
> <0.3290.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree
> lock on partition 502391187832497878132516661246222288006726811648
> 2017-02-15 22:09:06.535 [error]
> <0.3291.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree
> lock on partition 1118962191081472546749696200048404186924073353216
>
> I tried to remove all "LOCK" files in /var/lib/riak but to no avail...
> I'm guessing there is something here...
>
> Any idea ?
>
> 2017-02-09 17:37 GMT+01:00 Luke Bakken <lbakken at basho.com>:
>
> Hi Jean-Marc -
>
> Can you provide a complete archive of the log directory? I wonder if
> another file might have more information.
>
> --
> Luke Bakken
> Engineer
> lbakken at basho.com
>
> On Thu, Feb 9, 2017 at 1:58 AM, Jean-Marc Le Roux
> <jeanmarc.leroux at aerys.in> wrote:
> >
> > Hello,
> >
> > here is the original github issue :
> >
> > https://github.com/basho/riak_cs/issues/1329
> >
> > I'm using riak-cs 2.1.1-1.el6 with stanchion 1.5.0-1.el6 on CentOS 6.8
> > in a Docker container.
> > To make the data persistent, the following directories are mounted from
> > outside the container :
> >
> > /var/log
> > /var/lib/riak/
> >
> > Everything works fine except when I remove/reimport the container.
> > Even when it's the same container.
> > The riak data is here in /var/lib/riak (bitcask and leveldb stuff). ACLs
> > look fine on those files.
> >
> > Riak starts. Stanchion starts. But riak-cs won't start.
> > With a riak-cs console, it looks like the problem is here :
> >>
> >> (riak-cs@127.0.0.1)1> [os_mon] memory supervisor port (memsup): Erlang has closed
> >>
> >> =INFO REPORT==== 18-Jan-2017::09:38:31 ===
> >>     alarm_handler: {clear,system_memory_high_watermark}
> >> [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
> >> {"Kernel pid terminated",application_controller,"{application_start_failure,riak_cs,{notfound,{riak_cs_app,start,[normal,[]]}}}"}
> >
> > /var/log/riak-cs/access.log.2017_01_18_09 is empty.
> > Here is what /var/log/riak-cs/crash.log says:
> >>
> >> 2017-01-18 09:38:31 =CRASH REPORT====
> >>   crasher:
> >>     initial call: application_master:init/4
> >>     pid: <0.148.0>
> >>     registered_name: []
> >>     exception exit: {{notfound,{riak_cs_app,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
> >>     ancestors: [<0.147.0>]
> >>     messages: [{'EXIT',<0.149.0>,normal}]
> >>     links: [<0.147.0>,<0.7.0>]
> >>     dictionary: []
> >>     trap_exit: true
> >>     status: running
> >>     heap_size: 376
> >>     stack_size: 27
> >>     reductions: 119
> >>   neighbours:
>
>
>
>
> --
> *Jean-Marc Le Roux*
>
>
> Founder and CEO of Aerys (http://aerys.in)
>
> Blog: http://blogs.aerys.in/jeanmarc-leroux
> Cell: (+33)6 20 56 45 78
> Phone: (+33)9 72 40 17 58
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com