riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux
Hello,

Here is the original GitHub issue:

https://github.com/basho/riak_cs/issues/1329

I'm using riak-cs 2.1.1-1.el6 with stanchion 1.5.0-1.el6 on CentOS 6.8 in a Docker container.
To make the data persistent, the following directories are mounted from outside the container:
  • /var/log
  • /var/lib/riak/
Everything works fine except when I remove and reimport the container, even when it's the same container.
The Riak data (Bitcask and LevelDB files) is still in /var/lib/riak, and the ACLs on those files look fine.

Riak starts. Stanchion starts. But riak-cs won't start.
In a riak-cs console, the problem seems to be here:
(riak-cs@127.0.0.1)1> [os_mon] memory supervisor port (memsup): Erlang has closed

=INFO REPORT==== 18-Jan-2017::09:38:31 ===
    alarm_handler: {clear,system_memory_high_watermark}
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_cs,{notfound,{riak_cs_app,start,[normal,[]]}}}"}
/var/log/riak-cs/access.log.2017_01_18_09 is empty.
Here is what /var/log/riak-cs/crash.log says:
2017-01-18 09:38:31 =CRASH REPORT====
  crasher:
    initial call: application_master:init/4
    pid: <0.148.0>
    registered_name: []
    exception exit: {{notfound,{riak_cs_app,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [<0.147.0>]
    messages: [{'EXIT',<0.149.0>,normal}]
    links: [<0.147.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 119
  neighbours:
As I understand it, riak_cs_app:start/2 fails with a "notfound" error, but I have no idea what is missing...
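For readers decoding the crash term above: as far as I can tell from OTP's application_master, when an application's start/2 callback returns {error, Reason}, the failure is reported as {Reason, {Module, start, [Type, Args]}}. Read that way, riak_cs_app:start(normal, []) itself returned {error, notfound}; the term does not say which lookup failed. A minimal Python sketch that extracts those fields from the "Kernel pid terminated" line, assuming the single-line format shown above (the regex and parse_start_failure are illustrative names, not part of Riak CS):

```python
import re
from typing import Dict, Optional

# Matches the term reported when an application's start/2 callback returns
# {error, Reason}:
#   {application_start_failure, App, {Reason, {Module, start, [Type, Args]}}}
# Assumes the whole term sits on one line, as in the console output above,
# and that Reason is a plain atom such as notfound.
START_FAILURE = re.compile(
    r"\{application_start_failure,(?P<app>\w+),"
    r"\{(?P<reason>\w+),\{(?P<module>\w+),start,\[(?P<args>.*?)\]\}\}\}"
)


def parse_start_failure(line: str) -> Optional[Dict[str, str]]:
    """Return app, reason, module and raw start args, or None if no match."""
    match = START_FAILURE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    kernel_line = (
        '{"Kernel pid terminated",application_controller,'
        '"{application_start_failure,riak_cs,'
        '{notfound,{riak_cs_app,start,[normal,[]]}}}"}'
    )
    print(parse_start_failure(kernel_line))
```

It only handles an atom reason such as notfound; a tuple reason would need a real Erlang term parser.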

Please advise.

Regards,

--
Jean-Marc Le Roux


Founder and CEO of Aerys (http://aerys.in)

Blog: http://blogs.aerys.in/jeanmarc-leroux
Cell: (+33)6 20 56 45 78
Phone: (+33)9 72 40 17 58

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: riak-cs fails to start after reimporting Docker container

Luke Bakken
Hi Jean-Marc -

Can you provide a complete archive of the log directory? I wonder if
another file might have more information.

--
Luke Bakken
Engineer
[hidden email]

On Thu, Feb 9, 2017 at 1:58 AM, Jean-Marc Le Roux
<[hidden email]> wrote:

Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux
Hi,

I'll try to send the log archive ASAP.
Here is what I get in /var/log/riak/error.log after running riak-admin repair-2i:

2017-02-15 22:09:06.535 [error] <0.3287.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1255977969581244695331291653115555720016817029120
2017-02-15 22:09:06.535 [error] <0.3288.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1278813932664540053428224228626747642198940975104
2017-02-15 22:09:06.535 [error] <0.3289.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 479555224749202520035584085735030365824602865664
2017-02-15 22:09:06.535 [error] <0.3290.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 502391187832497878132516661246222288006726811648
2017-02-15 22:09:06.535 [error] <0.3291.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1118962191081472546749696200048404186924073353216

I tried removing all the "LOCK" files in /var/lib/riak, but to no avail...
I'm guessing the problem is somewhere in there...

Any idea?
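Before deleting anything by hand, it may help to list exactly which LOCK files exist. A minimal, read-only Python sketch, assuming the data root is /var/lib/riak as in this setup; it deletes nothing, and removing lock files is only safe while every Riak process is stopped. The "hashtree lock" in the errors above may be an in-memory AAE lock rather than a file, so this is only a first check (find_lock_files is a hypothetical helper name):

```python
from pathlib import Path
from typing import List

# Assumption: Riak's data root, matching the volume mounted in this setup.
DATA_ROOT = Path("/var/lib/riak")


def find_lock_files(root: Path = DATA_ROOT) -> List[Path]:
    """Return every regular file named exactly LOCK under root, sorted.

    Read-only: nothing is deleted. LevelDB, for example, keeps a LOCK
    file in each database directory while that database is open.
    """
    return sorted(p for p in root.rglob("LOCK") if p.is_file())
```

Calling find_lock_files() with no argument lists the candidates under /var/lib/riak.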

2017-02-09 17:37 GMT+01:00 Luke Bakken <[hidden email]>:

Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux
Forgot to mention that the ACLs (ownership and permissions) look alright AFAIK:

root@b4394bf1de78:/var/lib/riak# ls -la
total 52
drwxr-xr-x. 10 riak riak  179 Feb  9 23:43 .
drwxr-xr-x.  1 root root   95 Feb 15 20:48 ..
-r--------.  1 riak riak   20 Feb  9 01:00 .erlang.cookie
drwxrwxr-x. 67 riak riak 8192 Feb 15 21:31 anti_entropy
drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 bitcask
drwxrwxr-x.  3 riak riak   40 Feb  9 23:42 cluster_meta
drwxrwxr-x.  2 riak riak  225 Feb 15 22:09 generated.configs
drwxrwxr-x.  2 riak riak 8192 Feb 15 22:09 kv_vnode
drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 leveldb
drwxrwxr-x.  2 riak riak    6 Feb 15 22:14 riak_kv_exchange_fsm
drwxr-xr-x.  2 riak riak  186 Feb 15 22:09 ring
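ls -la only covers the top level of /var/lib/riak. A minimal Python sketch of a recursive ownership check, assuming the service account is named riak as in the listing above (not_owned_by is a hypothetical helper name); it reports every path whose owner UID differs from that user's UID inside the container. It checks ownership only, not mode bits or SELinux labels:

```python
import os
import pwd
from typing import List


def not_owned_by(root: str, user: str = "riak") -> List[str]:
    """Return paths under root (root included) whose owner UID differs
    from the UID of `user` in this container's passwd database."""
    uid = pwd.getpwnam(user).pw_uid
    mismatches = []
    if os.lstat(root).st_uid != uid:
        mismatches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: inspect the entry itself, without following symlinks.
            if os.lstat(path).st_uid != uid:
                mismatches.append(path)
    return mismatches
```

Running not_owned_by("/var/lib/riak") inside the container should return an empty list if everything belongs to riak.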

2017-02-15 22:13 GMT+01:00 Jean-Marc Le Roux <[hidden email]>:

Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux
Hi,

Inspecting the logs further, I found this in /etc/riak/console.log, even before running riak-admin repair-2i:

2017-02-15 23:41:12.441 [warning] <0.714.0> Hintfile '/var/lib/riak/bitcask/205523667749658222872393179600727299639115513856/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.702.0> Hintfile '/var/lib/riak/bitcask/22835963083295358096932575511191922182123945984/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.716.0> Hintfile '/var/lib/riak/bitcask/251195593916248939066258330623111144003363405824/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.717.0> Hintfile '/var/lib/riak/bitcask/296867520082839655260123481645494988367611297792/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.700.0> Hintfile '/var/lib/riak/bitcask/91343852333181432387730302044767688728495783936/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.715.0> Hintfile '/var/lib/riak/bitcask/228359630832953580969325755111919221821239459840/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.697.0> Hintfile '/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.712.0> Hintfile '/var/lib/riak/bitcask/159851741583067506678528028578343455274867621888/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.719.0> Hintfile '/var/lib/riak/bitcask/342539446249430371453988632667878832731859189760/2.bitcask.hint' invalid

All of this is very surprising, since I started riak-cs and Riak properly.
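Riak's Bitcask backend keeps one directory per vnode partition under /var/lib/riak/bitcask, named after the partition index, so the warnings above identify which partitions had invalid hintfiles. A minimal Python sketch that collects those indexes from console.log, assuming the exact warning format shown above (invalid_hint_partitions is a hypothetical helper name):

```python
import re
from typing import Iterable, List

# Assumes the console.log warning format shown above, e.g.
#   ... Hintfile '/var/lib/riak/bitcask/<partition>/2.bitcask.hint' invalid
HINTFILE_INVALID = re.compile(
    r"Hintfile '[^']*/bitcask/(?P<partition>\d+)/[^']+' invalid"
)


def invalid_hint_partitions(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated partition indexes with invalid hintfiles."""
    partitions = set()
    for line in lines:
        match = HINTFILE_INVALID.search(line)
        if match:
            partitions.add(int(match.group("partition")))
    return sorted(partitions)
```

Passing it an open file object for console.log works, since iterating a file yields its lines.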

Then, at the end of console.log:

2017-02-15 23:41:13.651 [info] <0.481.0>@riak_core:wait_for_service:498 Wait complete for service riak_kv (10 seconds)
2017-02-15 23:41:13.652 [info] <0.678.0>@riak_core:wait_for_service:498 Wait complete for service riak_kv (10 seconds)
2017-02-15 23:41:13.668 [info] <0.7.0> Application yokozuna started on node '[hidden email]'
2017-02-15 23:41:13.672 [info] <0.7.0> Application cluster_info started on node '[hidden email]'
2017-02-15 23:41:13.678 [info] <0.201.0>@riak_core_capability:process_capability_changes:555 New capability: {riak_control,member_info_version} = v1
2017-02-15 23:41:13.680 [info] <0.7.0> Application riak_control started on node '[hidden email]'
2017-02-15 23:41:13.680 [info] <0.7.0> Application erlydtl started on node '[hidden email]'
2017-02-15 23:41:13.687 [info] <0.7.0> Application riak_auth_mods started on node '[hidden email]'
2017-02-15 23:41:17.714 [info] <0.474.0>@riak_core_throttle:maybe_log_throttle_change:372 Changing throttle for riak_kv/aae_throttle from undefined to 0 based on load factor 0
2017-02-15 23:41:32.719 [info] <0.2388.0>@riak_kv_index_hashtree:build_or_rehash:1055 Starting AAE tree build: 159851741583067506678528028578343455274867621888
2017-02-15 23:42:02.186 [info] <0.2388.0>@riak_kv_index_hashtree:handle_fold_keys_result:629 Finished AAE tree build: 159851741583067506678528028578343455274867621888

I assume this means Riak has started properly.
So I start Stanchion, then riak-cs, but I still get exactly the same error...
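The order described here (Riak, then Stanchion, then riak-cs) can be made explicit by waiting until riak_kv reports ready before starting the rest. A minimal Python sketch, assuming the riak, riak-admin, stanchion and riak-cs scripts are on PATH and that the Riak node is named riak@127.0.0.1 (the actual node name is hidden in the logs above); riak-admin wait-for-service blocks until the named service is available on the given node, and start_stack is a hypothetical helper name:

```python
import subprocess
from typing import Callable, List, Sequence

# Assumption: the Riak node name (hidden in the logs above); adjust it to
# match the nodename setting in riak.conf.
RIAK_NODE = "riak@127.0.0.1"

# Follows the order used in this thread: Riak, then Stanchion, then riak-cs.
STARTUP_STEPS: List[List[str]] = [
    ["riak", "start"],
    # Blocks until riak_kv is available on the given node.
    ["riak-admin", "wait-for-service", "riak_kv", RIAK_NODE],
    ["stanchion", "start"],
    ["riak-cs", "start"],
]

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def start_stack(run: Runner = run_checked) -> None:
    """Run each startup step in order, stopping at the first failure."""
    for step in STARTUP_STEPS:
        run(step)
```

Passing a different runner, for instance one that only logs each command, makes it easy to dry-run the sequence.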

Regards,

2017-02-15 22:16 GMT+01:00 Jean-Marc Le Roux <[hidden email]>:

Re: riak-cs fails to start after reimporting Docker container

Jon Brisbin-4
I haven't tried CS in a container yet. Could you provide the Dockerfiles and compose files or the commands you use to start the services? 

jb

On Wed, Feb 15, 2017 at 4:49 PM Jean-Marc Le Roux <[hidden email]> wrote:

Re: riak-cs fails to start after reimporting Docker container

Toby Corkindale-2
I tried quite hard to get Riak to work reliably in a Docker container, in a long-term-use kind of way.
Riak would never shut down cleanly, though, so at startup there were always lots of lock files left around that had to be deleted first.

Riak is not well-behaved after an unclean shutdown, whether in a Docker container or on bare metal; it tends to require sysadmin intervention to clean things up.

If you're running it in a Docker container, you need to find a way to capture the incoming SIGTERM and use it to shut down Riak cleanly. I never got that far.
I had a start-up script that cleaned out lock files, hash trees and the like, but even after all that, the Dockerised Riak proved problematic. (Getting all the Erlang/OTP cluster networking to work was also painful.)
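The SIGTERM approach can be sketched as a small entrypoint that stays in the foreground as the container's main process and, on SIGTERM, stops the services in reverse start order. This is a minimal sketch, not a tested production entrypoint: it assumes the riak, stanchion and riak-cs control scripts are on PATH and that their stop commands return once each service is down. docker stop sends SIGTERM to the main process and follows up with SIGKILL after a grace period (10 seconds by default, configurable with docker stop -t), so a slow Riak shutdown may need a longer timeout:

```python
import signal
import subprocess
import sys
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]

# Assumption: the start order used in this thread; services stop in reverse.
START_COMMANDS: List[List[str]] = [
    ["riak", "start"],
    ["stanchion", "start"],
    ["riak-cs", "start"],
]
STOP_COMMANDS: List[List[str]] = [
    ["riak-cs", "stop"],
    ["stanchion", "stop"],
    ["riak", "stop"],
]


def call(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.call(list(cmd))


def stop_all(run: Runner = call) -> List[int]:
    """Stop every service, even if an earlier stop exits non-zero."""
    return [run(cmd) for cmd in STOP_COMMANDS]


def make_sigterm_handler(run: Runner = call):
    """Build a signal handler that stops the stack cleanly, then exits."""
    def handler(signum, frame):
        codes = stop_all(run)
        sys.exit(0 if all(code == 0 for code in codes) else 1)
    return handler


def main() -> None:
    for cmd in START_COMMANDS:
        subprocess.check_call(cmd)
    signal.signal(signal.SIGTERM, make_sigterm_handler())
    # riak start daemonizes, so keep this process (PID 1) alive until a signal.
    while True:
        signal.pause()
```

To use it as the container entrypoint, the script would end with a call to main().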

Good luck,
Toby

On Thu, 16 Feb 2017 at 10:03 Jon Brisbin <[hidden email]> wrote:
I haven't tried CS in a container yet. Could you provide the Dockerfiles and compose files or the commands you use to start the services? 

jb

On Wed, Feb 15, 2017 at 4:49 PM Jean-Marc Le Roux <[hidden email]> wrote:
Hi,

inspecting the logs further, I get this in /etc/riak/console.log even before running riak-admin repair-2i :

2017-02-15 23:41:12.441 [warning] <0.714.0> Hintfile '/var/lib/riak/bitcask/205523667749658222872393179600727299639115513856/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.702.0> Hintfile '/var/lib/riak/bitcask/22835963083295358096932575511191922182123945984/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.716.0> Hintfile '/var/lib/riak/bitcask/251195593916248939066258330623111144003363405824/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.717.0> Hintfile '/var/lib/riak/bitcask/296867520082839655260123481645494988367611297792/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.700.0> Hintfile '/var/lib/riak/bitcask/91343852333181432387730302044767688728495783936/2.bitcask.hint' invalid
2017-02-15 23:41:12.441 [warning] <0.715.0> Hintfile '/var/lib/riak/bitcask/228359630832953580969325755111919221821239459840/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.697.0> Hintfile '/var/lib/riak/bitcask/68507889249886074290797726533575766546371837952/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.712.0> Hintfile '/var/lib/riak/bitcask/159851741583067506678528028578343455274867621888/2.bitcask.hint' invalid
2017-02-15 23:41:12.442 [warning] <0.719.0> Hintfile '/var/lib/riak/bitcask/342539446249430371453988632667878832731859189760/2.bitcask.hint' invalid
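To see how many partitions are affected, a small Python sketch can pull the partition IDs out of these warnings. The regular expression is written against the exact lines above; nothing else about the log format is assumed.

```python
import re

# Written against the warning lines quoted above:
#   ... Hintfile '/var/lib/riak/bitcask/<partition>/<n>.bitcask.hint' invalid
HINT_RE = re.compile(r"Hintfile '(?P<path>[^']*/bitcask/(?P<partition>\d+)/[^']+)' invalid")


def invalid_hint_partitions(log_lines):
    """Map partition ID -> list of hint files reported invalid in log_lines."""
    result = {}
    for line in log_lines:
        match = HINT_RE.search(line)
        if match:
            partition = int(match.group("partition"))
            result.setdefault(partition, []).append(match.group("path"))
    return result
```

It accepts any iterable of lines, so an open file handle on the console log works directly.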

All of this is very surprising since I started riak-cs and riak properly.

Then at the end of console.log :

2017-02-15 23:41:13.651 [info] <0.481.0>@riak_core:wait_for_service:498 Wait complete for service riak_kv (10 seconds)
2017-02-15 23:41:13.652 [info] <0.678.0>@riak_core:wait_for_service:498 Wait complete for service riak_kv (10 seconds)
2017-02-15 23:41:13.668 [info] <0.7.0> Application yokozuna started on node '[hidden email]'
2017-02-15 23:41:13.672 [info] <0.7.0> Application cluster_info started on node '[hidden email]'
2017-02-15 23:41:13.678 [info] <0.201.0>@riak_core_capability:process_capability_changes:555 New capability: {riak_control,member_info_version} = v1
2017-02-15 23:41:13.680 [info] <0.7.0> Application riak_control started on node '[hidden email]'
2017-02-15 23:41:13.680 [info] <0.7.0> Application erlydtl started on node '[hidden email]'
2017-02-15 23:41:13.687 [info] <0.7.0> Application riak_auth_mods started on node '[hidden email]'
2017-02-15 23:41:17.714 [info] <0.474.0>@riak_core_throttle:maybe_log_throttle_change:372 Changing throttle for riak_kv/aae_throttle from undefined to 0 based on load factor 0
2017-02-15 23:41:32.719 [info] <0.2388.0>@riak_kv_index_hashtree:build_or_rehash:1055 Starting AAE tree build: 159851741583067506678528028578343455274867621888
2017-02-15 23:42:02.186 [info] <0.2388.0>@riak_kv_index_hashtree:handle_fold_keys_result:629 Finished AAE tree build: 159851741583067506678528028578343455274867621888

I assume this means Riak started properly.
So I start stanchion, then riak-cs. But I still have the exact same error...
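The ordering described here can be scripted so that it blocks on `riak-admin wait-for-service` instead of relying on reading console.log. A Python sketch; `RIAK_NODE` is a placeholder for the nodename in riak.conf. Correct ordering rules out a startup race, but not a configuration problem.

```python
import subprocess

# Placeholder: use the nodename from your riak.conf.
RIAK_NODE = "riak@127.0.0.1"

# Riak KV first, then Stanchion, then Riak CS, as described above.
START_SEQUENCE = [
    ["riak", "start"],
    # Blocks until riak_kv is up on RIAK_NODE instead of eyeballing console.log.
    ["riak-admin", "wait-for-service", "riak_kv", RIAK_NODE],
    ["stanchion", "start"],
    ["riak-cs", "start"],
]


def start_all(run=subprocess.run):
    """Run START_SEQUENCE in order; check=True aborts at the first non-zero exit."""
    for cmd in START_SEQUENCE:
        run(cmd, check=True)
```

The `run` parameter only exists so the sequence can be exercised without a real node.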

Regards,

2017-02-15 22:16 GMT+01:00 Jean-Marc Le Roux <[hidden email]>:
Forgot to mention: the ACLs look alright AFAIK:

root@b4394bf1de78:/var/lib/riak# ls -la
total 52
drwxr-xr-x. 10 riak riak  179 Feb  9 23:43 .
drwxr-xr-x.  1 root root   95 Feb 15 20:48 ..
-r--------.  1 riak riak   20 Feb  9 01:00 .erlang.cookie
drwxrwxr-x. 67 riak riak 8192 Feb 15 21:31 anti_entropy
drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 bitcask
drwxrwxr-x.  3 riak riak   40 Feb  9 23:42 cluster_meta
drwxrwxr-x.  2 riak riak  225 Feb 15 22:09 generated.configs
drwxrwxr-x.  2 riak riak 8192 Feb 15 22:09 kv_vnode
drwxrwxr-x. 66 riak riak 8192 Feb  9 23:42 leveldb
drwxrwxr-x.  2 riak riak    6 Feb 15 22:14 riak_kv_exchange_fsm
drwxr-xr-x.  2 riak riak  186 Feb 15 22:09 ring
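`ls -la` only shows the top level. A recursive Python sketch that lists anything under the data directory not owned by the `riak` user (user name taken from the listing above). Bind-mounted files are owned by numeric UID, so the check should run inside the container, where `riak` resolves to the UID that actually matters.

```python
import pwd
from pathlib import Path


def not_owned_by(root, user="riak"):
    """Return every path under root (root included) not owned by `user`."""
    uid = pwd.getpwnam(user).pw_uid          # resolved in the environment it runs in
    base = Path(root)
    return [p for p in [base, *base.rglob("*")] if p.lstat().st_uid != uid]
```

An empty list from `not_owned_by("/var/lib/riak")` means ownership is consistent all the way down.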

2017-02-15 22:13 GMT+01:00 Jean-Marc Le Roux <[hidden email]>:
Hi,

I'll try to send the log archive ASAP.
Here is what I get in /var/log/riak/error.log after running riak-admin repair-2i:

2017-02-15 22:09:06.535 [error] <0.3287.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1255977969581244695331291653115555720016817029120
2017-02-15 22:09:06.535 [error] <0.3288.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1278813932664540053428224228626747642198940975104
2017-02-15 22:09:06.535 [error] <0.3289.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 479555224749202520035584085735030365824602865664
2017-02-15 22:09:06.535 [error] <0.3290.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 502391187832497878132516661246222288006726811648
2017-02-15 22:09:06.535 [error] <0.3291.0>@riak_kv_2i_aae:repair_partition:297 Failed to acquire hashtree lock on partition 1118962191081472546749696200048404186924073353216

I tried removing all the "LOCK" files in /var/lib/riak, but to no avail...
I'm guessing the problem lies here...

Any ideas?

2017-02-09 17:37 GMT+01:00 Luke Bakken <[hidden email]>:
Hi Jean-Marc -

Can you provide a complete archive of the log directory? I wonder if
another file might have more information.

--
Luke Bakken
Engineer
[hidden email]

On Thu, Feb 9, 2017 at 1:58 AM, Jean-Marc Le Roux
<[hidden email]> wrote:



--
Jean-Marc Le Roux


Founder and CEO of Aerys (http://aerys.in)

Blog: http://blogs.aerys.in/jeanmarc-leroux
Cell: (+33)6 20 56 45 78
Phone: (+33)9 72 40 17 58



_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: riak-cs fails to start after reimporting Docker container

Luca Favatella
On 6 March 2017 at 03:49, Toby Corkindale <[hidden email]> wrote:
>
> I tried quite hard to get Riak to work reliably in a Docker container, in a long-term-use kind of way.
> Riak would never shut down cleanly, though, so at startup there were always lots of lock files left around that had to be deleted first.
>
> Riak is not well-behaved after a rough shutdown, whether in a Docker container or on bare metal. It tends to require sysadmin intervention to clean things up.
>
> If you're running it in a Docker container, you need a way to capture the incoming SIGTERM and use it to shut Riak down cleanly. I never got that far.

Hi,

FYI

I opened a ticket to track the request to use SIGTERM for stopping Riak KV:
  https://github.com/basho/riak/issues/899
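Until then, the approach Toby describes (turning `docker stop`'s SIGTERM into a clean shutdown) could be approximated with a small Python entrypoint. This is a sketch, not what the ticket proposes: it assumes the `riak`, `stanchion` and `riak-cs` control scripts are on PATH and that stopping in reverse start order is appropriate.

```python
import signal
import subprocess
import threading

# Assumed control scripts on PATH, in start order (Riak KV, Stanchion, Riak CS).
SERVICES = ["riak", "stanchion", "riak-cs"]


def stop_commands(services=SERVICES):
    """Stop in reverse start order so riak-cs and stanchion go down before riak."""
    return [[svc, "stop"] for svc in reversed(services)]


def run_entrypoint(run=subprocess.run):
    """Start everything, wait for SIGTERM/SIGINT, then stop everything cleanly."""
    stop_requested = threading.Event()

    def on_signal(signum, frame):
        stop_requested.set()

    # `docker stop` sends SIGTERM to PID 1; SIGINT covers Ctrl-C with `docker run -it`.
    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)

    for svc in SERVICES:
        run([svc, "start"], check=True)

    # Poll with a timeout so the loop re-checks regularly even if a wait
    # is not interrupted by the signal itself.
    while not stop_requested.wait(timeout=1.0):
        pass

    for cmd in stop_commands():
        run(cmd, check=False)  # keep going so that riak itself still gets stopped
```

For the signal to reach it, the script has to be PID 1: started with the exec form, e.g. `ENTRYPOINT ["python3", "/entrypoint.py"]` (file name hypothetical) with a `run_entrypoint()` call at the bottom. With the shell form, `/bin/sh` receives the SIGTERM instead. `docker stop` also only waits 10 seconds before sending SIGKILL, so `docker stop --time 60` (or a longer `stop_grace_period` in Compose) gives Riak time to finish.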


Thanks for sharing your experiences.

Regards
Luca


Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux
Hello,

I finally found the problem: the riak and riak-cs config files were stored in my app container, not in my persistent data container (where the Riak data is).
Whenever I upgraded the app container to a newly generated one, the conf files, and especially the admin user credentials, no longer matched the persisted data.
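In other words, everything that has to survive a rebuild needs a volume, config included. A Python sketch that generates the corresponding `docker run` command; the host root, the image name and the `/etc/riak-cs` and `/etc/stanchion` paths (assumed to be the packages' default config locations) are assumptions.

```python
import shlex

# Hypothetical host directory that holds all of the node's persistent state.
HOST_ROOT = "/srv/riak-node"

# The first two are what this thread already mounted; the /etc entries are the
# packages' default config locations (assumed) and were the missing piece.
PERSISTENT_PATHS = ["/var/lib/riak", "/var/log", "/etc/riak", "/etc/riak-cs", "/etc/stanchion"]


def docker_run_args(image, host_root=HOST_ROOT, paths=PERSISTENT_PATHS):
    """Build a `docker run` argv that bind-mounts every persistent path."""
    args = ["docker", "run", "-d"]
    for path in paths:
        # e.g. /srv/riak-node/etc/riak-cs mounted at /etc/riak-cs
        args += ["-v", f"{host_root.rstrip('/')}{path}:{path}"]
    return args + [image]


print(shlex.join(docker_run_args("example/riak-cs")))
```

One caveat: an empty host directory bind-mounted over `/etc/riak` hides the image's config instead of being filled from it (Docker only pre-populates empty named volumes), so the host directories should be seeded with the current config before the first run.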

Yet, two things are rather surprising:
  • AFAIK, all the provided Riak + Docker examples fail to store those conf files in a Docker volume, so they suffer from the same problem and cannot be used as is;
  • IMHO the riak-cs error is completely misleading and useless: "notfound" should be "admin_user_not_found" or "invalid_admin_user".
Please feel free to ask me any questions so that I can help understand this and fix this misleading (and, quite frankly, completely useless) "notfound" error message.
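Until the message is improved, a pre-flight check can at least name the problem before riak-cs starts. A Python sketch under assumptions not confirmed in this thread: both riak-cs.conf and stanchion.conf carry an `admin.key = ...` line (cuttlefish `key = value` syntax), and, for the optional live check, Riak CS stores user records in a `moss.users` bucket keyed by key ID, reachable through Riak's HTTP API on its default port 8098.

```python
import urllib.error
import urllib.parse
import urllib.request


def read_conf(text):
    """Parse cuttlefish-style `key = value` lines, ignoring comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_admin_key(riak_cs_conf, stanchion_conf):
    """Return human-readable problems with the admin key ([] if none found)."""
    cs, st = read_conf(riak_cs_conf), read_conf(stanchion_conf)
    if "admin.key" not in cs:
        return ["riak-cs.conf has no admin.key"]
    if cs["admin.key"] != st.get("admin.key"):
        return ["admin.key differs between riak-cs.conf and stanchion.conf"]
    return []


def admin_user_stored(admin_key, riak_http="http://127.0.0.1:8098"):
    """True if Riak holds a record for admin_key in the (assumed) moss.users bucket."""
    key = urllib.parse.quote(admin_key, safe="")
    url = f"{riak_http}/buckets/moss.users/keys/{key}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        if err.code == 300:   # siblings: the record exists in several versions
            return True
        raise
```

Only the file comparison works offline; `admin_user_stored(read_conf(text)["admin.key"])` needs Riak running, and a `False` there would correspond to the mismatch described above.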

I hope this helps.

Regards,

2017-03-08 15:04 GMT+01:00 Luca Favatella <[hidden email]>:
On 6 March 2017 at 03:49, Toby Corkindale <[hidden email]> wrote:
>
> I tried quite hard to get Riak to work reliably in a Docker container, in a long-term-use kind of way.
> Riak would never shutdown cleanly, though, and so at startup there would always be lots of lock files left around that had to be deleted first.
>
> Riak is not well-behaved after a rough shutdown -- whether in a Docker container, or running on bare metal. Tends to require sysadmin intervention to clean things up.
>
> If you're running it in a Docker container, you need to figure out a way to capture the incoming SIGTERM and then use that to shutdown Riak cleanly. I never got that far.

Hi,

FYI

I opened a ticket for tracking wish to use SIGTERM for stopping Riak KV:
  https://github.com/basho/riak/issues/899


Thanks for sharing your experiences.

Regards
Luca



--
Jean-Marc Le Roux


Founder and CEO of Aerys (http://aerys.in)

Blog: http://blogs.aerys.in/jeanmarc-leroux
Cell: (+33)6 20 56 45 78
Phone: (+33)9 72 40 17 58

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: riak-cs fails to start after reimporting Docker container

Jon Brisbin-4
FWIW- All work done so far on Riak-in-Docker is for purposes of testing and ad-hoc cluster creation and NOT for use in "production" environments (for any definition of the word "production"). Things like exposing the conf files on a volume which are more production-oriented haven't been in scope of the work yet; providing a self-contained image that can be bootstrapped OOTB has been the focus. That said, we're definitely discussing how we might support Docker in a more robust way. There's lots to do, beyond just initial config and data on volumes. There's the IP addressing and networking issues of bridges to consider, among other things. 

Might be good to make sure this info is encapsulated into a GitHub issue so we don't lose track of it.

jb

On Wed, Mar 8, 2017 at 8:16 AM Jean-Marc Le Roux <[hidden email]> wrote:

Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux

2017-03-08 15:53 GMT+01:00 Jon Brisbin <[hidden email]>:
> FWIW- All work done so far on Riak-in-Docker is for purposes of testing and ad-hoc cluster creation and NOT for use in "production" environments (for any definition of the word "production"). Things like exposing the conf files on a volume which are more production-oriented haven't been in scope of the work yet; providing a self-contained image that can be bootstrapped OOTB has been the focus. That said, we're definitely discussing how we might support Docker in a more robust way. There's lots to do, beyond just initial config and data on volumes. There's the IP addressing and networking issues of bridges to consider, among other things.

For now we use a single Riak node, so for us Docker is not much of an issue.
We successfully backed up and restored riak/riak-cs by backing up and restoring the corresponding Docker volume container, which makes things a lot easier when you have multiple databases and stuff like that.
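The volume-container backup mentioned here follows the usual `--volumes-from` pattern. A Python sketch of the two commands; the container name, the `busybox` image and the list of paths are assumptions, and the node should be stopped first so Bitcask and LevelDB files are not being written during the copy.

```python
import shlex

# Assumed paths held by the volume container (data plus, after the fix, config).
VOLUME_PATHS = ["/var/lib/riak", "/etc/riak", "/etc/riak-cs", "/etc/stanchion"]


def backup_cmd(data_container, backup_dir, archive="riak-backup.tar"):
    """Throwaway container that tars data_container's volumes into backup_dir."""
    return ["docker", "run", "--rm",
            "--volumes-from", data_container,   # same volumes as the data container
            "-v", f"{backup_dir}:/backup",       # archive lands here on the host
            "busybox", "tar", "cf", f"/backup/{archive}", *VOLUME_PATHS]


def restore_cmd(data_container, backup_dir, archive="riak-backup.tar"):
    """Unpack the archive back into the volumes of data_container."""
    return ["docker", "run", "--rm",
            "--volumes-from", data_container,
            "-v", f"{backup_dir}:/backup",
            # tar stored the paths without their leading "/", so extract relative to /
            "busybox", "tar", "xf", f"/backup/{archive}", "-C", "/"]


print(shlex.join(backup_cmd("riak-data", "/srv/backups")))
```

`riak-data` and `/srv/backups` are hypothetical names; the same two commands work for any container whose volumes hold the node's state.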

> Might be good to make sure this info is encapsulated into a GitHub issue so we don't lose track of it.

In the end, the ONE thing to fix is the cumbersome error message. A proper error message would have saved me days of "work"...
Should I create a new issue for that?

Regards,


Re: riak-cs fails to start after reimporting Docker container

Jon Brisbin-4
>> Might be good to make sure this info is encapsulated into a GitHub issue so we don't lose track of it.
>
> In the end, the ONE thing to fix is the cumbersome error message. A proper error message would have saved me days of "work"...
> Should I create a new issue for that?

Please do.

jb 


Re: riak-cs fails to start after reimporting Docker container

Jean-Marc Le Roux

2017-03-08 16:21 GMT+01:00 Jon Brisbin <[hidden email]>:
>>> Might be good to make sure this info is encapsulated into a GitHub issue so we don't lose track of it.
>>
>> In the end, the ONE thing to fix is the cumbersome error message. A proper error message would have saved me days of "work"...
>> Should I create a new issue for that?
>
> Please do.
>
> jb


