Riak in Docker - Error folding keys - incomplete_hint

Toby Corkindale-2
Hi,
I've been working on getting Riak to run inside Docker containers - in a multi-machine cluster. (Previous work I've seen has only run Riak as a cluster all on the same machine.)
I thought I had it cracked, although I tripped up on the existing issue with Riak and lockfiles[1]. But the nodes have been generating an awful lot of errors like the one below, and I wondered if anyone here can give me an explanation? (And is it a problem?)

2015-10-21 01:19:23.567 [error] <0.24495.0> Error folding keys for "/var/lib/riak/bitcask.1h/228359630832953580969325755111919221821239459840/2.bitcask.data": {incomplete_hint,4}

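1: Related issues to the lockfiles --
I note that many are closed, but the problem still exists, and is particularly triggered by using Docker and stopping/killing Riak more violently than it likes.
https://github.com/basho/bitcask/issues/163 (closed)
https://github.com/basho/riak/issues/535 (open)
https://github.com/basho/bitcask/issues/167 (closed)
https://github.com/basho/bitcask/issues/99 (closed)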

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Riak in Docker - Error folding keys - incomplete_hint

Toby Corkindale-2
Anyone?

After 24 hours on a very lightly loaded test cluster, I'm still seeing a lot of these scroll by: 600 an hour per node.
I'm really curious to know whether this is expected behaviour or the result of some kind of node corruption.
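
To put a number on the rate described above, the matching log lines can be bucketed by hour. This is a minimal sketch, assuming the log keeps the timestamp layout shown in the sample error line (date and hour in the first 13 characters). The function name and the log path in the usage comment are illustrative assumptions, not taken from the thread.

```shell
# Count incomplete_hint errors per hour in a Riak log file.
# Relies on each line starting with "YYYY-MM-DD HH", as in the sample above.
# Usage (path is an assumption): count_hint_errors_per_hour /var/log/riak/error.log
count_hint_errors_per_hour() {
    grep 'incomplete_hint' "$1" | cut -c1-13 | sort | uniq -c
}
```

Each output line is a count followed by the date and hour it applies to, which makes it easy to see whether the rate is steady or tails off after start-up.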

Cheers
Toby




Re: Riak in Docker - Error folding keys - incomplete_hint

Hector Castro
Can't say I've paid enough attention to the logs in my single-machine
Riak within Docker setups to confirm.

Do you have the container image definitions somewhere public? That may
help someone reproduce the issue. Also, did you ensure that the Riak
data directory is set up as a Docker volume?

Other things that come to mind:

- What OS is the Docker host running?
- What storage driver are you using for Docker?
- What file system is the Docker data directory using?
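
The storage-driver and file-system questions above can be answered from the Docker host with a couple of commands. This is a minimal sketch, assuming GNU `stat` and a Docker CLI on the host. The function names are hypothetical, and `/var/lib/docker` is Docker's default data directory, which may differ on a customised install.

```shell
# Print the file system type backing a path (GNU coreutils stat assumed).
# Usage: fs_type /var/lib/docker
fs_type() {
    stat -f -c %T "$1"
}

# Print the Docker storage driver, if the docker CLI is available.
# Usage: docker_storage_driver
docker_storage_driver() {
    command -v docker >/dev/null 2>&1 || { echo "docker not found" >&2; return 1; }
    docker info --format '{{.Driver}}'
}

# The host OS is usually described in /etc/os-release on modern Linux systems:
#   cat /etc/os-release
```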

--
Hector


> On Wed, 21 Oct 2015 at 12:23 Toby Corkindale <[hidden email]> wrote:
>> 1: Related issues to the lockfiles --
>> I note that many are closed, but the problem still exists, and is
>> particularly triggered by using Docker and stopping/killing Riak more
>> violently than it likes.
>> https://github.com/basho/bitcask/issues/163 (closed)
>> https://github.com/basho/riak/issues/535 (open)
>> https://github.com/basho/bitcask/issues/167 (closed)
>> https://github.com/basho/bitcask/issues/99 (closed)

Re: Riak in Docker - Error folding keys - incomplete_hint

Toby Corkindale-2
Hi Hector,
You can see the Dockerfile here:

It's a work in progress, but also, not that involved.

Ubuntu 14.04 is used for both the Docker host and the Docker container.
It's on the btrfs storage driver. (I've had too many issues with the other two.)
The Riak data directory is a volume, mounted to an external, persistent location (which is also btrfs).

I suspect there's an issue around Riak shutting down uncleanly when the docker container is stopped.
I have already had to add this to the start-up each time:
find /var/lib/riak -name "bitcask.*.lock" -delete

So it's clear that Riak is getting killed rather than shutting down cleanly; but even so, I'd hope that Riak would cope with that, rather than getting into a permanent state of throwing errors.
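
One way to reduce unclean shutdowns in a container is to have the entrypoint script forward the stop signal to Riak. `docker stop` sends SIGTERM first and follows up with SIGKILL after a grace period (10 seconds by default), so a shell entrypoint that ignores SIGTERM leaves Riak to be killed. The following is a sketch under assumptions, not the Dockerfile from this thread: the function names are hypothetical, and it assumes the standard `riak start`, `riak stop` and `riak ping` commands are on the PATH.

```shell
# Remove stale Bitcask lock files left behind by an unclean shutdown
# (the same find command quoted above, wrapped in a function).
clean_stale_locks() {
    find "$1" -name 'bitcask.*.lock' -delete
}

# Start Riak, then block for as long as the node answers pings.
# On SIGTERM/SIGINT, ask Riak to stop cleanly before the shell exits.
run_riak() {
    trap 'riak stop; exit 0' TERM INT
    riak start || return 1
    while riak ping >/dev/null 2>&1; do
        sleep 1
    done
}

# Intended use as a container entrypoint (not executed here):
#   clean_stale_locks /var/lib/riak && run_riak
```

If `riak stop` needs longer than the default grace period, the timeout can be raised with `docker stop -t <seconds>`. Whether this removes the incomplete_hint errors is untested here.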

Toby



Re: Riak in Docker - Error folding keys - incomplete_hint

Toby Corkindale-2
Quick follow-up: As a bit of a hack, deleting all the .hint files prior to each start-up does resolve the errors, and immediately results in a whole lot of Bitcask merges happening.
But that doesn't strike me as a good long-term fix.
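
For reference, the hint-file workaround described above could look roughly like the sketch below. It is a stopgap, not a recommended fix, and should only be run while Riak is stopped. Bitcask can rebuild its key directory by scanning the data files when hint files are missing, at the cost of a slower start. The function name and the list-by-default behaviour are assumptions added for safety.

```shell
# List (default) or delete hint files under a Bitcask data directory.
# Usage: purge_hint_files /var/lib/riak            # list only
#        purge_hint_files /var/lib/riak --delete   # actually delete
purge_hint_files() {
    if [ "${2:-}" = "--delete" ]; then
        find "$1" -type f -name '*.hint' -delete
    else
        find "$1" -type f -name '*.hint' -print
    fi
}
```

Listing first makes it possible to check which partitions are affected before removing anything.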


Re: Error folding keys - incomplete_hint

Toby Corkindale-2
I thought I'd follow up on this again, a long time later.
We gave up on the Dockerised version of Riak.
But I notice we're now getting an awful lot of these incomplete_hint errors on the regular, non-Docker cluster.

We had a sudden power failure in that server room recently, so there would have been unclean Riak shutdowns. I guessed those were the cause of the issues with the Docker version, years ago, so I'm wondering if the same thing happened here?

Should Riak be better at recovering from rough shutdowns? Or is this another issue altogether?
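
To tell whether the errors come from a handful of damaged hint files or from across the whole data directory, the affected data files can be pulled out of the log and counted. This is a minimal sketch, assuming the message format shown in the error line earlier in the thread; the function name and the log path in the usage comment are illustrative assumptions.

```shell
# List the data files named in incomplete_hint errors, most frequent first.
# Usage (path is an assumption): affected_data_files /var/log/riak/error.log
affected_data_files() {
    grep 'incomplete_hint' "$1" \
        | sed -n 's/.*Error folding keys for "\([^"]*\)".*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

A short list of repeating paths would point at a few specific files written around the time of the power failure, while a long list would suggest a broader problem.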

-Toby
