uneven disk distribution

Johnny Tan
We have a 6-node test Riak cluster. One of the nodes seems to be using far more disk than the others:
staging-riak001.pp /dev/sda3              15G  6.3G  7.2G  47% /
staging-riak002.pp /dev/sda3              15G  6.4G  7.1G  48% /
staging-riak003.pp /dev/sda3              15G  6.1G  7.5G  45% /
staging-riak004.pp /dev/sda3              15G   14G  266M  99% /
staging-riak005.pp /dev/sda3              15G  5.8G  7.7G  44% /
staging-riak006.pp /dev/sda3              15G  6.3G  7.3G  47% /

Specifically, /var/lib/riak/bitcask is using up most of that space. It seems to contain files that are much older than those on any of the other nodes. We've done various kinds of maintenance on this cluster -- as the name indicates, we use it as a staging ground before we go to production. I don't recall a specific issue per se, but I wouldn't rule it out.

Is there a way to figure out if there's an underlying issue here, or whether some of this disk space is not really current and can somehow be purged?

What info would help answer those questions?

johnny

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: uneven disk distribution

Engel Sanchez-2
Hi Johnny. Make sure that the configuration on that node does not differ from the others. For example, it could be configured to never merge Bitcask files, in which case space would never be reclaimed.
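
One way to check this across nodes is to diff only the Bitcask-related settings. The following is a minimal sketch, not a definitive tool. It assumes the nodes use the flat `key = value` riak.conf format of Riak 2.x (the thread does not state the Riak version; older releases use an Erlang-term app.config, which this does not parse). The function names `bitcask_settings` and `differing_settings` are hypothetical helpers introduced here for illustration.

```python
def bitcask_settings(conf_text):
    """Return {key: value} for every 'bitcask.*' line in riak.conf-style text.

    Assumption: one 'key = value' pair per line, '#' starts a comment.
    """
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("bitcask."):
            settings[key] = value
    return settings


def differing_settings(configs):
    """configs maps node name -> config text.

    Returns {key: {node: value_or_None}} for every bitcask key whose value
    is not identical on all nodes (None means the key is absent on that node).
    """
    parsed = {node: bitcask_settings(text) for node, text in configs.items()}
    all_keys = set().union(*(s.keys() for s in parsed.values()))
    diffs = {}
    for key in sorted(all_keys):
        values = {node: s.get(key) for node, s in parsed.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs
```

To use it, copy each node's config file to one machine, read each file into a string, and pass a dict such as `{"staging-riak001": text1, "staging-riak004": text4}`. An empty result means the Bitcask settings in the files match. It says nothing about settings changed at runtime or about files that are not actually the ones the node loaded.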



Re: uneven disk distribution

Charlie Voiselle
Johnny:

Something else to look for would be any errors in the console.log related to Bitcask merging.  It would be interesting to see if the unusual disk utilization was related to a specific partition.  If it is, you could consider removing that particular partition and running riak_kv:repair to restore the replicas from the adjacent partitions.  I can provide more information if you find that to be the case.

Regards,
Charlie Voiselle
Client Services, Basho
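
To see whether the extra disk usage is concentrated in one partition, the data directory can be summarized per subdirectory. The sketch below assumes that the Bitcask data directory (here /var/lib/riak/bitcask, as given in the original post) contains one subdirectory per partition. The function name `partition_usage` is a hypothetical helper, not part of Riak.

```python
import os


def partition_usage(bitcask_root):
    """Summarize each subdirectory of bitcask_root.

    Returns a list of (dir_name, total_bytes, oldest_mtime) tuples sorted
    by total_bytes, largest first. oldest_mtime is a Unix timestamp, or
    None if the directory holds no files.
    """
    rows = []
    for entry in os.scandir(bitcask_root):
        if not entry.is_dir():
            continue
        total, oldest = 0, None
        for dirpath, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    st = os.stat(os.path.join(dirpath, name))
                except OSError:
                    # On a live node a merge may delete a file mid-scan.
                    continue
                total += st.st_size
                if oldest is None or st.st_mtime < oldest:
                    oldest = st.st_mtime
        rows.append((entry.name, total, oldest))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Calling `partition_usage("/var/lib/riak/bitcask")` on the full node and on a healthy node, then comparing the top few rows, would show whether a single partition accounts for the difference and whether it holds unusually old files. The script only reads file metadata and does not modify anything.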



Re: uneven disk distribution

Johnny Tan
To follow up:

Since we use chef (configuration management), the riak configs are the same across all our riak nodes (except for stuff like hostnames/IPs, etc.).

I ran riak_kv:repair and it looked like it had fixed the problem on node 004, but then a _different_ node (002) started to throw a bunch of locking errors. (I thought I saved a copy of that log somewhere but can't seem to find it now, and the old ones have been rotated out.)

Nothing I did would stem those errors on 002, even though 004 seemed perfectly fine after the repair. In the end, I rm'd 002's bitcask directory, rejoined the cluster, and it seems to now be back in shape. No errors, the nodes are all relatively similar in size -- 002 lags a little behind the others, but not in a worrisome way.

I'm sure this was related to Bitcask merging; I just still haven't pinpointed what it was. But I appreciate the input and suggestions.

johnny
