I’ve been running a 5-machine cluster (default settings for everything, I believe) for a while, testing pushes and pulls while integrating bindings (C++ via the AWS SDK, Python via boto, and s3cmd on Linux). As a test, I was
going to clean up all the keys and buckets I had created (approx. 320 GB across 150-400k keys; I don’t know the exact count because I forgot to save it).
I started using boto to iterate over the bucket and delete each key, but after a while (30+ minutes?) I stopped it to try to see why it was taking so long. I believe that, after a bunch of deletes, riak is in a state where it is reconciling
those deletes across the cluster (?), and that appears to be taking quite some time. (Side note: is there a better way to nuke a bucket?)
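For reference, the per-key loop described above could be replaced by batched deletes, roughly as in the sketch below. It assumes the bucket object behaves like a boto 2 Bucket, where list() yields keys and delete_keys() returns a result with deleted and errors lists; the helper name delete_in_batches and the batch size of 1000 are only illustrative, and whether the installed Riak CS version accepts multi-object delete requests is an assumption I have not verified.

```python
from itertools import islice


def delete_in_batches(bucket, batch_size=1000):
    """Delete every key in `bucket`, issuing one delete_keys call per batch.

    Assumption: `bucket` looks like a boto 2 Bucket, i.e. `list()` yields
    keys and `delete_keys(keys)` returns an object with `deleted` and
    `errors` lists. This helper is hypothetical, not part of boto.
    """
    keys = iter(bucket.list())
    deleted, errors = 0, []
    while True:
        # Pull at most `batch_size` keys from the listing.
        batch = list(islice(keys, batch_size))
        if not batch:
            break
        result = bucket.delete_keys(batch)
        deleted += len(result.deleted)
        errors.extend(result.errors)
    return deleted, errors
```

With boto this would be called on the object returned by conn.get_bucket(...). It only cuts down the number of client round trips; the cluster presumably still has to process every individual delete.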
With no client load, riak seems to be spiking in CPU usage, checking the contents of buckets in boto is timing out, and “s3cmd ls s3://bucket” is taking minutes to report that the bucket I was deleting from is now empty.
Checking /var/log/riak/, I see some errors and a crash report, along with a lot of “anti-entropy exchange” messages, which seems consistent with my theory that it’s still processing and cleaning up state?
It’s now been an extra day and riak seems to be idle, but I still get really poor performance (25+ seconds) querying the now-empty bucket. Does this sound odd? I suspect that if I delete and recreate the bucket it’ll be fast again,
but I am concerned, as I wasn’t intending to use this as write-only storage. In a small test with gsutil, adding and deleting 25k 10k files, the time to list the empty bucket goes from instant to several seconds, so this seems easily reproducible.
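To put numbers on the listing slowdown, something like the following could be run before and after the add/delete cycle. It is only a minimal sketch: time_call is a hypothetical helper, and the callable passed in is assumed to perform one full bucket listing.

```python
import time


def time_call(fn, repeats=3):
    """Run `fn` `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings
```

With boto, for example, time_call(lambda: list(bucket.list())) would time three complete listings of the bucket.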
root@cachetest001:/home/build# riak version
root@cachetest001:/home/build# riak-cs version
root@cachetest001:/home/build# stanchion version
root@cachetest001:/home/build# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Machines are decent spec: Intel Xeon E5430, 16 GB RAM.
I’m sure there are options I have passed over that are very relevant; I’d appreciate any thoughts or suggestions on how to investigate this further.