Deleted keys come back

Deleted keys come back

mtakahashi-ivi
Hello,

I'm using Riak KV in a 2-node cluster.
I inserted hundreds of key/value pairs and then deleted all keys in a bucket.
After that, listing the keys in the bucket still returns some keys.
Why do those keys remain? How do I delete keys reliably?
If I increase the number of nodes to 5, the same procedure deletes all keys in the bucket.
My bucket properties are the following.

----------------------
{
  "props": {
    "name": "BUCKET_A",
    "active": true,
    "allow_mult": false,
    "basic_quorum": false,
    "big_vclock": 50,
    "chash_keyfun": {
      "mod": "riak_core_util",
      "fun": "chash_std_keyfun"
    },
    "claimant": "riak@172.17.0.171",
    "dvv_enabled": true,
    "dw": "quorum",
    "last_write_wins": false,
    "linkfun": {
      "mod": "riak_kv_wm_link_walker",
      "fun": "mapreduce_linkfun"
    },
    "n_val": 2,
    "notfound_ok": false,
    "old_vclock": 86400,
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0,
    "r": 1,
    "rw": "quorum",
    "search_index": "BUCKET_A_INDEX",
    "small_vclock": 50,
    "w": 1,
    "young_vclock": 20
  }
}


Masanori Takahashi

Re: Deleted keys come back

Dmitri Zagidulin
Hello,

There are two things going on here: the W quorum value of the write and delete operations, and possibly the delete_mode setting.

Let's walk through the scenario.
You're writing to a 2-node cluster, storing two copies of each object (n_val=2), with a write quorum of 1 (W=1).

So that's possibility #1 -- there's no guarantee that your writes succeeded on both replicas. (A write could have landed on just one, leaving the other missing.)

Then you're doing a List Keys (to find the objects to delete), which runs with an implicit quorum of R=1. (Meaning, it only contacts one of the two replicas for each key and lists what it finds.) So, if possibility #1 happened above, the first List Keys might not have returned some keys (because it may have contacted the partitions with missing replicas). Then you deleted, ran another List Keys, and that one could have returned the keys the first pass missed.
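
(For reference, the List Keys call I mean looks roughly like this over the HTTP interface; the host and port are placeholders, and listing keys is expensive outside of tests:)

$ curl 'http://<host>:<port>/buckets/BUCKET_A/keys?keys=true'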

Possibility #2 -- your deletes are using W=1, meaning, they're only waiting for the delete operation from 1 replica to respond, before returning success. So, it's possible that a delete operation removed just one replica, but the second one still exists. And the second List Keys can now pick up the not-deleted replica.

Possibility #3 -- by default, the delete_mode is set to keep deleted objects for 3 seconds.  So, if you ran your deletes, and then re-ran a List Keys before the 3 seconds expired, you could pick up some keys.

The upshot of all this is:
- Use W=2 when writing and deleting. (That is, your W value should be the same as your N value; there's a quick curl sketch of this after the config below.)

- If that doesn't work, set delete_mode to 'immediate' in the config.
Specifically, delete_mode is set in the advanced.config file (http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/#Advanced-Configuration ). So, my advanced.config file looks like this:

[
  {riak_kv,
    [
      {delete_mode, immediate}
    ]}
].
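
For the first point, here is a rough curl sketch of writing and deleting with W=2, so that both replicas have to acknowledge (the host, port, and key are placeholders; the bucket name is taken from your props):

# Write, waiting for both replicas (w=2 matches n_val=2):
$ curl -X PUT -H "Content-Type: text/plain" -d "some value" \
    'http://<host>:<port>/buckets/BUCKET_A/keys/<key>?w=2'

# Delete, also waiting for both replicas to acknowledge:
$ curl -X DELETE 'http://<host>:<port>/buckets/BUCKET_A/keys/<key>?w=2&dw=2'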

Also, if you're deleting things for unit tests, there's an easier way. Instead of deleting the bucket object-by-object, you can just stop the node, and clear the bitcask (or leveldb) data directory. (That's going to get rid of all the data in the cluster, which is what you want to do for unit tests anyways.)

You can learn more about these topics on the following pages:
* http://docs.basho.com/riak/latest/ops/advanced/deletion/
* http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-October/006048.html  (mailing list post introducing delete_mode)
* http://basho.com/posts/technical/riaks-config-behaviors-part-3/  (the Tombstones section)

Dmitri





Re: Deleted keys come back

Kota Uenishi
I'd also recommend setting "allow_mult" to true, to prevent
resurrection caused by sloppy quorums, handoffs, or network
partitions. Also, to make sure your data is eventually consistent,
R+W should be > N. In your case that's 1+1, which is not > 2.
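
(As a rough illustration -- the host and port are placeholders, and the exact values you want are up to you -- bucket properties can be changed over HTTP like this:)

$ curl -X PUT -H "Content-Type: application/json" \
    -d '{"props": {"r": 2, "w": 2, "allow_mult": true}}' \
    'http://<host>:<port>/buckets/BUCKET_A/props'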

--
Kota UENISHI / @kuenishi
Basho Japan KK


Re: Deleted keys come back

Alexander Sicular
In reply to this post by Dmitri Zagidulin
Seconded. This makes your cluster so fresh, so clean for new tests.

For node in nodes
  Stop node
  Do stuff (i.e. delete the data directory)
  Start node


That general pattern is known as rolling restarts and is more or less how Basho recommends doing maintenance on a Riak cluster.
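
A hedged shell sketch of that loop (the node list, SSH access, and the bitcask data directory path are assumptions; adjust them for your install and backend):

#!/usr/bin/env bash
# Rolling wipe: one node at a time, so the rest of the cluster stays up.
NODES="riak1.example.com riak2.example.com"

for node in $NODES; do
  ssh "$node" 'riak stop'
  ssh "$node" 'rm -rf /var/lib/riak/bitcask/*'   # or the leveldb directory
  ssh "$node" 'riak start'
done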

Regards,
-Alexander

@siculars
http://siculars.posthaven.com

Sent from my iRotaryPhone



Re: Deleted keys come back

mtakahashi-ivi
In reply to this post by Kota Uenishi
Hello,

Thank you all and sorry for replying so late.
W=2 works.

"allow_mult" is not suitable for me. Because sibling affects results of Yokozuna search.

> Also, to make sure your data is eventually consistent,
> R+W should be > N. In your case that's 1+1, which is not > 2.

Right. That's not what I intended, but I can't change it now.

Thanks,
Masanori Takahashi

Re: Deleted keys come back

Dmitri Zagidulin
> "allow_mult" is not suitable for me. Because sibling affects results of Yokozuna search.

Can you tell us more about that? How do siblings affect the results of search in your case?

On Sat, Oct 10, 2015 at 1:22 AM, mtakahashi-ivi <[hidden email]> wrote:
Hello,

Thank you all and sorry for replying so late.
W=2 works.

"allow_mult" is not suitable for me. Because sibling affects results of
Yokozuna search.

> Also, to make sure your data eventually consistent, R+W
> should be > N. Your case is 1+1, not >2.

Right. I'm not indended but, I can't change it now.

Thanks,
Masanori Takahashi



--
View this message in context: http://riak-users.197444.n3.nabble.com/Deleted-keys-come-back-tp4033536p4033575.html
Sent from the Riak Users mailing list archive at Nabble.com.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Deleted keys come back

mtakahashi-ivi
> Can you tell us more about that? How do siblings affect the results of search in your case?

When I checked last year, both the search results and the stats counts included sibling objects, which makes the results complicated to work with.

$ curl -sS http://<host>:<port>/types/ssdms_test/buckets/unit_test_bucket1/keys/bkey1_1
Siblings:
3HIxQ2pcu0YMjekbtIuixb
3JonReTfLC3GvG534WS21i
63t3g19KaCLu7JFLMxHxJJ

$ curl -sS 'http://<host>:<port>/search/query/unit_test_bucket1_index?wt=json&q=key1_s%3aval1_1&rows=50&stats=true&stats.field=key3_i' | jq .

{
  "responseHeader": {
    "status": 0,
    "QTime": 7,
    "params": {
      "q": "key1_s:val1_1",
      "shards": "192.168.1.235:8093/internal_solr/unit_test_bucket1_index",
      "stats": "true",
      "192.168.1.235:8093": "_yz_pn:63 OR _yz_pn:61 OR _yz_pn:59 OR _yz_pn:57 OR _yz_pn:55 OR _yz_pn:53 OR _yz_pn:51 OR _yz_pn:49 OR _yz_pn:47 OR _yz_pn:45 OR _yz_pn:43 OR _yz_pn:41 OR _yz_pn:39 OR _yz_pn:37 OR _yz_pn:35 OR _yz_pn:33 OR _yz_pn:31 OR _yz_pn:29 OR _yz_pn:27 OR _yz_pn:25 OR _yz_pn:23 OR _yz_pn:21 OR _yz_pn:19 OR _yz_pn:17 OR _yz_pn:15 OR _yz_pn:13 OR _yz_pn:11 OR _yz_pn:9 OR _yz_pn:7 OR _yz_pn:5 OR _yz_pn:3 OR _yz_pn:1",
      "rows": "50",
      "wt": "json",
      "stats.field": "key3_i"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "maxScore": 4.149883,
    "docs": [
      {
        "key3_i": 1,
        "key2_s": "val_2",
        "key1_s": "val1_1",
        "_yz_id": "1*ssdms_test*unit_test_bucket1*bkey1_1*61*3HIxQ2pcu0YMjekbtIuixb",
        "_yz_rk": "bkey1_1",
        "_yz_rt": "ssdms_test",
        "_yz_rb": "unit_test_bucket1"
      },
      {
        "key3_i": 1,
        "key2_s": "val_2",
        "key1_s": "val1_1",
        "_yz_id": "1*ssdms_test*unit_test_bucket1*bkey1_1*61*3JonReTfLC3GvG534WS21i",
        "_yz_rk": "bkey1_1",
        "_yz_rt": "ssdms_test",
        "_yz_rb": "unit_test_bucket1"
      },
      {
        "key3_i": 1,
        "key2_s": "val_2",
        "key1_s": "val1_1",
        "_yz_id": "1*ssdms_test*unit_test_bucket1*bkey1_1*61*63t3g19KaCLu7JFLMxHxJJ",
        "_yz_rk": "bkey1_1",
        "_yz_rt": "ssdms_test",
        "_yz_rb": "unit_test_bucket1"
      }
    ]
  },
  "stats": {
    "stats_fields": {
      "key3_i": {
        "min": 1,
        "max": 1,
        "count": 3,
        "missing": 0,
        "sum": 3,
        "sumOfSquares": 3,
        "mean": 1,
        "stddev": 0,
        "facets": {}
      }
    }
  }
}
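
(For completeness: one way to collapse siblings like these over HTTP is to fetch them along with the object's vclock and write back a single chosen value. A rough sketch, reusing the key and fields from the example above:)

# Fetch all siblings plus the current vclock in one multipart response:
$ curl -i -H "Accept: multipart/mixed" \
    'http://<host>:<port>/types/ssdms_test/buckets/unit_test_bucket1/keys/bkey1_1'

# Write back the value to keep, echoing the X-Riak-Vclock header from above,
# which tells Riak that this write supersedes all of the siblings:
$ curl -X PUT -H "Content-Type: application/json" \
    -H "X-Riak-Vclock: <vclock from the fetch>" \
    -d '{"key1_s": "val1_1", "key2_s": "val_2", "key3_i": 1}' \
    'http://<host>:<port>/types/ssdms_test/buckets/unit_test_bucket1/keys/bkey1_1'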