Siblings on first write to a key

Daniel Abrahamsson-2
I've run into a case where I got a sibling error/response on the first
ever write to a key. I would like to understand how this could happen.
Normally when you get siblings, it is because you have written a value
with an out-of-date vclock. But since this is the first write, there
is no vclock. Could someone shed some light on this for me?

It is worth mentioning that it took 3 seconds for Riak to deliver
the response, so it is possible there was some kind of network issue
at the time.

Here are some details about my setup:
Number of nodes: 8.
n_val: 5
write options: pw: 3 (quorum), return_body

Regards,
Daniel Abrahamsson

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Siblings on first write to a key

Magnus Kessler
On 18 April 2017 at 08:20, Daniel Abrahamsson <[hidden email]> wrote:
> I've run into a case where I got a sibling error/response on the first
> ever write to a key. I would like to understand how this could happen.
> Normally when you get siblings, it is because you have written a value
> with an out-of-date vclock. But since this is the first write, there
> is no vclock. Could someone shed some light on this for me?
>
> It is worth mentioning that it took 3 seconds for Riak to deliver
> the response, so it is possible there was some kind of network issue
> at the time.
>
> Here are some details about my setup:
> Number of nodes: 8.
> n_val: 5
> write options: pw: 3 (quorum), return_body
>
> Regards,
> Daniel Abrahamsson
 

Hi Daniel,

Please let me know if all nodes in this cluster were set up completely fresh, with empty backend directories, or if any of them had been used before for a Riak installation. If the latter is the case, it may be that the key in question had already been used once before. Cluster nodes pick up data from pre-existing backends.

How do you access the key for read and write operations?

Kind Regards,

Magnus

 
Magnus Kessler
Client Services Engineer
Basho Technologies Limited

Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431


Re: Siblings on first write to a key

Daniel Abrahamsson-2
Hi Magnus,

This cluster has been running in production for a few months. Key
generation is based on flake (https://github.com/boundary/flake); we
have never experienced a collision in the 3+ years we have been using
it heavily in production. However, I will look into that possibility
as well.

I just noticed that one of the Riak nodes logged this at the time:

2017-04-13 17:42:40.567 [error]
<0.3624.28>@riak_api_pb_server:handle_info:331 Unrecognized message
{30320806,{ok,{r_object,<<"session">>,<<".12011742tWzDvu8mk5WAdfYihfV_T3DcnJ5VDyXC0c">>,[{r_content,{dict,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<"X-Riak-VTag">>,53,114,86,115,108,71,120,112,73,55,108,118,114,100,105,114,107,104,50,66,105,119]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1492,105357,453143}]],[],[]}}},<<...
(actual value removed).

I also have another example (from the same cluster) where there is a
*single* writer to a key, but after a few writes/updates, it also got
a sibling error. Also at that time, the write+read took significantly
longer than normal. I'll check if we had any "unrecognized messages"
in the Riak logs at that time as well.

To answer your second question, we are talking to the Riak cluster
over protocol buffers, using the official Erlang client.

//Daniel


Re: Siblings on first write to a key

Doug Rohrer
This sounds like an issue our Riak CS team ran into quite a while ago, which involved “slow nodes” and coordination retry. Take a look at https://github.com/basho/riak_kv/issues/1188 and see if it makes sense to you, but it certainly sounds like what’s happening.

The issue arises when one node in the preflist is down or slow and you send the write to a node that is _not in the preflist_. At that point the following happens (the diagram is better formatted in the issue above, btw):

client        node-A              node-R         node-S
   ---(Put)-->
             Compute PL
               = P, Q and R
             Redirect to R --->  [frozen]
             |
             | 3 sec timeout
             V
             Compute new PL excluding R
               = P, Q and S
             Redirect to S --------------------> Compute PL without
             |                                     any knowledge about R (at this point)
             |                                     = P, Q and R
             |                                   Redirect to R  ---+
             |                                   |                 |
             |                 [what happens?] <-|-----------------+
             |                                   | 3 sec timeout
             |                                   V
             |                                   Compute new PL excluding R
             |                                     = P, Q and S
             |                                   I'm coordinator this time
             |                                   Execute put
             V 3 sec timeout
             Compute new PL again
               [continues]

So it’s possible for a slow or down node (node R in this case) to eventually cause two _other nodes_ to each write a sibling, even on a new key. In fact, depending on the number of nodes in the system and your luck, you could end up writing more than one sibling on a fresh write in this case. Your comment about a possible network issue and the 3-second delay you noted (the default for the failure timeout) both make it more likely that this was, in fact, the issue.
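To illustrate why two coordinators can leave siblings behind on a brand-new key, here is a minimal Python sketch of vector-clock merging. It is a simplified model and not Riak's actual implementation; the node names, the `coordinate_put` helper and the merge rule are assumptions made for the example. The point is only that when the same client put (sent without a vclock) is executed by two different coordinators, each stamps its own clock entry, neither clock descends the other, and both values are kept.

```python
# Simplified model: a version is (vector clock, list of sibling values).
# Node names and the merge rule are illustrative, not Riak's actual code.

def increment(clock, actor):
    """Return a copy of the clock with the actor's counter bumped."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new

def descends(a, b):
    """True when clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def merge(left, right):
    """Keep the newer version, or both values if the clocks are concurrent."""
    (clock_l, vals_l), (clock_r, vals_r) = left, right
    if descends(clock_l, clock_r):
        return left
    if descends(clock_r, clock_l):
        return right
    merged = {a: max(clock_l.get(a, 0), clock_r.get(a, 0))
              for a in set(clock_l) | set(clock_r)}
    return merged, vals_l + vals_r

def coordinate_put(coordinator, value, client_clock=None):
    """Hypothetical helper: a coordinator stamps the write with its own entry."""
    return increment(client_clock or {}, coordinator), [value]

# One client put with no vclock, executed by two coordinators after a retry.
first = coordinate_put("node_s", "session-data")
second = coordinate_put("node_p", "session-data")
clock, siblings = merge(first, second)
print(len(siblings), sorted(clock))  # two siblings, two clock entries
```

A later write that supplies the merged clock descends both earlier versions, so in this model it replaces the siblings instead of adding another one.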

A fix for this issue has been worked on and tested, but is not yet incorporated into a released version of Riak. You can, however, disable the coordinator retry logic as noted in the issue I referenced above or, if your cluster is running slowly in general, increase the timeout by setting `put_coordinator_failure_timeout` in the `riak_kv` section of your `advanced.config` file (see http://docs.basho.com/riak/kv/2.2.3/configuring/reference/#advanced-configuration for the general format of `advanced.config` if you’re not familiar with it).
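As a rough sketch of the timeout change, an `advanced.config` entry could look like the fragment below. The value 10000 is only an illustration, and it assumes the setting is expressed in milliseconds (so the 3-second default would be 3000); please check that against your Riak version before relying on it.

```erlang
%% advanced.config (sketch only, merge into your existing file)
[
 {riak_kv, [
   %% Assumption: value is in milliseconds; 10000 is an example, not a recommendation.
   {put_coordinator_failure_timeout, 10000}
 ]}
].
```

If your `advanced.config` already has a `riak_kv` section, add the tuple to that section instead of creating a second one.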

Hope this helps.

Doug Rohrer



Re: Siblings on first write to a key

Daniel Abrahamsson-2
Hi Douglas,

That seems to be a good candidate for an explanation. Thank you very
much for the write-up and the link. I'll dig into it.

As promised, I checked whether the logs also contained "unrecognized
message" entries in the second case I mentioned, and they did.



