Question about the source code: riak_get_fsm

Marc Worrell
Hi,

I was reading the source code of riak_get_fsm to see how failure is handled.
I stumbled on a construction that I don't understand.

In waiting_vnode_r/2 I see that:
1. on receiving an ok, it checks whether there are R ok replies;
2. on receiving a notfound, it checks whether there are R (ok + notfound) replies.
Now suppose I have R = N = 3,
and the vnodes reply in the sequence [notfound, ok, ok].
Then #state.replied_r = 2 and #state.replied_notfound = 1.
When the notfound arrives first, no ok has been counted yet, so check 2 fails; each later ok only runs check 1, which needs 3 ok replies.
So "waiting_vnode_r({r, {ok, RObj}, ...})" leaves the FSM in the state "waiting_vnode_r".
Although we have received an answer from all R (= N) nodes, only a timeout will move the FSM forward.
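To make the scenario concrete, here is a minimal Python model of the two checks as I read them. It is only a sketch of my reading, not the real riak_get_fsm code: the class GetFsmModel, its methods on_ok/on_notfound/handle, the helper run, and the "done_*" state names are all hypothetical. Only the counters replied_r and replied_notfound mirror the #state fields.

```python
# Hypothetical model of the two checks described above (not the real
# riak_get_fsm code). Only replied_r / replied_notfound mirror #state.

class GetFsmModel:
    def __init__(self, r: int) -> None:
        self.r = r
        self.replied_r = 0          # ok replies counted so far
        self.replied_notfound = 0   # notfound replies counted so far
        self.state = "waiting_vnode_r"

    def on_ok(self) -> None:
        self.replied_r += 1
        # Check 1: finish only once R ok replies have arrived.
        if self.replied_r >= self.r:
            self.state = "done_ok"

    def on_notfound(self) -> None:
        self.replied_notfound += 1
        # Check 2: finish once R replies (ok + notfound) have arrived.
        if self.replied_r + self.replied_notfound >= self.r:
            self.state = "done_notfound"

    def handle(self, reply: str) -> None:
        # Replies arriving after the FSM has left waiting_vnode_r are ignored.
        if self.state != "waiting_vnode_r":
            return
        if reply == "ok":
            self.on_ok()
        elif reply == "notfound":
            self.on_notfound()
        else:
            raise ValueError(f"unexpected reply: {reply!r}")


def run(replies, r):
    fsm = GetFsmModel(r)
    for reply in replies:
        fsm.handle(reply)
    return fsm


fsm = run(["notfound", "ok", "ok"], r=3)
print(fsm.replied_r, fsm.replied_notfound, fsm.state)
# prints: 2 1 waiting_vnode_r
```

In this model, all three replies have arrived but neither check fires, which is exactly the "stuck until timeout" case I am asking about.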

Could this be handled differently or am I missing something?

- Marc
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Question about the source code: riak_get_fsm

Justin Sheehy
Hi, Marc.

I understand your confusion, as that code is a bit subtle.

The reason this isn't a bug is that, upon receiving the very first
notfound in your situation, the "FailThreshold" case in the clause
for notfound messages would return true, since the FSM would already know
that it could never get 3 ok responses after that: with N = 3, one
notfound leaves at most 2 vnodes that could still reply ok, fewer than
R = 3. The FSM would immediately send a notfound to the client and
would not wait for the subsequent vnode responses.
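As an illustration only, the effect of such a threshold can be modelled in a few lines of Python. This is a sketch, not the actual riak_get_fsm code: GetFsmWithThreshold, fail_threshold, run and answers_every_order are hypothetical names, and the formula N - R + 1 is an assumption (it is the number of non-ok replies after which R ok replies become impossible). The real threshold in the source may be computed differently or combined with other conditions.

```python
from itertools import product

# Hypothetical sketch of a notfound handler with a "fail threshold".
# ASSUMPTION: threshold = N - R + 1, the number of non-ok replies after
# which R ok replies can no longer arrive. The real code may differ.


def fail_threshold(n: int, r: int) -> int:
    return n - r + 1


class GetFsmWithThreshold:
    def __init__(self, n: int, r: int) -> None:
        self.n = n
        self.r = r
        self.replied_r = 0          # ok replies counted so far
        self.replied_notfound = 0   # notfound replies counted so far
        self.state = "waiting_vnode_r"
        self.client_reply = None    # what the client was sent, if anything

    def handle(self, reply: str) -> None:
        if self.state != "waiting_vnode_r":
            return  # the client already has its answer; ignore late replies
        if reply == "ok":
            self.replied_r += 1
            if self.replied_r >= self.r:
                self._finish("ok")
        elif reply == "notfound":
            self.replied_notfound += 1
            # Once this many notfounds have arrived, R ok replies are
            # out of reach, so answer the client right away.
            if self.replied_notfound >= fail_threshold(self.n, self.r):
                self._finish("notfound")
        else:
            raise ValueError(f"unexpected reply: {reply!r}")

    def _finish(self, answer: str) -> None:
        self.state = "done"
        self.client_reply = answer


def run(replies, n, r):
    fsm = GetFsmWithThreshold(n, r)
    for reply in replies:
        fsm.handle(reply)
    return fsm


def answers_every_order(n: int, r: int) -> bool:
    # True if every possible sequence of N replies ends with an answer
    # to the client, i.e. no case is left waiting for a timeout.
    return all(
        run(seq, n, r).state == "done"
        for seq in product(["ok", "notfound"], repeat=n)
    )


fsm = run(["notfound", "ok", "ok"], n=3, r=3)
print(fsm.client_reply, fsm.replied_r, fsm.replied_notfound)
# prints: notfound 0 1
print(all(answers_every_order(3, r) for r in (1, 2, 3)))
# prints: True
```

The exhaustive check reflects the reasoning above: with N replies in total, if fewer than R are ok then at least N - R + 1 must be notfound, so under this assumed threshold one of the two branches always fires before a timeout is needed.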

I hope that this explanation was helpful.

Best,

-Justin

In reply to Marc Worrell's message of Tue, Apr 13, 2010 at 9:00 AM.