safely resolving conflicts on read

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

safely resolving conflicts on read

Justin Karneges
Hi,

http://wiki.basho.com/Vector-Clocks.html contains this text:

"It should be noted that if you are trying to resolve conflicts automatically,
you can end up in a condition with which two clients are simultaneously
resolving and creating new conflicts."

If conflict resolution is moved to the reader, I'm curious what strategies
people use to avoid this kind of feet stomping.

Some ideas that have come to mind:

1) Have a deterministic way of deriving a correct/unified/winner value from
multiple siblings, such that any reading client would always arrive at the
same answer.  Simply use this derived answer as the value read, but don't
attempt to write a corrected value into Riak.  The value could eventually be
corrected at the time of a necessary write as opposed to a read reaction.

2) Determine a correct value per #1 above, but then attempt to write this
value back into Riak in such a way that if multiple nodes were to write the
same value simultaneously then they don't create siblings.  I'm not sure if
this is possible in Riak?

3) Determine a correct value per #1 above, and allow exactly one node to ever
immediately write corrected values after a read.  Something like "if hash(key)
% node_count == current_node then do_correction".  Since value correction is
not vital to availability, there's no harm if the owning node is down.  I just
figure this would allow for self-healing over time.

Maybe there are other ways.  What do people really do?

Thanks,
Justin

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

SV: safely resolving conflicts on read

Erik Søe Sørensen
What you'd usually do is somewhere between 2) and 3) - namely, accept that siblings might occur (although rarely). Also, you'd have a resolution function with the property (besides being deterministic) that reconciliating two identical siblings would yield the same - i.e., f(X,X) = X.
________________________________________
Fra: [hidden email] [[hidden email]] På vegne af Justin Karneges [[hidden email]]
Sendt: 1. november 2011 21:34
Til: [hidden email]
Emne: safely resolving conflicts on read

Hi,

http://wiki.basho.com/Vector-Clocks.html contains this text:

"It should be noted that if you are trying to resolve conflicts automatically,
you can end up in a condition with which two clients are simultaneously
resolving and creating new conflicts."

If conflict resolution is moved to the reader, I'm curious what strategies
people use to avoid this kind of feet stomping.

Some ideas that have come to mind:

1) Have a deterministic way of deriving a correct/unified/winner value from
multiple siblings, such that any reading client would always arrive at the
same answer.  Simply use this derived answer as the value read, but don't
attempt to write a corrected value into Riak.  The value could eventually be
corrected at the time of a necessary write as opposed to a read reaction.

2) Determine a correct value per #1 above, but then attempt to write this
value back into Riak in such a way that if multiple nodes were to write the
same value simultaneously then they don't create siblings.  I'm not sure if
this is possible in Riak?

3) Determine a correct value per #1 above, and allow exactly one node to ever
immediately write corrected values after a read.  Something like "if hash(key)
% node_count == current_node then do_correction".  Since value correction is
not vital to availability, there's no harm if the owning node is down.  I just
figure this would allow for self-healing over time.

Maybe there are other ways.  What do people really do?

Thanks,
Justin

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: safely resolving conflicts on read

Bob Ippolito
An approach similar to #1 is implemented in statebox http://github.com/mochi/statebox - basically the trick is to store an operation queue along with the data, and to put some constraints on how operations must work so that they can be repeated for conflict resolution.

On Wednesday, November 2, 2011, Erik Søe Sørensen <[hidden email]> wrote:
> What you'd usually do is somewhere between 2) and 3) - namely, accept that siblings might occur (although rarely). Also, you'd have a resolution function with the property (besides being deterministic) that reconciliating two identical siblings would yield the same - i.e., f(X,X) = X.
> ________________________________________
> Fra: [hidden email] [[hidden email]] P&#229; vegne af Justin Karneges [[hidden email]]
> Sendt: 1. november 2011 21:34
> Til: [hidden email]
> Emne: safely resolving conflicts on read
>
> Hi,
>
> http://wiki.basho.com/Vector-Clocks.html contains this text:
>
> "It should be noted that if you are trying to resolve conflicts automatically,
> you can end up in a condition with which two clients are simultaneously
> resolving and creating new conflicts."
>
> If conflict resolution is moved to the reader, I'm curious what strategies
> people use to avoid this kind of feet stomping.
>
> Some ideas that have come to mind:
>
> 1) Have a deterministic way of deriving a correct/unified/winner value from
> multiple siblings, such that any reading client would always arrive at the
> same answer.  Simply use this derived answer as the value read, but don't
> attempt to write a corrected value into Riak.  The value could eventually be
> corrected at the time of a necessary write as opposed to a read reaction.
>
> 2) Determine a correct value per #1 above, but then attempt to write this
> value back into Riak in such a way that if multiple nodes were to write the
> same value simultaneously then they don't create siblings.  I'm not sure if
> this is possible in Riak?
>
> 3) Determine a correct value per #1 above, and allow exactly one node to ever
> immediately write corrected values after a read.  Something like "if hash(key)
> % node_count == current_node then do_correction".  Since value correction is
> not vital to availability, there's no harm if the owning node is down.  I just
> figure this would allow for self-healing over time.
>
> Maybe there are other ways.  What do people really do?
>
> Thanks,
> Justin
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: safely resolving conflicts on read

Justin Karneges
Thanks everyone for these replies (and also Aphyr, off-list).  It has helped me
confirm my suspicions and sounds like I'm on the right track.

For one of my keys, I am doing sort of a manual "last write wins" by having
the reader sort siblings by timestamp, then by vtag, to deterministically
select the same sibling every time.  The reason for keeping the other siblings
around is they may contain the only references to other keys created along
with them.  A separate cleanup process can then be sure to delete the referred
keys before removing the siblings.  And of course the algorithm used to
determine the winning sibling is shared by both the read function and the
cleanup function.

On Wednesday, November 02, 2011 07:52:03 AM Bob Ippolito wrote:

> An approach similar to #1 is implemented in statebox
> http://github.com/mochi/statebox - basically the trick is to store an
> operation queue along with the data, and to put some constraints on how
> operations must work so that they can be repeated for conflict resolution.
>
> On Wednesday, November 2, 2011, Erik Søe Sørensen <[hidden email]> wrote:
> > What you'd usually do is somewhere between 2) and 3) - namely, accept
>
> that siblings might occur (although rarely). Also, you'd have a resolution
> function with the property (besides being deterministic) that
> reconciliating two identical siblings would yield the same - i.e., f(X,X) =
> X.
>
> > ________________________________________
> > Fra: [hidden email] [
>
> [hidden email]] P&#229; vegne af Justin Karneges [
> [hidden email]]
>
> > Sendt: 1. november 2011 21:34
> > Til: [hidden email]
> > Emne: safely resolving conflicts on read
> >
> > Hi,
> >
> > http://wiki.basho.com/Vector-Clocks.html contains this text:
> >
> > "It should be noted that if you are trying to resolve conflicts
>
> automatically,
>
> > you can end up in a condition with which two clients are simultaneously
> > resolving and creating new conflicts."
> >
> > If conflict resolution is moved to the reader, I'm curious what
> > strategies people use to avoid this kind of feet stomping.
> >
> > Some ideas that have come to mind:
> >
> > 1) Have a deterministic way of deriving a correct/unified/winner value
>
> from
>
> > multiple siblings, such that any reading client would always arrive at
> > the same answer.  Simply use this derived answer as the value read, but
> > don't attempt to write a corrected value into Riak.  The value could
> > eventually
>
> be
>
> > corrected at the time of a necessary write as opposed to a read reaction.
> >
> > 2) Determine a correct value per #1 above, but then attempt to write this
> > value back into Riak in such a way that if multiple nodes were to write
>
> the
>
> > same value simultaneously then they don't create siblings.  I'm not sure
>
> if
>
> > this is possible in Riak?
> >
> > 3) Determine a correct value per #1 above, and allow exactly one node to
>
> ever
>
> > immediately write corrected values after a read.  Something like "if
>
> hash(key)
>
> > % node_count == current_node then do_correction".  Since value correction
>
> is
>
> > not vital to availability, there's no harm if the owning node is down.  I
>
> just
>
> > figure this would allow for self-healing over time.
> >
> > Maybe there are other ways.  What do people really do?
> >
> > Thanks,
> > Justin
> >
> > _______________________________________________
> > riak-users mailing list
> > [hidden email]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> > _______________________________________________
> > riak-users mailing list
> > [hidden email]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: safely resolving conflicts on read

Aphyr
On 11/02/2011 10:40 AM, Justin Karneges wrote:

> Thanks everyone for these replies (and also Aphyr, off-list).  It has helped me
> confirm my suspicions and sounds like I'm on the right track.
>
> For one of my keys, I am doing sort of a manual "last write wins" by having
> the reader sort siblings by timestamp, then by vtag, to deterministically
> select the same sibling every time.  The reason for keeping the other siblings
> around is they may contain the only references to other keys created along
> with them.  A separate cleanup process can then be sure to delete the referred
> keys before removing the siblings.  And of course the algorithm used to
> determine the winning sibling is shared by both the read function and the
> cleanup function.

We do something similar. A feed, for example, stores a list of keys
pointing to feed items. In memory, I store "pending insertion" and
"pending deletion" lists on a feed. At save time, a model callback saves
the items pending insertion, and deletes any pending deletion.

The conflict resolution operator for a feed takes the union of all feed
item keys, sorts them (to keep the most recent X) and truncates the
list--storing all the truncated items in the pending-deletion list. This
isn't perfect, but has acceptable probabilistic bounds for our use case.

--Kyle

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com