Reconciling Concurrent Writes

classic Classic list List threaded Threaded
4 messages Options
J T
Reply | Threaded
Open this post in threaded view
|

Reconciling Concurrent Writes

J T
Hi.

I'm trying to get my head around concurrent updates in riak.

Lets assume we have a single document in a bucket, that contains a list of terms.

Lets also assume, that we have two clients A and B, both of which GET the current version of the document at around the same time.

Client A then removes a term from the list in the document and then writes the document back.
Client B also removes a term from the list in the document and writes it back.

A subsequent Client C, reads the document and is given back two versions of the document and has to somehow reconcile the differences.

My question is how you might go about reconciling.

It could be that both versions have valid changes and the version C needs to work with would in fact have both terms removed from the list.

Given that the document object associated with a key is opaque to riak (i think), I can understand why riak can't try and resolve the situation.

I've had a read of the dynamo paper but it didn't make things any clearer for me.

I also spotted the 'read your writes' comment in the riak paper but it didn't make any sense to me.

I'd appreciate some pointers on best practices for dealing with this.

Jason

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Reconciling Concurrent Writes

Justin Sheehy
Hi, J T.

On Tue, Feb 23, 2010 at 7:04 PM, J T <[hidden email]> wrote:

> My question is how you might go about reconciling.
> It could be that both versions have valid changes and the version C needs to
> work with would in fact have both terms removed from the list.

There are a few ways to deal with this kind of thing, and the best
choice will depend on the application in question.  That is, the
semantics of that list will determine whether the right choice is
really to delete both items, keep both items, delete one item, or ask
a user to choose.

There are a number of "quick and dirty" things that work fine for many
applications, especially in light of the assumption that a
well-written application will produce very few conflicts even under
heavy concurrent client load.  For instance, some applications might
find accidental deletions to be very bad, but occasionally keeping
something that was supposed to be deleted to be acceptable.  Such an
application would just perform set union on the lists for resolution.
Other applications just pick the last of those conflicting updates to
have been received, and discard the others.  These approaches can
throw away data, but are simple and are fairly close to what most
programmers do when using an RDBMS in a typical (i.e. not everything
is in a transaction) kind of way.

To have the best possible story for resolution in such cases as you
describe, you would want to not just store the data structure in
question, but the operations on that data.  You would also want to
make sure that the way you implement the operations (e.g. "delete
item", "add item") is both commutative (can be safely re-ordered) and
idempotent (can be safely replayed).  If you do this, then you can
merge conflicting data simply by replaying all of the operations in
both siblings.  The only remaining difficulty is deciding what to do
with conflicting operations -- if A had added an item and B had
deleted the same item, for instance.  This detail must be decided by
you, the application author, as there is no generic right answer.  But
for something like what you described, this wouldn't come into play as
both "delete" operations would be processed, and C would end up with a
correct single list of terms.

> I also spotted the 'read your writes' comment in the riak paper but it
> didn't make any sense to me.

Read-your-writes consistency is just about ensuring that a single
client has a locally consistent timeline -- that anything it reads
reflects all of its past writes.  That notion is important, but
doesn't help with multi-client conflict situations.

> I'd appreciate some pointers on best practices for dealing with this.

I hope this gets you started down a useful direction.

Best,

-Justin

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
J T
Reply | Threaded
Open this post in threaded view
|

Re: Reconciling Concurrent Writes

J T
Hi Justin,

Thanks for the info.

One approach I was considering, since I may need to cache things
anyway, is to implement a distributed caching layer, whereby each
object retrieved from a bucket is cached but as a process with a TTL.
Any operations occur via an API on the cached process that corresponds
with that bucket/key.

I think, this way I could if i wanted to, introduce transaction
semantics at least on a per document basis.

I'm not sure yet though what the impact of that might be.

Jason

On 24 February 2010 00:21, Justin Sheehy <[hidden email]> wrote:

> Hi, J T.
>
> On Tue, Feb 23, 2010 at 7:04 PM, J T <[hidden email]> wrote:
>
>> My question is how you might go about reconciling.
>> It could be that both versions have valid changes and the version C needs to
>> work with would in fact have both terms removed from the list.
>
> There are a few ways to deal with this kind of thing, and the best
> choice will depend on the application in question.  That is, the
> semantics of that list will determine whether the right choice is
> really to delete both items, keep both items, delete one item, or ask
> a user to choose.
>
> There are a number of "quick and dirty" things that work fine for many
> applications, especially in light of the assumption that a
> well-written application will produce very few conflicts even under
> heavy concurrent client load.  For instance, some applications might
> find accidental deletions to be very bad, but occasionally keeping
> something that was supposed to be deleted to be acceptable.  Such an
> application would just perform set union on the lists for resolution.
> Other applications just pick the last of those conflicting updates to
> have been received, and discard the others.  These approaches can
> throw away data, but are simple and are fairly close to what most
> programmers do when using an RDBMS in a typical (i.e. not everything
> is in a transaction) kind of way.
>
> To have the best possible story for resolution in such cases as you
> describe, you would want to not just store the data structure in
> question, but the operations on that data.  You would also want to
> make sure that the way you implement the operations (e.g. "delete
> item", "add item") is both commutative (can be safely re-ordered) and
> idempotent (can be safely replayed).  If you do this, then you can
> merge conflicting data simply by replaying all of the operations in
> both siblings.  The only remaining difficulty is deciding what to do
> with conflicting operations -- if A had added an item and B had
> deleted the same item, for instance.  This detail must be decided by
> you, the application author, as there is no generic right answer.  But
> for something like what you described, this wouldn't come into play as
> both "delete" operations would be processed, and C would end up with a
> correct single list of terms.
>
>> I also spotted the 'read your writes' comment in the riak paper but it
>> didn't make any sense to me.
>
> Read-your-writes consistency is just about ensuring that a single
> client has a locally consistent timeline -- that anything it reads
> reflects all of its past writes.  That notion is important, but
> doesn't help with multi-client conflict situations.
>
>> I'd appreciate some pointers on best practices for dealing with this.
>
> I hope this gets you started down a useful direction.
>
> Best,
>
> -Justin
>

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
J T
Reply | Threaded
Open this post in threaded view
|

Re: Reconciling Concurrent Writes

J T
In reply to this post by Justin Sheehy
Justin,

One thing that occurred to me over night though is to do with the
links property in the meta data for a document.

Lets say I had a Band bucket and a Fan bucket.

I could, in the Band documents meta-data, store { links, [ {{
FanBucket, FanKey#1 }, a_fan }, ..... ] }

My understanding is that the links structure is defined.

In which case I have less scope for adding housekeeping details to
indicate how to reconcile - although I suppose I could add meta-data
of my own.

I was also wondering what limitations there are for document meta data
with respect to size ?

e.g. If a Band had a million fans, how well would riak handle having a
million item long list of links ?

Instead, in the case of potentially large relationship sets, would it
be better to have the links held in a separate bucket, as a document
per band/fan relationship ?


Jason

On 24 February 2010 00:21, Justin Sheehy <[hidden email]> wrote:

> Hi, J T.
>
> On Tue, Feb 23, 2010 at 7:04 PM, J T <[hidden email]> wrote:
>
>> My question is how you might go about reconciling.
>> It could be that both versions have valid changes and the version C needs to
>> work with would in fact have both terms removed from the list.
>
> There are a few ways to deal with this kind of thing, and the best
> choice will depend on the application in question.  That is, the
> semantics of that list will determine whether the right choice is
> really to delete both items, keep both items, delete one item, or ask
> a user to choose.
>
> There are a number of "quick and dirty" things that work fine for many
> applications, especially in light of the assumption that a
> well-written application will produce very few conflicts even under
> heavy concurrent client load.  For instance, some applications might
> find accidental deletions to be very bad, but occasionally keeping
> something that was supposed to be deleted to be acceptable.  Such an
> application would just perform set union on the lists for resolution.
> Other applications just pick the last of those conflicting updates to
> have been received, and discard the others.  These approaches can
> throw away data, but are simple and are fairly close to what most
> programmers do when using an RDBMS in a typical (i.e. not everything
> is in a transaction) kind of way.
>
> To have the best possible story for resolution in such cases as you
> describe, you would want to not just store the data structure in
> question, but the operations on that data.  You would also want to
> make sure that the way you implement the operations (e.g. "delete
> item", "add item") is both commutative (can be safely re-ordered) and
> idempotent (can be safely replayed).  If you do this, then you can
> merge conflicting data simply by replaying all of the operations in
> both siblings.  The only remaining difficulty is deciding what to do
> with conflicting operations -- if A had added an item and B had
> deleted the same item, for instance.  This detail must be decided by
> you, the application author, as there is no generic right answer.  But
> for something like what you described, this wouldn't come into play as
> both "delete" operations would be processed, and C would end up with a
> correct single list of terms.
>
>> I also spotted the 'read your writes' comment in the riak paper but it
>> didn't make any sense to me.
>
> Read-your-writes consistency is just about ensuring that a single
> client has a locally consistent timeline -- that anything it reads
> reflects all of its past writes.  That notion is important, but
> doesn't help with multi-client conflict situations.
>
>> I'd appreciate some pointers on best practices for dealing with this.
>
> I hope this gets you started down a useful direction.
>
> Best,
>
> -Justin
>

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com