Forcing Siblings to Occur

Forcing Siblings to Occur

Mark A. Basil, Jr.

Is there some method that is either guaranteed or very highly likely to create Siblings of an object (that isn’t a counter)?  I would like to have a reliable method to test code which is meant to handle them.


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Forcing Siblings to Occur

John Daily
Updating an existing key without supplying a vector clock is guaranteed to create a sibling, provided the bucket has allow_mult set to true.

-John

Re: Forcing Siblings to Occur

Russell Brown-2
In reply to this post by Mark A. Basil, Jr.
Hi Mark,
It is pretty easy.

Set your bucket to allow_mult=true.

Send a put to bucket, key.
Send another one to the same bucket, key.

If you’re using a well-behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.

With the pb client it is as simple as: https://gist.github.com/russelldb/46153e2ab9d2b9206f63
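
As a rough illustration of the steps above, here is a toy in-memory model of a single key in a bucket with allow_mult=true. It does not talk to Riak and is not the client API: the class SiblingDemo, its method names, and the integer that stands in for the vector clock are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ONE key in a bucket with allow_mult=true.
// Hypothetical names throughout; this is not the Riak client API.
final class SiblingDemo {
    private final List<String> siblings = new ArrayList<>();
    private int version = 0; // simplified stand-in for the vector clock

    // A put with no causal context: the store cannot tell what this write
    // supersedes, so the value is kept next to the existing ones.
    public void putWithoutContext(String value) {
        siblings.add(value);
        version++;
    }

    // A put that carries the version returned by an earlier read.
    // If that version is still current, the new value replaces what was read;
    // if it is stale, the write is treated as concurrent and becomes a sibling.
    public void putWithContext(int seenVersion, String value) {
        if (seenVersion == version) {
            siblings.clear();
        }
        siblings.add(value);
        version++;
    }

    public int currentVersion() {
        return version;
    }

    public List<String> values() {
        return new ArrayList<>(siblings);
    }

    public static void main(String[] args) {
        SiblingDemo key = new SiblingDemo();
        key.putWithoutContext("first");
        key.putWithoutContext("second");
        System.out.println(key.values()); // [first, second]

        // A read-then-write with the current version resolves the siblings.
        key.putWithContext(key.currentVersion(), "resolved");
        System.out.println(key.values()); // [resolved]
    }
}
```

The point for testing is the first half of main: two blind writes leave two values under the key, which is the state that sibling-handling code has to cope with.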

Hope that helps

Russell

Re: Forcing Siblings to Occur

Brian Roach-2
On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <[hidden email]> wrote:

> If you’re using a well behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.

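// withoutFetch() skips the read-before-write, so no vclock is supplied
// and each iteration leaves one more sibling under "key".
for (int i = 0; i < numReplicasWanted; i++) {
    bucket.store("key", "value").withoutFetch().execute();
}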

:)

- Roach

Re: Forcing Siblings to Occur

Olav Frengstad
Do you consider forcing siblings a good idea? I would like to get some input on possible use cases and pitfalls.
For instance, I have considered forcing siblings and then merging them on read instead of fetching an object every time I want to update it (especially with larger objects).

It's not clear from the docs whether there are any limitations. Will the maximum object size be the limit?

A section of the docs [1] comes to mind:

"Having an enormous object in your node can cause reads of that object to crash the entire node. Other issues are increased cluster latency as the object is replicated and out of memory errors."

[1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings


Re: Forcing Siblings to Occur

John Daily
Forcing siblings other than for testing purposes is not typically a good idea; as you indicate, the object size can easily become a problem as all siblings will live inside the same Riak value.

Your counter-example sounds a lot like a use case for server-side CRDTs: data structures that allow the application to add values without retrieving the server-side content first, with siblings resolved by Riak.

These will arrive with Riak 2.0; see https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.

-John

Re: Forcing Siblings to Occur

Jason Campbell-2
I am currently forcing siblings for time series data. The maximum bucket sizes are very predictable due to the nature of the data. I originally used the get/update/set cycle, but as I approach the end of the interval, reading and writing 1MB+ objects at a high frequency kills network bandwidth. So now, I append siblings, and I have a cron that merges the previous siblings (a simple set union works for me, only entire objects are ever deleted).
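
The merge step described above can be sketched as a plain set union. This is only an illustration under stated assumptions: each sibling is modeled as a set of strings, and the class SiblingUnion, the merge method, and the sample data points are hypothetical names and values, not anything from Riak or from the setup described here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: resolve siblings by taking the union of their contents.
final class SiblingUnion {
    // Each sibling is modeled as the set of data points one writer appended.
    public static Set<String> merge(List<Set<String>> siblings) {
        Set<String> merged = new TreeSet<>(); // sorted only to make output stable
        for (Set<String> sibling : siblings) {
            merged.addAll(sibling);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> a = new TreeSet<>(Arrays.asList("10:00=1.2", "10:01=1.3"));
        Set<String> b = new TreeSet<>(Arrays.asList("10:01=1.3", "10:02=1.1"));
        List<Set<String>> siblings = Arrays.asList(a, b);

        // The overlapping point appears once in the merged result.
        System.out.println(merge(siblings)); // [10:00=1.2, 10:01=1.3, 10:02=1.1]
    }
}
```

Set union is commutative, associative, and idempotent, so the result does not depend on the order in which siblings are merged or on whether a periodic job merges the same sibling twice. That holds only as long as individual elements are never deleted.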

I can see how it can be dangerous to insert siblings, but if you have some other method of knowing how much data is in one, I don't see size being an issue. I have also considered using a counter to know how large an object is without fetching it, which shouldn't be off by more than a few siblings unless there is a network partition.

So aside from size issues, which can be roughly predicted or worked around, is there any reason to not create hundreds or thousands of siblings and resolve them later? I realise sets could work well for my use case, but they seem overkill for simple append operations when I don't need delete functionality. Creating your own CRDTs is trivial if you never need to delete.

Thoughts are welcome,
Jason

Re: Forcing Siblings to Occur

John Daily
Jason, I don’t see any inherent problems, given reasonable management of the situation as you describe. I’d have to chase the code path to see what overhead you’re introducing to Riak’s processing, but if it’s working well for you, then who am I to object?

Perhaps someone who’s more familiar with the sibling management code could chime in.

-John

Re: Forcing Siblings to Occur

Olav Frengstad
@John, I'm definitely looking forward to CRDTs, but at the same time I'm looking into alternative approaches for achieving the same thing.

@Jason, your description is close to what I had in mind. The only real difference is that the merge would happen on read. I did some testing, and MapReduce seems to work by using an initial map phase that calls `riak_object:get_values`.

There's also the addition of a maximum number of siblings in Riak 2.0 [1].




--
Kind regards,
Olav Frengstad

System developer // FWT
+47 920 42 090

Re: Forcing Siblings to Occur

Olav Frengstad
Forgot the link!

[1] https://github.com/basho/riak_kv/commit/6981450c5ffc18207b3a1dc057fd3840a0906c42


Re: Forcing Siblings to Occur

Sam Elliott
We have introduced these limits so that users who are accidentally creating siblings (and large objects) can be notified in their logs.

1) Of course, you can choose to change the limits.
2) Please, please only do so if you know what you're doing. There's a certain amount of "you're on your own", because we'll set the defaults for these values to what we think is sensible for Riak. Of course, if you think your magical system will work fine, it's entirely up to you.

We hope that our Data Types will provide enough useful types such that they cover the vast majority of data models. Riak 2.0 will get counters, sets, and maps, but if you have suggestions for generalised data structures you think would be useful to other Riak users, do send them our way and we'll see what we can do.

Sam

--  
Sam Elliott
Engineer
[hidden email]
--


On Wednesday, 13 November 2013 at 12:43AM, Olav Frengstad wrote:

> Forgot the link!
>  
> [1] https://github.com/basho/riak_kv/commit/6981450c5ffc18207b3a1dc057fd3840a0906c42
>  
>  

Re: Forcing Siblings to Occur

Carlos Baquero
In reply to this post by Jason Campbell-2

It's interesting to see a use case where a grow-only set is sufficient. I believe Riak 2.0 will offer optimized OR-Sets that allow item removal at the expense of some extra complexity in element storage and logarithmic metadata growth per operation. But for your case, a simple direct set of elements with server-side merge by set union looks perfect. It's not efficient at all to keep all those siblings if a simple server-side merge can reduce them.

Maybe it is a good idea to not overlook the potential usefulness of simple grow only sets and add that datatype to the 2.0 server side CRDTs library. And maybe even 2P-Sets that only allow deleting once, might be useful for some cases.
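
To make the two set types mentioned above concrete, here is a minimal client-side sketch of a 2P-Set; a grow-only set is the same structure with the tombstone set left empty. The class TwoPhaseSet and its methods are hypothetical names chosen for illustration and say nothing about how Riak 2.0 implements its data types.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a 2P-Set: one grow-only set of additions plus one
// grow-only set of tombstones. An element can be removed once and never re-added.
final class TwoPhaseSet {
    private final Set<String> added = new HashSet<>();
    private final Set<String> removed = new HashSet<>();

    public void add(String element) {
        added.add(element);
    }

    // Only elements this replica has seen as added can be tombstoned.
    public void remove(String element) {
        if (added.contains(element)) {
            removed.add(element);
        }
    }

    public boolean contains(String element) {
        return added.contains(element) && !removed.contains(element);
    }

    // Merging two replicas is a union of both component sets, so the merge
    // is commutative, associative, and idempotent.
    public TwoPhaseSet merge(TwoPhaseSet other) {
        TwoPhaseSet result = new TwoPhaseSet();
        result.added.addAll(this.added);
        result.added.addAll(other.added);
        result.removed.addAll(this.removed);
        result.removed.addAll(other.removed);
        return result;
    }

    public static void main(String[] args) {
        TwoPhaseSet left = new TwoPhaseSet();
        left.add("x");
        left.add("y");

        TwoPhaseSet right = new TwoPhaseSet();
        right.add("x");
        right.remove("x");

        TwoPhaseSet merged = left.merge(right);
        System.out.println(merged.contains("x")); // false
        System.out.println(merged.contains("y")); // true
    }
}
```

The cost of this design is that tombstones are kept forever, which is the trade-off behind "only allow deleting once".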

Regards,
Carlos

-----
Carlos Baquero
HASLab / INESC TEC &
Universidade do Minho,
Portugal

[hidden email]
http://gsd.di.uminho.pt/cbm





On 12/11/2013, at 22:10, Jason Campbell wrote:

> I am currently forcing siblings for time series data. The maximum bucket sizes are very predictable due to the nature of the data. I originally used the get/update/set cycle, but as I approach the end of the interval, reading and writing 1MB+ objects at a high frequency kills network bandwidth. So now, I append siblings, and I have a cron that merges the previous siblings (a simple set union works for me, only entire objects are ever deleted).
>
> I can see how it can be dangerous to insert siblings, but if you have some other method of knowing how much data is in one, I don't see size being an issue. I have also considered using a counter to know how large an object is without fetching it, which shouldn't be off by more than a few siblings unless there is a network partition.
>
> So aside from size issues, which can be roughly predicted or worked around, is there any reason to not create hundreds or thousands of siblings and resolve them later? I realise sets could work well for my use case, but they seem overkill for simple append operations when I don't need delete functionality. Creating your own CRDTs is trivial if you never need to delete.
>
> Thoughts are welcome,
> Jason
> From: John Daily
> Sent: Wednesday, 13 November 2013 3:10 AM
> To: Olav Frengstad
> Cc: riak-users
> Subject: Re: Forcing Siblings to Occur
>
> Forcing siblings other than for testing purposes is not typically a good idea; as you indicate, the object size can easily become a problem as all siblings will live inside the same Riak value.
>
> Your counter-example sounds a lot like a use case for server-side CRDTs; data structures that allow the application to add values without retrieving the server-side content first, and siblings are resolved by Riak.
>
> These will arrive with Riak 2.0; see https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
>
> -John
>
> On Nov 12, 2013, at 7:13 AM, Olav Frengstad <[hidden email]> wrote:
>
>> Do you consider forcing siblings a good idea? I would like to get some input on possible use cases and pitfalls.
>> For instance, I have considered forcing siblings and then merging them on read instead of fetching an object every time I want to update it (especially with larger objects).
>>
>> It's not clear from the docs if there are any limitations. Will the maximum object size be the limitation?
>>
>> A section of the docs[1] comes to mind:
>>
>> "Having an enormous object in your node can cause reads of that object to crash the entire node. Other issues are increased cluster latency as the object is replicated and out of memory errors."
>>
>> [1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>>
>> 2013/11/9 Brian Roach <[hidden email]>
>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <[hidden email]> wrote:
>>
>> > If you’re using a well behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.
>>
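>> // withoutFetch() skips the read-before-write, so each store is sent
>> // without a vclock; with allow_mult=true Riak keeps it as a new sibling.
>> for (int i = 0; i < numReplicasWanted; i++) {
>>     bucket.store("key", "value").withoutFetch().execute();
>> }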
>>
>> :)
>>
>> - Roach

Re: Forcing Siblings to Occur

Russell Brown-2

On 13 Nov 2013, at 10:03, Carlos Baquero <[hidden email]> wrote:

>
> It's interesting to see a use case where a grow-only set is sufficient. I believe Riak 2.0 will offer optimized OR-Sets that allow item removal at the expense of some extra complexity in element storage and logarithmic metadata growth per operation. But for your case a simple direct set of elements with server-side merge by set union looks perfect. It's not efficient at all to keep all those siblings if a simple server-side merge can reduce them.
>
> Maybe it is a good idea not to overlook the potential usefulness of simple grow-only sets and to add that data type to the 2.0 server-side CRDT library. Maybe even 2P-Sets, which only allow an element to be deleted once, would be useful in some cases.

We plan to add more data types in the future, but I don’t think they’ll make it into 2.0. You can use an ORSet as a G-Set, though: just only ever add to it. The overhead is pretty small.

The difficulty is exposing different “flavours” of CRDTs in a non-confusing way. We chose to go with the name “data type” and name the implementations generically (set, map, counter). I wonder if we painted ourselves into a corner.
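
For readers who have not met an OR-Set, a naive version can be sketched in a few lines of Java. This is only a teaching sketch under stated assumptions: it is not Riak's optimized implementation, the class name `NaiveOrSet` and its tag format are invented here, and each replica is assumed to have a unique id. If `remove` is never called, the `removed` map stays empty and the structure behaves exactly like a grow-only set, at the cost of one tag per add.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Naive observed-remove set; a teaching sketch, not Riak's implementation. */
final class NaiveOrSet {
    // element -> unique tags attached by add operations
    private final Map<String, Set<String>> added = new HashMap<>();
    // element -> tags that some remove operation has observed and cancelled
    private final Map<String, Set<String>> removed = new HashMap<>();
    private final String replicaId; // assumed unique per replica
    private long counter = 0;

    NaiveOrSet(String replicaId) {
        this.replicaId = replicaId;
    }

    /** Adds an element under a fresh tag unique to this replica. */
    void add(String element) {
        String tag = replicaId + ":" + (counter++);
        added.computeIfAbsent(element, k -> new HashSet<>()).add(tag);
    }

    /** Cancels only the add-tags this replica has seen for the element. */
    void remove(String element) {
        Set<String> seen = added.get(element);
        if (seen != null) {
            removed.computeIfAbsent(element, k -> new HashSet<>()).addAll(seen);
        }
    }

    /** Present if at least one add-tag has not been cancelled. */
    boolean contains(String element) {
        Set<String> tags = added.get(element);
        if (tags == null) {
            return false;
        }
        Set<String> dead = removed.get(element);
        return dead == null || !dead.containsAll(tags);
    }

    /** Merges another replica's state by taking the union of both maps. */
    void merge(NaiveOrSet other) {
        unionInto(added, other.added);
        unionInto(removed, other.removed);
    }

    private static void unionInto(Map<String, Set<String>> target,
                                  Map<String, Set<String>> source) {
        for (Map.Entry<String, Set<String>> e : source.entrySet()) {
            target.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                  .addAll(e.getValue());
        }
    }

    public static void main(String[] args) {
        NaiveOrSet a = new NaiveOrSet("a");
        NaiveOrSet b = new NaiveOrSet("b");
        a.add("x");
        b.add("y");
        a.merge(b); // add-only use: behaves like a G-Set
        System.out.println(a.contains("x") && a.contains("y")); // prints true
    }
}
```

The extra bookkeeping only pays off when removal is needed: an add that happens concurrently with a remove survives the merge, because its tag was never observed by the remover.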

Cheers

Russell


Re: Forcing Siblings to Occur

Hector Castro-2
The `put_index` snippet in the following blog post actually forces the
creation of siblings (while `get_index` resolves them by doing a set
union):

http://basho.com/index-for-fun-and-for-profit/

As John said, you definitely want to be careful not to create too many
siblings because that'll impact the overall Riak object size.

--
Hector



Re: Forcing Siblings to Occur

Olav Frengstad
Thanks for the input.

If I understand correctly, the only size overhead would be the extra metadata added by all the siblings?


--
Med Vennlig Hilsen
Olav Frengstad

Systemutvikler // FWT
+47 920 42 090


Re: Forcing Siblings to Occur

Jeremiah Peschka
s/metadata/data/ - each sibling is a discrete copy of whatever data you've put in it, plus its metadata.

In the case of the client-side indexes, you're right - the bulk of the increased storage will be from metadata.
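
As a rough illustration of that arithmetic, the sketch below adds up value bytes and metadata bytes per sibling. All numbers are hypothetical: the 200 bytes of metadata per sibling is an assumed figure for the example, not a measured Riak value, and `SiblingSizeEstimate` is an invented name.

```java
import java.util.Arrays;

/** Back-of-the-envelope estimate for a Riak object that holds siblings. */
final class SiblingSizeEstimate {

    /**
     * Every sibling carries its own value and its own metadata, so the
     * stored object grows with the sum of both over all siblings.
     */
    static long estimateBytes(long[] valueBytes, long metadataBytesPerSibling) {
        long total = 0;
        for (long value : valueBytes) {
            total += value + metadataBytesPerSibling;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical: 1,000 siblings of 1 KiB each, 200 bytes of metadata apiece.
        long[] bigValues = new long[1000];
        Arrays.fill(bigValues, 1024L);
        System.out.println(estimateBytes(bigValues, 200L)); // prints 1224000

        // Hypothetical index-style siblings: tiny 16-byte values, same metadata.
        long[] tinyValues = new long[1000];
        Arrays.fill(tinyValues, 16L);
        System.out.println(estimateBytes(tinyValues, 200L)); // prints 216000
    }
}
```

In the first case the values dominate the total; in the second, 200,000 of the 216,000 bytes are metadata, which matches the client-side index situation.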

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

