Clarification questions re bucket types

Henning Verbeek
Hi guys,

I still struggle with bucket types and have some questions. Going back
a year I could not find many threads about it, but forgive me if I
missed something and am asking already-answered questions.

## Cluster-awareness

I've understood so far that bucket types are used as part of the
namespace, and that they can also hold additional configuration
properties (such as datatypes). I've also read that they are
"lightweight", and are not being "gossiped around the cluster"
compared to properties set on buckets directly. Because of this they
are recommended for bucket configuration; in fact, configuration of
newer properties (CRDT) is only available via bucket types.

The statement about "not gossiped around the cluster" makes me wonder:
does that mean that the bucket types must be defined (and activated?)
on each cluster node? The documentation at
[http://docs.basho.com/riak/kv/2.2.0/using/cluster-operations/bucket-types/]
does not explain this.

I find the example at this link actually really confusing: A bucket
type is created _without properties_, and then a property is set on
the individual bucket. I thought that is exactly what you're not
supposed to do, and that you should rather use plenty of bucket
types...?! (Yes, the example
further down explains that the bucket type could be defined with the
property; but why is it shown here as if this is the exception?)

## API-availability of bucket type creation

I've seen [this
thread](http://lists.basho.com/pipermail/riak-users_lists.basho.com/2016-July/018574.html)
about programmatically creating bucket-types. Is there any progress on
this? I really, really struggle with the concept that bucket types are
supposed to be created _manually_, _on the node(s)_ themselves. How
would the application know if this has happened or not? And what
should it do if not - fail?

In my eyes, the application is responsible for ensuring "data
definition", not an admin. At startup the application should check the
environment against its expectations and make the necessary changes.
An example of this is schema versioning in relational databases. If
this required administrator interaction, the synchronisation effort
would be immense. How do others deal with this situation?

I couldn't find anything regarding bucket-types in
[javadoc](http://basho.github.io/riak-java-client/2.1.0/), so I assume
this is not available yet.


Thank you for your help and information!
Cheers,
Henning

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Clarification questions re bucket types

Shaun McVey
Hi Henning,
So normally, a bucket's custom properties are stored in the ring file.  It's this file which is gossiped around regularly in a cluster.  When users create hundreds or thousands of custom bucket properties (it's been done), it can grind the cluster almost to a standstill.

A bucket-type only has to be created and activated on one node.  The operation is still sent to the other nodes automatically on creation so there's no need to repeat the command on multiple nodes.
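
A minimal sketch of that workflow, run on any single node. It assumes the standard `riak-admin bucket-type create` and `riak-admin bucket-type activate` subcommands; the `RIAK_ADMIN` variable and the function name are hypothetical, added only so the commands can be dry-run without a cluster.

```shell
#!/bin/sh
# RIAK_ADMIN is a hypothetical override (not a Riak setting); it defaults to
# the real riak-admin binary and can be set to "echo" for a dry run.
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"

# create_and_activate_type TYPE PROPS_JSON
# Creates the bucket type, then activates it. Both commands are issued on one
# node only; per the explanation above, the cluster distributes the change.
# Activation is attempted only if creation succeeded.
create_and_activate_type() {
    type_name="$1"
    props_json="$2"
    "$RIAK_ADMIN" bucket-type create "$type_name" "$props_json" &&
        "$RIAK_ADMIN" bucket-type activate "$type_name"
}
```

For example, `create_and_activate_type animals '{"props":{}}'` would create and activate the 'animals' type used in the examples below. Activation can fail if the new type has not yet reached every node, so a real script may want to check `riak-admin bucket-type status animals` first.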

I can see why you think that the example is confusing.  Normally, a bucket's custom properties are stored in the ring file, but only for buckets in the 'default' bucket-type.  When you set custom properties on a bucket inside a non-default bucket-type, that information is stored in the cluster metadata (as the bucket-type itself is), not in the ring file, and is never gossiped around the ring.  The example shows setting properties on a bucket, but because the bucket-type isn't 'default', there's no penalty involved.

So in summary, operations like this are fine (because the bucket-type is 'animals', not 'default'):

    curl -XPUT $RIAK_HOST/types/animals/buckets/cats/props \
      -H 'Content-Type: application/json' \
      -d '{"props":{"search_index":"famous"}}'

Operations like these should be avoided (the first one uses the default bucket-type explicitly; the second also uses the default bucket-type, in the backwards-compatible format):

    curl -XPUT $RIAK_HOST/types/default/buckets/cats/props \
      -H 'Content-Type: application/json' \
      -d '{"props":{"search_index":"famous"}}'

    curl -XPUT $RIAK_HOST/buckets/cats/props \
      -H 'Content-Type: application/json' \
      -d '{"props":{"search_index":"famous"}}'


On to your second question, if an application were to attempt a write to a non-existing bucket-type, it would receive an error that the bucket type is unknown, and no data would be stored in the cluster.
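
One way an application or deploy script could detect this at startup, sketched with curl. It assumes that the HTTP API answers `GET /types/<type>/props` with status 200 for a known, active bucket type and with an error status otherwise; that behaviour, the helper names, and the `CURL` override are assumptions for illustration, not something confirmed in this thread.

```shell
#!/bin/sh
# CURL is a hypothetical override so the check can be exercised without a
# running cluster; it defaults to the real curl binary.
CURL="${CURL:-curl}"

# bucket_type_status_code TYPE
# Prints only the HTTP status code of a GET on the type's properties.
# RIAK_HOST is expected to hold the base URL, as in the curl examples above.
bucket_type_status_code() {
    "$CURL" -s -o /dev/null -w '%{http_code}' "$RIAK_HOST/types/$1/props"
}

# require_bucket_type TYPE
# Returns 0 if the type looks usable (assumed: HTTP 200), otherwise prints a
# message to stderr and returns 1 so the caller can fail fast.
require_bucket_type() {
    code="$(bucket_type_status_code "$1")"
    if [ "$code" = "200" ]; then
        return 0
    fi
    echo "bucket type '$1' is not available (HTTP status: $code)" >&2
    return 1
}
```

A startup script could then run `require_bucket_type animals || exit 1` and refuse to start until an administrator has created and activated the type.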

I agree with you that data definition should really be in the domain of the application and not an administrator.  Perhaps others can chip in with their experiences and how they get around this.  You can consider this feature still on the cards for a future release, but I can't make any promises about when.

Kind Regards,
Shaun

On Mon, Dec 5, 2016 at 2:40 PM, Henning Verbeek <[hidden email]> wrote: