Production deployment requirements for memory backend storage


Neeraj Poddar

Hello,

 

I would like to understand the production requirements for using Riak as a non-persistent, ephemeral data store. In particular, the following questions relate to using Riak with “memory” configured as the storage backend:

 

1.       What is the “platform_data_dir” used for when memory is the storage backend? Is it only needed for active anti-entropy and cluster metadata? Do I need to persist this data, i.e. if a node goes down and restarts in this configuration, must the data in “platform_data_dir” be preserved?

2.       What is the minimum memory requirement of an empty Riak node in this configuration?

3.       What is the minimum disk and CPU requirement of a Riak node in this configuration?

 

-- 

Regards,

Neeraj Poddar

 


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Production deployment requirements for memory backend storage

Neeraj Poddar
Hi, it would be very helpful if someone could provide input on the questions posted above. The queries relate to Riak KV 2.2.0 and above in particular.

Re: Production deployment requirements for memory backend storage

Charlie Voiselle
In reply to this post by Neeraj Poddar

Neeraj:

Thanks for your interest in Riak. I will copy your questions into this email for reference and answer them inline.

1. What is the “platform_data_dir” used for when memory is the storage backend? Is it only needed for active anti-entropy and cluster metadata? Do I need to persist this data, i.e. if a node goes down and restarts in this configuration, must the data in “platform_data_dir” be preserved?

As you have pointed out, the platform_data_dir contains more than just the actual data stored in the cluster. There are three folders that must be persisted for a node to remain a member of a cluster and to not create issues with the sizes of the vector clocks internal to the objects. They are:

  • ring - The binary files that describe the cluster and the vnode ownership mappings. Deleting this folder will cause the node to start up and create a new default ring. This default ring will allocate 100% of the partitions to that node. This is non-fatal and is resolved by rejoining the node to the cluster. This extra work can be avoided by persisting the ring file properly.

  • cluster_meta - This folder contains the properties for bucket types and typed custom buckets.

  • kv_vnode - This folder contains generated actor-ids for each Riak vnode. The routine loss of this directory will cause orphaned vnode actor-ids to potentially accumulate in objects’ vclocks.
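As a rough illustration of the list above, the following Python sketch checks whether the three folders are present under a given platform_data_dir, for example after a node's disk has been re-provisioned. The function name and the example path are assumptions made for illustration; they are not part of Riak, and a node that has never been started will not have these folders yet.

```python
from pathlib import Path

# Subfolders of platform_data_dir that this thread says must be persisted.
REQUIRED_SUBDIRS = ("ring", "cluster_meta", "kv_vnode")


def missing_persistent_dirs(platform_data_dir):
    """Return the required subfolders that are absent under platform_data_dir."""
    base = Path(platform_data_dir)
    return [name for name in REQUIRED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # /var/lib/riak is only an assumed location; use your own platform_data_dir.
    missing = missing_persistent_dirs("/var/lib/riak")
    print("missing:", missing or "none")
```

A check like this could run before the node starts, so that a lost ring or kv_vnode folder is noticed before the node comes up with a fresh default ring.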

Active anti-entropy is a process to prevent bit-rot in long-lived data. Since your questions concern ephemeral data, we would recommend disabling it, because creating and maintaining the trees carries overhead that makes no sense for ephemeral data.
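A minimal sketch of how this recommendation could be verified in a deployment script follows. It assumes the node is configured through riak.conf, that the memory backend is selected with `storage_backend = memory`, and that AAE is switched off with `anti_entropy = passive`; please confirm both settings against the configuration reference for your Riak KV version. The helper names are hypothetical.

```python
def parse_riak_conf(text):
    """Parse 'key = value' lines from riak.conf-style text, ignoring comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing and full-line comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def ephemeral_config_problems(text):
    """List deviations from a memory-backend, AAE-disabled configuration."""
    settings = parse_riak_conf(text)
    problems = []
    if settings.get("storage_backend") != "memory":
        problems.append("storage_backend is not 'memory'")
    if settings.get("anti_entropy") != "passive":
        problems.append("anti_entropy is not 'passive'")
    return problems


if __name__ == "__main__":
    sample = """
    storage_backend = memory
    anti_entropy = passive   ## AAE off for ephemeral data
    """
    print(ephemeral_config_problems(sample))  # prints []
```

A missing key is reported as a problem on purpose, since an unset value falls back to the node's defaults rather than to the ephemeral setup discussed here.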

2. What is the minimum memory requirement of an empty Riak node in this configuration?

On a sample node that I brought up, an empty Riak KV 2.2.3, the beam.smp process was using 1.5 GB of RAM with an empty memory backend and AAE disabled.

3. What is the minimum disk and CPU requirement of a Riak node in this configuration?

A few variables dictate how much disk throughput a Riak cluster that uses only the memory backend will actually consume: logging overhead, ring changes, and cluster metadata changes.

Logging throughput is determined by the general health of the cluster and is minimal in clusters that are well-behaved. The logfiles themselves have configurable size caps and a set number of rotations (by default, 5 logs capped at 50 MB per file). Some other logfiles are not managed by lager and can grow beyond these expected limits. If you are building nodes optimized for storage, you will want to monitor the size of this folder and trim it as appropriate.
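To put a number on the defaults mentioned above, here is a small worked calculation in Python. It reads "5 logs capped at 50 MB" as five files of at most 50 MB for each managed log; the count of three managed logs in the example is an assumption for illustration only, and logfiles not managed by lager are not covered by this bound.

```python
MIB = 1024 * 1024


def max_managed_log_bytes(num_log_files, rotations=5, cap_mb=50):
    """Upper bound on disk used by size-capped, rotated logfiles."""
    return num_log_files * rotations * cap_mb * MIB


if __name__ == "__main__":
    # Three managed logs is a hypothetical count, not a Riak default.
    bound = max_managed_log_bytes(num_log_files=3)
    print(bound // MIB, "MB")  # prints 750 MB
```

With the defaults, each managed log is bounded at 5 x 50 MB = 250 MB, so the managed part of the log folder stays small even on a modest disk.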

The ring is a data structure that holds information about the cluster’s membership, the node capabilities, MDC replication configuration, and the legacy custom bucket metadata. In stable clusters that use no custom buckets, the impact of writes to the ring is negligible; however, there are certain antipatterns involving the creation of a large number of buckets with custom properties in the “default” bucket type that will bloat the ring file and result in a large amount of ring gossip.

Finally, Riak bucket types and their properties, as well as the custom bucket properties of typed buckets, are stored in cluster metadata. This backend is a dets-based store that uses hashtree comparisons to maintain consistency across members of the cluster. Its storage footprint also depends on how much metadata you create within your cluster and how quickly you create it.

There is more generically-applicable information about [cluster capacity planning] in the Riak KV documentation.

Thanks again for your interest,

Charlie Voiselle
Sr. Product Manager, Riak KV/Clients
Basho Technologies
@angrycub




On Apr 10, 2017, at 3:30 PM, Neeraj Poddar <[hidden email]> wrote:

Hello,
 
I wanted to understand the production requirements for using Riak as a non-persistent ephemeral data store. In particular the following questions relate to using Riak with “memory” configured as storage backend:
 
1.       What is the “platform_data_dir” used for when memory is used as storage backend? Is it only needed for active anti-entropy and cluster metadata? Do I need to persist this data i.e. if a node goes down and restarts in this configuration, is persistence of data in “platform_data_dir” required.
2.       What is the minimum memory requirement of an empty Riak node in this configuration?
3.       What is the minimum disk and CPU requirement of a Riak node in this configuration?
 
-- 
Regards,
Neeraj Poddar
 

Re: Production deployment requirements for memory backend storage

Neeraj Poddar
Thanks, Charlie, for the detailed reply. It clarifies a lot of things for me. A few follow-up questions:

1) Based on your reply about "platform_data_dir", the size of the directory is bounded for a stable cluster (i.e. few ring ownership changes, and no bucket types or buckets being created with custom properties). A newly created node joining the cluster obtains all relevant cluster/ring metadata from its peers and persists it in this directory. Is my understanding correct?

2) Is there any documentation on the memory overhead of the memory storage backend? I found overhead documentation for the bitcask backend but none for memory. I'm looking for the overhead Riak adds per key/value pair. I'm guessing that the frequency of updates, which might affect vector clock sizes, influences this number, but average and worst-case overhead figures would be very useful. In my scenario I will be using bucket types with allow_mult set to false and last_write_wins set to false, to disable sibling creation but still use vector clocks for resolving conflicts.

Thanks again!