Leveled and Anti-Entropy



Martin Sumner

I've added some anti-entropy features to Leveled (the pure-Erlang KV store designed as a Riak backend).  These features are in part an experiment in how to approach both anti-entropy and full-sync multi-data-centre replication in the future.

There's a long write-up, including some history of AAE in Riak:


In summary, Riak's current AAE is based on cryptographically strong Merkle trees, and this experiment is based on removing that security strength, as it isn't relevant to the context in which it is used.  Instead, Leveled now has Merkle trees that can be merged and can also be built incrementally (i.e. built key by key even when the keys are not in segment order).

Using these new trees (coined TicTac trees to fit into Leveled's terrible naming convention), we can build AAE trees in folds incrementally and hence at a lower cost, but also merge trees across independent stores.  In the future trees can be built from folds using Riak coverage queries, across either indexes or objects in the store - and compared between different database clusters even where those clusters are partitioned differently e.g. different ring-sizes.
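As a rough illustration of why such trees can be built in any key order and merged across stores, here is a minimal sketch in Python (not Leveled's Erlang code). It assumes each segment holds the XOR of its entries' hashes, which is one way to get those properties; the tree width, the truncated-MD5 hash, and every function name are hypothetical.

```python
import hashlib
import random

SEGMENTS = 256  # hypothetical tree width, not Leveled's actual size


def hash32(data: bytes) -> int:
    """32-bit digest (truncated MD5, chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")


def segment_of(key: bytes) -> int:
    return hash32(key) % SEGMENTS


def new_tree() -> list:
    return [0] * SEGMENTS


def add(tree: list, key: bytes, value: bytes) -> None:
    """Fold one key into the tree; XOR makes the result order-independent."""
    tree[segment_of(key)] ^= hash32(key + b"\x00" + value)


def merge(a: list, b: list) -> list:
    """Combine two trees built over disjoint sets of keys."""
    return [x ^ y for x, y in zip(a, b)]


def build(items) -> list:
    tree = new_tree()
    for key, value in items:
        add(tree, key, value)
    return tree


def cluster_tree(items, ring_size: int) -> list:
    """Build one tree per partition, then merge them into a cluster-wide tree."""
    partitions = [new_tree() for _ in range(ring_size)]
    for key, value in items:
        add(partitions[hash32(b"ring" + key) % ring_size], key, value)
    merged = new_tree()
    for partition in partitions:
        merged = merge(merged, partition)
    return merged


def differing_segments(a: list, b: list) -> list:
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


items = [(b"key%d" % i, b"value%d" % i) for i in range(1000)]
shuffled = random.sample(items, len(items))

# Same data, different key order and different ring sizes.
cluster_a = cluster_tree(items, ring_size=8)
cluster_b = cluster_tree(shuffled, ring_size=64)

# A third cluster holding a stale value for one key.
stale = list(items)
stale[42] = (b"key42", b"old value")
cluster_c = cluster_tree(stale, ring_size=64)
```

In this sketch `cluster_a` and `cluster_b` come out identical despite the different partitioning, while comparing `cluster_a` with `cluster_c` narrows the discrepancy down to the single segment containing `key42`.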

The expectation is that there will be more flexibility in what we can decide to compare at run time: not just whether the objects are consistent, but also whether the indexes are consistent.  Also, once freed from partition constraints, there will be more flexibility in what we can compare between, e.g. it should become easier to compare with a different database.

Coupled with this there's a demonstration of using temporary indexes in Leveled, index entries that auto-expire at a TTL, and we've shown how this can be used with tree-creating folds to compare recent changes between stores at a lower cost than comparing the whole database state: with the added advantage that the long-term footprint of the database is not extended by maintaining a separate copy of all the keys and hashes.
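The idea of comparing only recent changes can be sketched as follows. This is a toy Python model under stated assumptions, not Leveled's API: the `TempIndexStore` class, its methods, the explicit `now` timestamps, and the XOR-per-segment tree are all hypothetical stand-ins.

```python
import hashlib

SEGMENTS = 64  # hypothetical tree width


def hash32(data: bytes) -> int:
    """32-bit digest (truncated MD5, chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")


class TempIndexStore:
    """Toy store that keeps a short-lived index entry for each write."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self.entries = []  # (written_at, key, value_hash)

    def put(self, key: bytes, value: bytes, now: int) -> None:
        self.entries.append((now, key, hash32(key + b"\x00" + value)))

    def expire(self, now: int) -> None:
        """Drop entries older than the TTL, so the index does not keep growing."""
        self.entries = [e for e in self.entries if now - e[0] < self.ttl]

    def recent_tree(self, since: int, now: int) -> list:
        """Fold unexpired entries written at or after `since` into a tree."""
        self.expire(now)
        tree = [0] * SEGMENTS
        for written_at, key, value_hash in self.entries:
            if written_at >= since:
                tree[hash32(key) % SEGMENTS] ^= value_hash
        return tree


store_a, store_b = TempIndexStore(3600), TempIndexStore(3600)
for store in (store_a, store_b):
    store.put(b"k1", b"v1", now=100)
    store.put(b"k2", b"v2", now=200)
store_a.put(b"k3", b"v3", now=300)  # a write that never reached store_b

tree_a = store_a.recent_tree(since=0, now=400)
tree_b = store_b.recent_tree(since=0, now=400)
mismatched = [i for i, (x, y) in enumerate(zip(tree_a, tree_b)) if x != y]

# Well past the TTL, the temporary entries are gone and the tree is empty.
expired_tree = store_a.recent_tree(since=0, now=4000)
```

Only the entries inside the time window are folded, so the cost scales with recent write volume rather than with the size of the whole store.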

Concurrently with this, we now have some other work ongoing in the space of replication and anti-entropy:

- @russelldb is continuing to test and improve his open source real-time replication solution (rabl) which uses RabbitMQ.  He's hoping to be able to talk further on progress with this by the end of August. 
- I'm working on implementing in riak_core a core_node_worker_pool, which is intended to complement the core_vnode_worker_pool but allow for coverage queries where snapshots are taken on a covering set of vnodes, but folds are then scheduled to run one-at-a-time on each node.  This can then be used to regulate the impact of anti-entropy folds.
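
The scheduling idea in the second item (snapshot every covering vnode first, then run the folds one at a time per node) can be sketched in Python as below. This is only a model of the behaviour described, not riak_core code: the `COVERAGE` layout, the function names, and the use of a single-worker thread pool per node are assumptions made for illustration.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical covering set: vnode id -> (node name, data held by that vnode)
COVERAGE = {
    0: ("node_a", {"k1": 1, "k2": 2}),
    1: ("node_a", {"k3": 3}),
    2: ("node_b", {"k4": 4}),
    3: ("node_b", {"k5": 5, "k6": 6}),
}

lock = threading.Lock()
running = defaultdict(int)      # folds currently running on each node
max_running = defaultdict(int)  # highest concurrency observed on each node


def fold(node: str, snapshot: dict) -> int:
    with lock:
        running[node] += 1
        max_running[node] = max(max_running[node], running[node])
    time.sleep(0.01)  # stand-in for an expensive fold over the snapshot
    total = sum(snapshot.values())
    with lock:
        running[node] -= 1
    return total


def coverage_fold(coverage: dict) -> int:
    # Step 1: snapshot every vnode in the covering set up front.
    snapshots = [(node, dict(data)) for node, data in coverage.values()]
    # Step 2: one single-worker pool per node, so folds queue up per node.
    pools = {node: ThreadPoolExecutor(max_workers=1)
             for node in {node for node, _ in snapshots}}
    futures = [pools[node].submit(fold, node, snap) for node, snap in snapshots]
    results = [f.result() for f in futures]
    for pool in pools.values():
        pool.shutdown()
    return sum(results)


total = coverage_fold(COVERAGE)
```

Because the snapshots are taken before any fold is queued, all folds see a consistent point in time even though they run later, and each node never works on more than one fold at once.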

Our current target is to have a release candidate of open-source replication (both real-time and full-sync) by the end of September.  This will initially be focused only on replication between two Riak clusters. 

Regards

Martin (@masleeds)

P.S. Hopefully next Friday we should also be able to report back on the improvements and test enhancements that followed up the work on riak_core claim.


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Leveled and Anti-Entropy

Heinz N. Gies
Have you taken a look at the changes here? https://github.com/Kyorai/riak_core/pull/24

It pulls the AAE work for riak_kv into riak_core.


On 21. Jul 2017, at 16:06, Martin Sumner <[hidden email]> wrote:



Re: Leveled and Anti-Entropy

Martin Sumner
Heinz,

No, I haven't, but I will.

Thanks

Martin

On 21 July 2017 at 15:43, Heinz N. Gies <[hidden email]> wrote: