Leveled - Another Erlang Key-Value store


Leveled - Another Erlang Key-Value store

Martin Sumner

Over the past few months I've been working on an alternative pure-Erlang key/value store to act as a backend to Riak KV.  This is now open source and available at

https://github.com/martinsumner/leveled
The store is a work-in-progress prototype, originally started to better understand the impact of different trade-offs in LSM-Tree design.  The aim is to:

- provide a fully-featured Riak backend (e.g. secondary index, object expiry support etc)
- provide stable throughput with larger object sizes (> 4KB)
- provide a simpler and more flexible path to making Riak KV changes end-to-end

The prime change in the store when compared to HanoiDB or eleveldb is that storage is split: only keys and metadata are placed in the merge tree, while the full object lives to the side in a series of CDB-based journals.  The intention of this is to:

- reduce write amplification and ease page-cache pollution issues on scanning events.
- support faster HEAD requests than GET requests; in parallel, an alternative Riak KV branch has been produced to move from an n-GET model to an n-HEAD, 1-GET model of fetching data for both KV GET and KV PUT operations
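To make the split concrete, here is a minimal sketch (in Python, purely for illustration; leveled itself is Erlang, and all the names here are invented, not leveled's API). The value is appended once to a journal stand-in, and only a small (metadata, sequence-number) record enters the merge tree, so tree compaction rewrites metadata-sized entries rather than full objects:

```python
# Hedged sketch of the storage split. On PUT, the full value is appended to a
# CDB-style journal, and only the key, metadata and sequence number go into
# the merge tree.

journal = []        # stands in for the sequence-ordered CDB journal files
merge_tree = {}     # stands in for the LSM merge tree of keys & metadata

def put(key, metadata, value):
    journal.append(value)                  # full object lives to the side
    seqn = len(journal) - 1                # its sequence number in the journal
    merge_tree[key] = (metadata, seqn)     # only key & metadata in the tree
    return seqn
```

Because the merge tree never holds values, repeated compaction of the tree touches only small records, which is where the write-amplification saving comes from.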

The impact of this has been to improve throughput for larger object sizes where disk I/O, not CPU, is the limiting factor.  The advantage grows the greater the object size, and the tighter the constraint on disk.

Please visit the GitHub page; I've tried to write up as much about the project as I can.  There are the results of various volume tests, information on the research that prompted the design, an overview of the design itself, and some hints as to what I expect to try next with leveled.

For any feedback, please mail me, raise an issue on GitHub, or ping me @masleeds

Cheers

Martin

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Leveled - Another Erlang Key-Value store

DeadZen
Cheers indeed!
You added HEAD requests so a full GET wouldn't always be required?
Did I read that right? *dives into code*
%% GET requests first follow the path of a HEAD request, and if an object is
%% found, then fetch the value from the Journal via the Inker.
... WHAT?

Very nice work, will be more than happy to provide feedback and patches on this.


Re: Leveled - Another Erlang Key-Value store

Martin Sumner
Have a look at https://github.com/martinsumner/leveled/blob/master/docs/FUTURE.md and the "Riak Features Implemented" section.

I'm trying to avoid the need for Riak to GET n times for every request (both a KV GET and a KV PUT), when really it only needs the body once; for the other n-1 requests it doesn't need the body at all, provided the HEAD response confirms that the vector clock is at the expected state.  As values get larger there's a lot of unnecessary disk activity in those GETs.
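The n-HEAD, 1-GET idea can be sketched roughly like this (Python for illustration only; real Riak compares causal vector clocks rather than taking a simple max, and the Replica class and method names here are all hypothetical):

```python
# Hypothetical sketch: ask each of the n replicas for just its vector clock
# via a cheap HEAD, then fetch the body once from a replica holding the
# winning clock.

class Replica:
    def __init__(self, vclock, value):
        self.vclock = vclock
        self.value = value

    def head(self, key):
        return self.vclock      # cheap: metadata only, no value I/O

    def get(self, key):
        return self.value       # expensive: reads the body off disk

def fetch(replicas, key):
    # Phase 1: n cheap HEADs return (clock, replica) pairs
    heads = [(r.head(key), r) for r in replicas]
    # Phase 2: one GET from a replica whose clock matches the winner
    # (real Riak's comparison is causal; max() is a stand-in)
    winning = max(vc for vc, _ in heads)
    for vc, replica in heads:
        if vc == winning:
            return replica.get(key)
```

So n-1 of the disk reads for the value simply never happen when the clocks agree, which is exactly where the saving grows with value size.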

To understand all the Journal/Inker stuff have a look at https://github.com/martinsumner/leveled/blob/master/docs/DESIGN.md

The Inker is an actor that controls the Journal.  The Journal is a transaction log where all the values are stored permanently, made up of a series of CDB files in which everything is ordered by sequence number.  Keys and metadata are stored in a merge tree called the Ledger, and the Ledger is controlled by an actor called the Penciller.  So a HEAD request should just need to check the Ledger via the Penciller.  A GET request must follow that same path; once it has the metadata, it has the sequence number, and it can use that to fetch the value from the Journal via the Inker.
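As a toy rendering of that division of labour (the real Penciller and Inker are Erlang processes; the class and method names below borrow leveled's terms but are otherwise invented):

```python
# Illustrative sketch of the read path: HEAD touches only the Ledger via the
# Penciller; GET follows the same path, then uses the sequence number to ask
# the Inker for the value from the Journal.

class Penciller:
    """Controls the Ledger: key -> (metadata, sequence number)."""
    def __init__(self):
        self.ledger = {}

    def push(self, key, metadata, seqn):
        self.ledger[key] = (metadata, seqn)

    def check(self, key):
        return self.ledger.get(key)

class Inker:
    """Controls the Journal: a sequence-number-ordered value log."""
    def __init__(self):
        self.journal = {}
        self.next_seqn = 0

    def append(self, value):
        seqn = self.next_seqn
        self.journal[seqn] = value
        self.next_seqn += 1
        return seqn

    def fetch(self, seqn):
        return self.journal[seqn]

def head(penciller, key):
    # HEAD: Ledger only, via the Penciller -- no Journal I/O
    found = penciller.check(key)
    return found[0] if found else None

def get(penciller, inker, key):
    # GET: same path as HEAD, then fetch by sequence number via the Inker
    found = penciller.check(key)
    if found is None:
        return None
    _, seqn = found
    return inker.fetch(seqn)
```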

Does this make sense?  

My apologies if the naming isn't helpful.  Perhaps that's one of the drawbacks of working in isolation on this.

Thanks

Martin





Re: Leveled - Another Erlang Key-Value store

DeadZen
Yup, I get it.  I like the concept, and the fun naming of Inkers, Pencillers and Clerks as well.  Is there a basho_bench configuration?  This reminds me a bit of fractal trees, but with a focus on NoSQL operational semantics.  How else do you see this adding improvements?  It seems index requests could be cheaper with this backend configuration.


Re: Leveled - Another Erlang Key-Value store

Martin Sumner
For original testing in isolation I used this:


But since then I've focused on testing within Riak, using the standard basho_bench examples/riakc_pb.config test.

Testing of secondary indexing is one of the next big things for me; I'm definitely going to do this in March.  From a development perspective, doing clones/snapshots was a lot easier than it presumably is in C++.  I'm unsure how performance will compare, as 2i terms do end up clumped together on disk in leveldb, and I know MVM has done a lot of optimisation work in that area.

Some things that may help leveled: there are no overlapping files in level 1, and the levels don't increase in depth as rapidly.  However, there are some issues (https://github.com/martinsumner/leveled/issues/34) that I may still need to contend with.

I think there may be some interesting possibilities with Map functions based on HEAD rather than GET requests.  Perhaps, as a developer, I could add a bitmap index to my object metadata, and roll across those bitmaps efficiently through a Map function, now that I wouldn't need to pull the whole object off disk to achieve that.  So the division between metadata and value may open up efficient query ideas beyond 2i.
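As a hypothetical example of that bitmap idea (Python for illustration; the metadata layout and function names here are entirely invented, not anything leveled implements today): with a small bitmap carried in each object's metadata, a fold over cheap HEAD results can answer a flag query without ever reading a value off disk.

```python
# Illustrative only: select keys whose metadata bitmap matches a query mask,
# using nothing but HEAD (metadata-only) results.

def matching_keys(head_results, mask):
    """head_results: iterable of (key, metadata) pairs as a HEAD fold
    might return them; keep keys whose bitmap overlaps the mask."""
    return [key for key, md in head_results
            if md.get("bitmap", 0) & mask]

# e.g. object "a" has features 0 and 2 set, object "b" has feature 1 set
heads = [("a", {"bitmap": 0b101}), ("b", {"bitmap": 0b010})]
```

Here `matching_keys(heads, 0b100)` would return only `["a"]`, and at no point is a value fetched from the Journal.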

