LevelDB read performance

LevelDB read performance

Parnell Springmeyer
I'm using Riak in a 5 node cluster with LevelDB for the backend (we store A LOT of archivable data) on FreeBSD.

The data is mapped out as follows: I have a set of database objects that are closely linked to user accounts. I needed to be able to compose complex queries on these objects, including joins with the user data, so it made sense to keep those objects in MySQL.

I have software that takes those database objects and produces DAILY stats for each object (so we have months/years of data for each database object). These stats are what we store in Riak.

Now, that application also updates the MySQL database object with the key under which the stat object is stored in Riak for quick and easy compiling of the "latest" data (since it's just a GET operation and not a M/R job).

Mashing up this data for small sets of MySQL database objects is quick and painless. But once it starts approaching > 1000 objects I notice it slows to a crawl, and I notice Riak being pegged pretty hard (IOW, it is Riak's response that is slow).

Now, here's the issue: in my web application I haven't figured out how to use the RiakPBC connector, so we are going through the HTTP API. I have a feeling this is where the bottleneck is occurring.

Why, you ask? Because our Python web app is multi-threaded and the PBC sockets don't play nice here. My experiments to solve this haven't been very successful. So I wanted to ask the greater community if anyone HAS solved it or is willing to HELP me solve it!
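
One common way to keep sockets from being shared across threads is to give each thread its own connection. The following is only a minimal sketch of that idea, not the Riak client's API: `connect` is a hypothetical zero-argument factory standing in for whatever opens a PBC connection in your client.

```python
import threading


class PerThreadConnection:
    """Hand each thread its own connection so a socket is never shared."""

    def __init__(self, connect):
        # `connect` is a hypothetical factory (an assumption for this sketch);
        # it is called at most once per thread.
        self._connect = connect
        self._local = threading.local()

    def get(self):
        # threading.local stores attributes separately for every thread.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._connect()
            self._local.conn = conn
        return conn
```

Each request handler would call `get()` and use the returned connection; the cost is one open connection per live thread.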

Thanks :)
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: LevelDB read performance

Parnell Springmeyer
I guess I got off track from my original subject line - once I started writing I realized read performance wasn't a LevelDB issue (I originally thought that maybe it was) but that the bottleneck must be our utilization of the API...

On Jul 26, 2012, at 5:18 PM, Parnell Springmeyer wrote:

> I'm using Riak in a 5 node cluster with LevelDB for the backend [...]

Re: LevelDB read performance

Dan Reverri
Hi Parnell,

Can you explain a bit more regarding "approaching > 1000 objects"? Are you seeing high-latency reads for single Riak objects as the object size grows? How large are the Riak objects and how much of a latency spike are you seeing?

Thanks,
Dan

--
Daniel Reverri
Client Architect
Basho Technologies, Inc.
[hidden email]


Re: LevelDB read performance

John D. Rowell
In reply to this post by Parnell Springmeyer
Why not push the data (or references to it) to a queue (e.g. RabbitMQ) and then run single-threaded consumers that work well with PBC? That would also decouple the processes and allow you to scale them independently.
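
The shape of this suggestion can be sketched in-process. This is only an illustration under stated assumptions: the standard-library `queue.Queue` stands in for RabbitMQ, and `fetch` is a hypothetical function that reads one key over a connection owned by the consumer thread.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer there is no more work


def run_consumer(work, fetch, results):
    """Drain `work` on a single thread; only this thread calls `fetch`."""
    while True:
        key = work.get()
        if key is _STOP:
            break
        results[key] = fetch(key)


def fetch_via_queue(keys, fetch):
    """Enqueue keys and let one consumer thread perform every read."""
    work = queue.Queue()
    results = {}
    consumer = threading.Thread(target=run_consumer, args=(work, fetch, results))
    consumer.start()
    for key in keys:  # producers (e.g. web threads) only enqueue keys
        work.put(key)
    work.put(_STOP)
    consumer.join()
    return results
```

With a real broker the producers and the consumer would be separate processes, which is what allows them to be scaled independently.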

-jd

Re: LevelDB read performance

Parnell Springmeyer
In reply to this post by Dan Reverri
So, if I have 1,000 MySQL objects that are owned by a single user, there are roughly 1,000 stat result objects. I then loop over those 1,000 objects and do a get(object.result_key) GET from Riak for each one.

The objects don't grow, as I don't update them; they are partitioned by buckets and each object always stays constant in size. The objects are really small, tiny even.

I'm seeing it take about 3-4 minutes to finish loading a result set approaching the above scenario. I would imagine it should be fast, which is why I suspect it is my use of the HTTP API rather than PBC...
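
For scale, 1,000 sequential GETs in 3 to 4 minutes works out to roughly 0.18 to 0.24 seconds per request. If the reads are independent, one option is to keep several requests in flight at once. The sketch below assumes a hypothetical `fetch(key)` function that is safe to call from several threads (for example, one that opens its own HTTP connection per call); it is not taken from the Riak client.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(keys, fetch, max_workers=16):
    """Fetch every key with up to `max_workers` requests in flight at once."""
    keys = list(keys)  # allow generators; we iterate over the keys twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so values line up with keys.
        return dict(zip(keys, pool.map(fetch, keys)))


# Per-request cost implied by the numbers above, in seconds:
# 1,000 sequential GETs taking 180 s (3 min) to 240 s (4 min).
per_get_seconds = (180 / 1000, 240 / 1000)
```

`max_workers=16` is an arbitrary starting point; the right value depends on how much concurrent load the cluster handles comfortably.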

On Jul 26, 2012, at 5:33 PM, Daniel Reverri wrote:

> Can you explain a bit more regarding "approaching > 1000 objects"? [...]

Re: LevelDB read performance

Parnell Springmeyer
In reply to this post by John D. Rowell
John, I don't really see a scenario where that would be a useful solution - that's a lot of overhead to add a message queuing system into the mix when my client should be able to handle the connection(s) just fine.

I think the primary issue comes from the Python client not pooling Riak connections.
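
For reference, a connection pool can be quite small. This is a minimal sketch of the general technique, not something the Python client provides: `connect` is a hypothetical factory that opens one connection, and the pool size is arbitrary.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool: threads borrow a connection and hand it back."""

    def __init__(self, connect, size=4):
        # `connect` is a hypothetical zero-argument factory (an assumption).
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even after an error
```

A handler would write `with pool.connection() as conn:` around its reads, so no two threads ever hold the same connection at once.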

On Jul 26, 2012, at 5:39 PM, John D. Rowell wrote:

> Why not push the data (or references to it) to a queue (e.g. RabbitMQ) and then run single-threaded consumers that work well with PBC? [...]

Re: LevelDB read performance

Parnell Springmeyer
So I have made some headway on figuring this out. It's not HTTP API performance; the slowdown turned out to come from some of the really old records that did not have brand-new records.

I'm aware of the penalty incurred when you try to request a key that doesn't exist - so I'm first going to try and see if I have any major logic issues along those lines.
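
A check along those lines can be sketched as a small audit that times each GET and records which keys come back empty. The names here are assumptions for illustration: `fetch(key)` is a hypothetical function that returns `None` for a key that does not exist, and the `slow_after` threshold is arbitrary.

```python
import time


def audit_keys(keys, fetch, slow_after=0.1):
    """Return (missing, slow): keys with no value, and (key, seconds) pairs."""
    missing, slow = [], []
    for key in keys:
        started = time.perf_counter()
        value = fetch(key)  # assumed to return None for a not-found key
        elapsed = time.perf_counter() - started
        if value is None:
            missing.append(key)
        if elapsed > slow_after:
            slow.append((key, elapsed))
    return missing, slow
```

If most of the slow keys also show up in the missing list, the not-found penalty is the likelier cause; if they are found but slow, the storage backend is the next place to look.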

Otherwise, is it possible for old records to incur performance penalties due to seeking down into a second level? I think I need to read up on LevelDB some more to figure out all of the performance scenarios.

On Jul 26, 2012, at 6:08 PM, Parnell Springmeyer wrote:

> John, I don't really see a scenario where that would be a useful solution [...]