riak + innostore


Lev Walkin

Hi,

We've found a performance problem with Riak 0.8 and Innostore,  
particularly on Amazon EC2 Small instances (10 nodes, n/w=3).

First, we noticed that innostore was slow accepting data:

> timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
> {8995645,ok}
> timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
> {4834159,ok}

Debugging showed that port_control was the culprit. Changing it to port_command (diff attached) improved things considerably:

> timer:tc(innostore_riak,put,[S1, {<<"siden">>, <<"key1">>}, <<"value">>]).
> {13899,ok}
> timer:tc(innostore_riak,get,[S1, {<<"siden">>, <<"key1">>}]).
> {86,{ok,<<"value">>}}
> timer:tc(innostore_riak,get,[S1, {<<"siden">>, <<"key">>}]).
> {90,{ok,<<"value">>}}
> timer:tc(innostore_riak,delete,[S1, {<<"siden">>, <<"key">>}]).
> {38700,ok}
> timer:tc(innostore_riak,delete,[S1, {<<"siden">>, <<"key1">>}]).
> {7299,ok}
> timer:tc(innostore_riak,get,[S1, {<<"siden">>, <<"key">>}]).
> {114,{error,notfound}}

After applying the patch, we found an order-of-magnitude difference
between calling innostore directly and using it as a riak backend.

Comparison of raw innostore vs riak@innostore.
Three types of requests were made: put, get and delete.
Each type was invoked 10000 times.

> {ok, S1} = innostore_riak:start(0, undefined).
> F = fun(0,_,_) -> ok; (N, F1, F2) -> F1(N), F2(N-1, F1, F2) end.
> C = term_to_binary(lists:duplicate(1000, $a)).
> FP = fun(N) -> innostore_riak:put(S1,{<<"siden">>, term_to_binary(N)},C) end.
> FP2 = fun(N) -> innostore_riak:get(S1,{<<"siden">>, term_to_binary(N)}) end.
> FP3 = fun(N) -> innostore_riak:delete(S1,{<<"siden">>, term_to_binary(N)}) end.
>
> {ok, Cl} = riak:client_connect('riak@127.0.0.1').
> FP4 = fun(N) -> Cl:put(riak_object:new(<<"siden">>, term_to_binary(N), C),2) end.
> FP5 = fun(N) -> Cl:get(<<"siden">>, term_to_binary(N), 2) end.
> FP6 = fun(N) -> Cl:delete(<<"siden">>, term_to_binary(N), 2) end.
>
> ------- Direct to innostore -------
> -- PUT
> io:format("~p~n",[now()]),F(10000,FP,F),io:format("~p~n",[now()]).
> {1266,508427,932468}
> {1266,508431,242659}
> -- GET
> io:format("~p~n",[now()]),F(10000,FP2,F),io:format("~p~n",[now()]).
> {1266,508515,212371}
> {1266,508516,330781}
> -- DELETE
> io:format("~p~n",[now()]),F(10000,FP3,F),io:format("~p~n",[now()]).
> {1266,508533,218732}
> {1266,508535,38505}
As you can see, that's on the order of 2 seconds per 10k invocations (5000 rps).
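The rates can be recomputed from the printed now() tuples, which have the form {MegaSecs, Secs, MicroSecs}. This is a minimal sketch, assuming a standalone helper module named bench_rate (hypothetical, not part of Riak or innostore); it relies only on the standard timer:now_diff/2, which returns the difference between two such tuples in microseconds.

```erlang
-module(bench_rate).
-export([elapsed_seconds/2, rate/3]).

%% Seconds between two erlang:now()-style timestamps {Mega, Secs, Micro}.
%% timer:now_diff(T2, T1) returns T2 - T1 in microseconds.
elapsed_seconds(T1, T2) ->
    timer:now_diff(T2, T1) / 1000000.

%% Requests per second for N calls made between timestamps T1 and T2.
rate(N, T1, T2) ->
    N / elapsed_seconds(T1, T2).
```

For the direct PUT run, bench_rate:rate(10000, {1266,508427,932468}, {1266,508431,242659}) gives about 3021 requests per second (3.31 s elapsed). The GET and DELETE runs work out to roughly 8900 and 5500 requests per second, so 5000 rps is a rough middle value. Applied to the Riak runs, the same arithmetic gives about 277 (PUT), 914 (GET) and 258 (DELETE) requests per second.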

> ------- Riak -------
> -- PUT
> io:format("~p~n",[now()]),F(10000,FP4,F),io:format("~p~n",[now()]).
> {1266,523655,774894}
> {1266,523691,812606}
> -- GET
> io:format("~p~n",[now()]),F(10000,FP5,F),io:format("~p~n",[now()]).
> {1266,523818,225468}
> {1266,523829,169635}
> -- DELETE
> io:format("~p~n",[now()]),F(10000,FP6,F),io:format("~p~n",[now()]).
> {1266,523844,402019}
> {1266,523883,160529}
Here, it's on the order of 10-40 seconds per 10k invocations (about 250
requests per second on a 10-node cluster).

Keys were different for each invocation. Network latency is small
enough that it can't be the cause of the problem here: a simple
rpc:call between two nodes manages tens of thousands of requests per second.

The question is: why does riak add 10x overhead on top of its backend?

--
vlm


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Attachment: innostore.diff (30K)

Re: riak + innostore

Dave Smith
Hi Lev,

On Fri, Feb 19, 2010 at 2:03 AM, Lev Walkin <[hidden email]> wrote:

> First, we noticed that innostore was slow accepting data:
>
> timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
> {8995645,ok}
> timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
> {4834159,ok}

On my own box, I verified this odd behaviour:

Orig:
timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
{28429,ok}
timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
{927,ok}

Patched:
timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
{4237,ok}
timer:tc(innostore_riak, put, [S1, {<<"siden">>, <<"key">>}, <<"value">>]).
{726,ok}

Now, while I agree that the patched version is faster on the first pass of this particular micro-benchmark, the second pass is close enough that I'm not convinced port_command is significantly better. In general, I'm skeptical of single-data-point benchmarking; it's just too easy to run into red herrings. Did you run your own before/after test (beyond console testing) to verify your fix?

As a cross-check, I ran a 15-minute benchmark against the two versions of inno -- they are attached as inno_orig and inno_patched, respectively. As you can see, inno_orig has roughly the same throughput as inno_patched (slightly better, in fact). However, the latencies are where things get interesting. inno_orig has a mean 95th percentile latency of 5.4 ms on GET, compared to inno_patched's 8.3 ms. The mean 95th percentile latency for PUT is even more telling: 18.0 ms for inno_orig vs. 26.1 ms for inno_patched.
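The comparison above turns on 95th percentile latency. As a rough illustration only (the thread does not say how the benchmark tool computes its percentiles), here is a nearest-rank percentile in a hypothetical module pctl. Reading "mean 95th percentile" as the average of per-interval 95th percentiles is an assumption, made explicit in mean_of_percentiles/2.

```erlang
-module(pctl).
-export([percentile/2, mean/1, mean_of_percentiles/2]).

%% Nearest-rank percentile for an integer P in 1..100: sort the samples and
%% return the element at rank ceil(P * N / 100), counting ranks from 1.
percentile(Samples, P) when Samples =/= [], is_integer(P), P > 0, P =< 100 ->
    Sorted = lists:sort(Samples),
    N = length(Sorted),
    Rank = (P * N + 99) div 100,   % integer ceiling of P*N/100
    lists:nth(Rank, Sorted).

%% Arithmetic mean of a non-empty list of numbers.
mean(Xs) when Xs =/= [] ->
    lists:sum(Xs) / length(Xs).

%% Average of per-interval percentiles, one latency sample list per
%% reporting interval (an assumed reading of "mean 95th percentile").
mean_of_percentiles(Intervals, P) ->
    mean([percentile(I, P) || I <- Intervals]).
```

Nearest-rank always returns an observed sample. Tools that interpolate between samples can report slightly different values for the same data.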

It's also worth noting that the mean/median/95th percentiles take ~30 seconds longer to converge on GET operations with inno_patched; this is the time required to load the entire dataset into memory, which confirms that total throughput is higher with inno_orig.

port_control is supposed to be the fastest way, per the OTP team, to get into and out of the emulator when talking to a port driver. Now, given that the innostore driver sends messages back, we may be losing the bulk of port_control's speed advantage; this is my best guess as to why we see these differences. However, as I demonstrated above, the difference in a large-scale test is negligible, and port_control still comes out on top (if only from a latency standpoint).

I would also note that if you're going to do micro-benchmarks, please don't do them on EC2, especially on the small instances. As I'm sure you know, micro-benchmarks are heavily influenced by the environment, and virtualized platforms introduce too much jitter to yield repeatable results. There is also a growing body of anecdotal evidence that EC2 small instances, in particular, suffer from this problem.

> After doing the patch, we've found an order of magnitude difference between calling innostore directly and using it as a riak backend.

Comparing raw calls to innostore with the riak client is an invalid comparison. Riak adds a number of processing steps to every operation to provide the eventually consistent distributed storage that we all know and love. In addition, the amount of data that actually gets stored for every Riak request is quite a bit larger, maybe even 10x the size of the original data (which is so small that 10x is really not that much bigger).

Again, I would caution against this sort of micro-benchmarking -- it's not at all representative of the performance you will see in a production environment. As the data set on a production cluster grows, you quickly become bound not by the speed of Riak, but by the I/O constraints on each box. Micro-benchmarks such as the ones you use here often lead to premature (and unnecessary) optimization. The valid window for these types of benchmarks is the first 5 minutes of runtime in your system, before you have any significant data in the system and everything still fits into cache.

All of my ranting about micro-benchmarks aside, you did uncover something interesting about the behaviour of port_control vs. port_command. I'm not sure how to explain it, and if you have additional evidence, I'll gladly review it.

D.


Attachments: inno_patched.png (172K), inno_orig.png (172K)

Re: riak + innostore

Lev Walkin

On Feb 19, 2010, at 8:57 AM, David Smith wrote:

> Again, I would caution against this sort of micro-benchmarking --  
> it's not at all representative of the performance you will see in a  
> production environment. As the data set on a production cluster  
> grows, you quickly become bound not by the speed of Riak, but by the  
> I/O constraints on each box. Micro-benchmarks such as the ones you  
> use here often lead to premature (and unnecessary) optimization. The  
> valid window for these types of benchmarks is the first 5 minutes of  
> runtime in your system, before you have any significant data in the  
> system and everything still fits into cache.
>
> All of my ranting about micro-benchmarks aside, you did uncover  
> something interesting about the behaviour of port_control vs  
> port_command. I'm not sure how to explain it and if you have  
> additional evidence, I'll gladly review it.


Here's some additional evidence. Without the patch attached to my
original message, Riak can't even be installed on Amazon EC2 Small
boxes (we tried several). The post-install tests fail with the
following diagnostics:

==================================
100224  0:25:36  InnoDB: highest supported file format is Barracuda.
100224  0:25:36 Embedded InnoDB 1.0.3.5325 started; log sequence number 62353
[1.031 s] ok
  innostore: roundtrip_test...100224  0:25:42  InnoDB: Starting shutdown...
100224  0:25:43  InnoDB: Shutdown completed; log sequence number 66529
*timed out*
undefined
-----------
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
100224  0:19:15 Embedded InnoDB 1.0.3.5325 started; log sequence number 0
[1.107 s] ok
  innostore: roundtrip_test...


too long, ^C
--------------
100224  0:25:14 Embedded InnoDB 1.0.3.5325 started; log sequence number 58464
[0.990 s] ok
  innostore: roundtrip_test...100224  0:25:20  InnoDB: Starting shutdown...
100224  0:25:20  InnoDB: Shutdown completed; log sequence number 62353
*timed out*
undefined
47:38
=================================


--
vlm



Re: riak + innostore

Dave Smith
On Wed, Feb 24, 2010 at 2:09 AM, Lev Walkin <[hidden email]> wrote:

> Here's an additional evidence. Without the patch attached to my original message, the Amazon EC2 Small boxes (we tried several) can't even be used to install Riak. After-install tests fail with the following diagnostics:

I'm a little confused here -- I thought you had discovered these performance issues on EC2? If that's true, then it must have worked at some point, no?

Also, can you provide the AMI image that you're using so I can do my own verification of it not working on EC2? I've tested innostore on m1.large (in the context of a 5 node cluster extended benchmark) and it's been fine. 

Thanks,

D.


Re: riak + innostore

Denis Titoruk
Hi David,

You can use the publicly available RightScale AMI ami-1363877a.

On 24.02.2010, at 15:59, David Smith wrote:

> On Wed, Feb 24, 2010 at 2:09 AM, Lev Walkin <[hidden email]> wrote:
>
>> Here's an additional evidence. Without the patch attached to my original message, the Amazon EC2 Small boxes (we tried several) can't even be used to install Riak. After-install tests fail with the following diagnostics:
>
> I'm a little confused here -- I thought you had discovered these performance issues on EC2? If that's true, then it must have worked at some point, no?
>
> Also, can you provide the AMI image that you're using so I can do my own verification of it not working on EC2? I've tested innostore on m1.large (in the context of a 5 node cluster extended benchmark) and it's been fine.
>
> Thanks,
>
> D.



Re: riak + innostore

Lev Walkin

On Feb 24, 2010, at 4:59 AM, David Smith wrote:

> On Wed, Feb 24, 2010 at 2:09 AM, Lev Walkin <[hidden email]> wrote:
>
>> Here's an additional evidence. Without the patch attached to my original message, the Amazon EC2 Small boxes (we tried several) can't even be used to install Riak. After-install tests fail with the following diagnostics:
>
> I'm a little confused here -- I thought you had discovered these performance issues on EC2? If that's true, then it must have worked at some point, no?

No, it never worked with innostore.

> Also, can you provide the AMI image that you're using so I can do my own verification of it not working on EC2? I've tested innostore on m1.large (in the context of a 5 node cluster extended benchmark) and it's been fine.

You can try the publicly available RightScale ami-1363877a; that's what we base our images on.

-- 
vlm

