Riak getting very slow

amol.zambare@bookmypacket.com
Hi All,

We are running Riak KV 2.0.1 on a 5-node cluster. All nodes are high-end configurations with no load on them. All nodes have Solr enabled.

Still, we are getting very high latency.

After some investigation, I have found a possible cause.
We have one bucket with a Solr index, and each document in that index has 100+ dynamic fields in the Solr schema.

I have read two issues related to the same problem:
https://github.com/basho/yokozuna/issues/719
https://github.com/basho/yokozuna/issues/330

These suggest that you should not have more than 60 dynamic fields, otherwise Riak will get slow because Solr index creation becomes very slow.

Below is the riak-admin status output related to Solr:
rings_reconciled_total : 80
search_index_fail_count : 1011
search_index_fail_one : 5
search_index_latency_95 : 36450099
search_index_latency_99 : 54188877
search_index_latency_999 : 54188877
search_index_latency_max : 54188877
search_index_latency_mean : 15818891
search_index_latency_median : 17226576
search_index_latency_min : 1919
search_index_throughput_count : 36125
search_index_throughput_one : 19
search_query_fail_count : 29
search_query_fail_one : 0
search_query_latency_95 : 0
search_query_latency_99 : 0
search_query_latency_999 : 0
search_query_latency_max : 0
search_query_latency_mean : 0
search_query_latency_median : 0
search_query_latency_min : 0
search_query_throughput_count : 3455

Also, the connection counts on port 8093 (established and TIME_WAIT) are as below:
netstat -anp | grep :8093 | grep EST | wc -l
20
netstat -anp | grep :8093 | grep TIME_WAIT | wc -l
21

Please help us find the issue and a possible solution.

Thanks,
Amol

Re: Riak getting very slow

Fred Dushin-3
When you say "All not have solr on", do you mean not all nodes have search enabled?   If you are measuring Solr index latencies, then you definitely have Solr on at least one node.  Or is this just a typo?

Going on the assumption you have search enabled on all nodes (you should, if you are using search at all), you are seeing mean latencies on puts on the order of almost 16 seconds, and 99th percentile latencies reaching almost a minute.  Yes, that is slow!
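
To make that arithmetic explicit, here is a minimal Python sketch that converts the index latency stats from the original post into seconds. It assumes the search_index_latency_* values are reported in microseconds, which is consistent with the figures quoted above. The helper name latencies_in_seconds is made up for illustration.

```python
# Lines copied from the riak-admin status output in the original post.
STATUS = """\
search_index_latency_95 : 36450099
search_index_latency_99 : 54188877
search_index_latency_mean : 15818891
search_index_latency_median : 17226576
"""


def latencies_in_seconds(status_text):
    """Return {stat_name: seconds} for search_index_latency_* lines.

    Assumption: the raw values are microseconds.
    """
    result = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep and name.startswith("search_index_latency_"):
            result[name.strip()] = int(value) / 1_000_000
    return result


if __name__ == "__main__":
    for name, seconds in sorted(latencies_in_seconds(STATUS).items()):
        print(f"{name}: {seconds:.1f} s")
```

Under that assumption the mean works out to roughly 15.8 seconds and the 99th percentile to roughly 54 seconds.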

Do you have any other metrics on what is going on with the Solr process?  It is a separate VM, and in general you can probe it using JMX, or even by scraping the proc file system.  I don't have anything handy out of the box, but I have written some collectd Python modules you should feel free to use and pilfer, if that helps:

https://github.com/fadushin/riak_puppet_stuff/tree/master/modules/riak_node/files/collectd
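
As a rough idea of what scraping the proc file system can look like, here is a minimal Python sketch that reads /proc/<pid>/status on Linux and pulls out the resident memory and thread count. It is not taken from the collectd modules linked above. The function names are hypothetical, and how you find the pid of the Solr JVM is left to your setup.

```python
import os
import re


def parse_proc_status(text):
    """Parse the contents of /proc/<pid>/status into a dict of raw strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kib(fields):
    """Resident set size in KiB, or None if the VmRSS field is absent."""
    match = re.match(r"(\d+)\s*kB", fields.get("VmRSS", ""))
    return int(match.group(1)) if match else None


def read_status(pid):
    # Linux only. For Solr, pid would be the process id of the Solr JVM.
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read())


if __name__ == "__main__":
    # Demo on the current process, not on Solr.
    fields = read_status(os.getpid())
    print("VmRSS (KiB):", rss_kib(fields), "Threads:", fields.get("Threads"))
```

Sampling these values periodically while writes are in flight gives a coarse picture of whether the Solr process is growing in memory or thread count.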

I am not enough of a Solr expert to say that having 100+ dynamic fields is the root cause of your issues with write latency.  It could be, so you should try to see if Solr is impacted if you write to a different Riak search index (i.e., Solr core).  That would of course require a new bucket, but those are cheap for the purposes of experimentation.  You may need to re-architect your application to use more Riak indices and bucket types, and to use statically defined Solr fields, in order to get over this hump.

Another thing you should consider doing is upgrading to Riak 2.0.9.  This includes very significant improvements to the write/index path into Solr, with support for batching and asynchronous delivery into Solr.  This won't necessarily fix your problem -- you should first get to the bottom of why you are getting 16-second average write latencies into Solr for a single Solr document -- but it may give you some headroom in the future.

One other thing we found leading up to the 2.0.8 release, and which was fixed in 2.0.8 and later, is that Solr slows to a crawl if you have a high number of siblings, almost linearly in the number of siblings.  This happens because Riak used to use the deleteByQuery Solr operation when indexing a document, which would cause Solr memory consumption to go through the roof, as well as CPU utilization.  We fixed this in 2.0.8 and later to delete previously existing documents by id, which is far less resource-intensive on the Solr side.  Do you have a handle on how many siblings you have in your Riak objects?
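
One way to get a rough handle on sibling counts is the node_get_fsm_siblings_* stats in riak-admin status, which describe siblings encountered on GETs by that node. The Python sketch below filters them out of status text. The sample values are invented for illustration and do not come from the cluster discussed here, and the function name sibling_stats is hypothetical.

```python
# Hypothetical sample values, NOT taken from the cluster in this thread.
SAMPLE = """\
node_get_fsm_siblings_mean : 3
node_get_fsm_siblings_median : 1
node_get_fsm_siblings_95 : 12
node_get_fsm_siblings_99 : 40
node_get_fsm_siblings_100 : 57
search_index_fail_count : 1011
"""


def sibling_stats(status_text):
    """Return {suffix: value} for node_get_fsm_siblings_* lines."""
    prefix = "node_get_fsm_siblings_"
    stats = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep and name.startswith(prefix):
            stats[name[len(prefix):]] = float(value)
    return stats


if __name__ == "__main__":
    for suffix, value in sibling_stats(SAMPLE).items():
        print(f"siblings {suffix}: {value:g}")
```

On a real node the input would be the text printed by riak-admin status rather than SAMPLE.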

And BTW, if you upgrade to Riak 2.2.0, then you will also get an upgrade to Solr 4.10.

-Fred

> On Jun 16, 2017, at 4:49 PM, [hidden email] wrote:
>
> Hi All,
>
> We are running riak kv 2.0.1 on 5 node, all are high end conf i.e it does
> not have any load. All not have solr on.


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com