Search index partitioning in Riak-KV

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Search index partitioning in Riak-KV

Denis Gudtsov
Hello

I'm trying to understand how Solr indexes are interact with Riak KV store.
Let me explain a bit... The Riak uses sharding per vnode. Each physical node contains several vnodes and data stored there are indexed by Solr. As far as i understood, Solr is not clustered solution, i.e. Solr instance running on #1 node doesn't know anything about index on instance #2 and so on.
Then when search request comes in, how Riak/Solr decides at which instance they need to lookup for index? In other words, is this index is document-based partition or term-based partition?
Thank you.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Search index partitioning in Riak-KV

Fred Dushin-2
There is a Solr instance per Riak node, and each Solr instance contains a Solr core for the Riak index created.  Every replica of a Riak k/v pair has a corresponding document in the Solr instance on the same node as the vnode that stores the replica (in fact for each sibling, if your objects admit siblings).  Each document in Solr contains not only the bucket and key as indexed fields, but also the Riak partition (well, the partition ordinal, actually).  All of these fields are indexed, IIRC.

When a query executed through Riak, a coverage plan is used to generate a filter query (based on the node and partition in the cover set) that is used with shard parameters on the local Solr instance in which the query was executed.  So technically every query is using (legacy) Solr distributed query; it's just that all the shard parameters and Solr deployment topology is hidden from the user and done internally by Riak.  It's really quite elegant and clever (IMO), but it can run into scaling issues, as Solr/Lucene is quite a bit more demanding than Riak, in some real-world use-cases.

I made an attempt to explain how a lot of how this works at https://www.youtube.com/watch?v=e1yVJqRuLSg

There are far better presentations on the topic by Ryan Zezeski and other pioneers who did the initial implementation.

-Fred

On May 19, 2017, at 7:10 AM, Denis <[hidden email]> wrote:

Hello

I'm trying to understand how Solr indexes are interact with Riak KV store.
Let me explain a bit... The Riak uses sharding per vnode. Each physical node contains several vnodes and data stored there are indexed by Solr. As far as i understood, Solr is not clustered solution, i.e. Solr instance running on #1 node doesn't know anything about index on instance #2 and so on.
Then when search request comes in, how Riak/Solr decides at which instance they need to lookup for index? In other words, is this index is document-based partition or term-based partition?
Thank you.
_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Search index partitioning in Riak-KV

Denis Gudtsov
In reply to this post by Denis Gudtsov
Hello Fred

Thank you very much for your explanation. I'm trying to understand how this scheme can work, but need some time... Let me come back to you after some time.
Also you're referring to Ryan Zezeski slides, but i can't find it. Could you please share me link if you have it? Thank you.


2017-05-19 19:00 GMT+03:00 <[hidden email]>:
Message: 5
Date: Fri, 19 May 2017 09:35:11 -0400
From: Fred Dushin <[hidden email]>
To: riak-users <[hidden email]>
Subject: Re: Search index partitioning in Riak-KV
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="us-ascii"

There is a Solr instance per Riak node, and each Solr instance contains a Solr core for the Riak index created.  Every replica of a Riak k/v pair has a corresponding document in the Solr instance on the same node as the vnode that stores the replica (in fact for each sibling, if your objects admit siblings).  Each document in Solr contains not only the bucket and key as indexed fields, but also the Riak partition (well, the partition ordinal, actually).  All of these fields are indexed, IIRC.

When a query executed through Riak, a coverage plan is used to generate a filter query (based on the node and partition in the cover set) that is used with shard parameters on the local Solr instance in which the query was executed.  So technically every query is using (legacy) Solr distributed query; it's just that all the shard parameters and Solr deployment topology is hidden from the user and done internally by Riak.  It's really quite elegant and clever (IMO), but it can run into scaling issues, as Solr/Lucene is quite a bit more demanding than Riak, in some real-world use-cases.

I made an attempt to explain how a lot of how this works at https://www.youtube.com/watch?v=e1yVJqRuLSg <https://www.youtube.com/watch?v=e1yVJqRuLSg>

There are far better presentations on the topic by Ryan Zezeski and other pioneers who did the initial implementation.

-Fred



_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com