Determining when Riak KV CRDT update has been reflected in Solr

Luca Favatella

How can a Riak KV client determine when a given CRDT update
operation has propagated to Solr?

I am aware that Solr gets updated asynchronously from Riak KV. I
scripted a test and the delay is noticeable: at the moment my test
sleeps for 2 seconds after updating before searching (otherwise the
search may not reflect the update). This is ok and expected.

I wonder if there is a way for the client to "wait until" Solr is
up to date with a certain CRDT context the client has observed.
Something like:
* Client updates CRDT in KV and reads CRDT context C1;
* Client polls Solr passing CRDT bucket-key and context C1.
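
To make the idea concrete, here is a minimal client-side sketch of the
polling step, replacing the fixed 2-second sleep with a bounded retry
loop. It does not assume any particular Riak or Solr API: the name
wait_until_indexed and the is_reflected callback are hypothetical, and
whether a search result can actually be compared against context C1 is
exactly the open question, so that check is left to the caller.

```python
import time
from typing import Callable


def wait_until_indexed(
    is_reflected: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_reflected() returns True or the timeout expires.

    is_reflected is a caller-supplied (hypothetical) check, e.g. a search
    for the bucket-key whose result is compared with what was just written.
    Returns True if the update was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_reflected():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The check always runs at least once, so a timeout of zero degrades to a
single probe. A False result only means "not observed within the
timeout", not that the update was lost.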

I am also aware that Riak has an AAE mechanism checking whether Solr
is up to date to a certain snapshot of Riak KV, using hash trees, and
repairing Solr if necessary. I understand that Riak employs an
algorithm for checking discrepancies between KV and Solr that is prone
to false positives - i.e. over-detecting differences hence
over-repairing. I have not looked at the internals.

Such an AAE KV-Solr comparison algorithm must also be able to compare
the leaves of the tree, each of which I understand to be a hash
covering zero or more Riak KV objects. Hence from the bucket-key I
should be able to derive the hash of the Riak KV object and use it to
check the corresponding element in Solr. This may require converting
the CRDT context to a Riak KV object vclock, but that would be ok - it
would just include potential siblings in the Riak object (possible in
corner cases even with CRDTs, as I understand).

I am designing a kind of document store, storing documents in Riak KV
as CRDTs and using Solr for search. CRDT updates are infrequent -
either by humans or potentially in batches of CRDT update operations
accumulated while offline. The user, after performing updates, may
perform a search. I am trying to understand if I can provide the user
with the visibility of whether his/her updates are "supposed to have
been reflected" in the search results already (or perform a sort of
remedy to-be-defined).
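
As a fallback, the "supposed to have been reflected" indication could
be approximated purely on the client, without asking Solr anything.
The sketch below is a heuristic under stated assumptions, not a
guarantee: the class PendingUpdates and its methods are hypothetical
names, and the 2-second grace window is just the delay observed in my
test.

```python
import time
from typing import Callable, Dict, Set, Tuple

BucketKey = Tuple[str, str]


class PendingUpdates:
    """Tracks recent local updates that a search may not reflect yet."""

    def __init__(
        self,
        grace_s: float = 2.0,  # assumed indexing delay, taken from the test
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._grace_s = grace_s
        self._clock = clock
        self._updated_at: Dict[BucketKey, float] = {}

    def record_update(self, bucket: str, key: str) -> None:
        """Call right after a CRDT update has been acknowledged by KV."""
        self._updated_at[(bucket, key)] = self._clock()

    def possibly_stale(self) -> Set[BucketKey]:
        """Bucket-keys updated within the grace window."""
        now = self._clock()
        # Forget entries older than the grace window.
        self._updated_at = {
            bk: t for bk, t in self._updated_at.items()
            if now - t < self._grace_s
        }
        return set(self._updated_at)
```

A search screen could then flag results when possibly_stale() is not
empty, or delay the search until it is.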

Thanks and regards

riak-users mailing list
[hidden email]