I have a 4-node cluster with 40M documents (20M per bucket) in it, and I'm executing a simple MR job that counts the number of documents in a given bucket.
I used cURL to execute the job and then decided to "kill" the cURL process (a simple ^C). I expected this to cancel the MR job as well, right? Well... it seems that Riak completely ignored the fact that the client is dead. Is this related in any way to cURL, or is this the normal behavior with any client I'll use?
IMO it makes more sense for Riak to cancel the job once the connection with the client is terminated. Is this a bug in Riak?
On Thu, Sep 6, 2012 at 11:22 AM, shaharke wrote:
> IMO it makes more sense that Riak will cancel the job once the connection
> with the client is terminated. Is this a bug in Riak?
Riak won't notice that an HTTP client has terminated the connection
until it attempts to send some portion of the response to it. There
are two cases where this won't happen until the query has finished:
1. Non-streamed responses (the default).
2. Queries that end with reduce phases (this is also somewhat true for
queries that contain reduce phases in any position).
In both of these cases, Riak has no reason to talk to the client until
the query finishes, and so it ignores the state of the socket until
that time. This is a general issue for Webmachine (the HTTP server
that Riak uses).
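Following from the two cases above, an HTTP client that wants Riak to notice a dropped connection early needs Riak to be writing to the socket while the query runs. A minimal sketch, assuming Riak's HTTP MapReduce endpoint at `/mapred` with `?chunked=true` for streamed results and a listener on port 8098 (the host, port, bucket name and helper names are hypothetical): it builds a map-only counting job with no reduce phase, so results can be streamed as they are produced, and leaves the summing to the client.

```python
import json
import urllib.request

# Hypothetical default: a local Riak node with its HTTP listener on port 8098.
RIAK_URL = "http://127.0.0.1:8098"


def build_streaming_count_job(bucket):
    """Build a map-only MapReduce job that emits 1 per object.

    There is deliberately no reduce phase: the client sums the 1s itself,
    so Riak can stream map results as they are produced instead of holding
    the whole answer until the query finishes.
    """
    return {
        "inputs": bucket,  # whole-bucket input (a full key listing)
        "query": [
            {"map": {"language": "javascript",
                     "source": "function(v) { return [1]; }",
                     "keep": True}}
        ],
    }


def start_streaming_job(bucket, base_url=RIAK_URL, timeout=60):
    """POST the job with chunked=true and return the open HTTP response.

    The caller reads the multipart/mixed body incrementally (parsing it is
    left out here). Closing the response early drops the connection, which
    Riak can only observe when it next writes a chunk.
    """
    body = json.dumps(build_streaming_count_job(bucket)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/mapred?chunked=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req, timeout=timeout)
```

Usage would look like `with start_streaming_job("mybucket") as resp:` followed by reading lines from `resp`; leaving the `with` block early closes the socket. The trade-off is that every map result crosses the network instead of one reduced total, which may be costly for 20M documents.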
If you expect to cancel large MapReduce requests often, I suggest
using the Protocol Buffers interface, which does monitor the