Strange MapReduce behavior

Strange MapReduce behavior

Elias Levy
Maybe I misunderstand how MR works, or maybe it is a problem with the Ruby client. I am trying to run the following job, which filters the keys in its first phase. I am not using key filters, as the input will be a search query. But whatever I do, the filtering reduce phase does not appear to have any effect. As a minimal example, in the reduce filter phase I simply return an empty array, which I believe should stop any further processing, but the job continues to return data.

For instance:

mr = Riak::MapReduce.new(riak)
mr.add('bucket', 'key')
mr.reduce("function(v) { return [] }", :keep => true)

This returns:
{"bucket"=>"key"}

Doing instead:

mr = Riak::MapReduce.new(riak)
mr.add('bucket', 'key')
mr.reduce("riak_kv_mapreduce:reduce_identity", :language => "erlang")
mr.reduce("function(v) { return [] }", :keep => true)

results in the same output.

If I place a map phase before the reduce phase:

mr = Riak::MapReduce.new(riak)
mr.add('bucket', 'key')
mr.map("function(v,e){ return [ v.bucket, v.key, e ] }")
mr.reduce("function(v) { return [] }", :keep => true)
mr.run

I get the expected output, an empty array: []. But this defeats the purpose, which is to filter the keys in the reduce phase before the objects are fetched from disk.

Can't a reduce phase appear before a map phase?

If it can, what am I doing wrong?  Or is this a bug?  Using Riak 1.0.0.

Elias


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Strange MapReduce behavior

bryan-basho
Administrator
On Wed, Oct 26, 2011 at 2:53 PM, Elias Levy <[hidden email]> wrote:
> For instance:
>
> mr = Riak::MapReduce.new(riak)
> mr.add('bucket', 'key')
> mr.reduce("function(v) { return [] }", :keep => true)
> This returns:
> {"bucket"=>"key"}

I believe this one is explainable by a known Riak 1.0 MapReduce bug:

https://issues.basho.com/show_bug.cgi?id=1185

If you check your Riak log, I bet you'll see an error message, and if
the final run of a reduce fails, then in Riak 1.0.[0,1] the reduce
phase sends on its [possibly unreduced] inputs, instead of failing.  I
hope to fix this soon.

> Doing instead:
> mr = Riak::MapReduce.new(riak)
> mr.add('bucket', 'key')
> mr.reduce("riak_kv_mapreduce:reduce_identity", :language => "erlang")
> mr.reduce("function(v) { return [] }", :keep => true)
> results in the same output.

This, I'm unable to explain, unless the syntax for calling
riak_kv_mapreduce:reduce_identity is incorrect in your example (I'm
unfamiliar with Ripple's API).  If I submit what I think is the
equivalent query directly over HTTP or the Erlang console, I see the
result come back as [].  If the given syntax is incorrect, then it's
likely generating another error that is causing the same behavior as
described in Bugzilla issue 1185.
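
For reference, the query submitted directly over HTTP can be sketched as the JSON job below. This is a minimal sketch, not the exact query from this thread: it assumes the standard job layout for Riak's HTTP MapReduce endpoint (`inputs` plus a `query` list of phases, POSTed to `/mapred` as `application/json`), and 'bucket' and 'key' are the placeholder names from the original post.

```ruby
require "json"

# Job description for Riak's HTTP MapReduce endpoint (assumed: POST /mapred).
# 'bucket' and 'key' are the placeholders used in the original post.
job = {
  "inputs" => [["bucket", "key"]],
  "query"  => [
    # Erlang identity reduce, placed first to alter the representation of
    # the bucket-key pairs before the Javascript phase sees them.
    { "reduce" => { "language" => "erlang",
                    "module"   => "riak_kv_mapreduce",
                    "function" => "reduce_identity" } },
    # Javascript filter phase; returning [] discards every input.
    { "reduce" => { "language" => "javascript",
                    "source"   => "function(v) { return [] }",
                    "keep"     => true } }
  ]
}

# Serialize the job; this string is what would be sent as the request body.
body = JSON.generate(job)
puts body
```

Note that the Erlang phase names its module and function as two separate fields rather than as a single "module:function" string.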

> Can't a reduce phase appear before a map phase?

A reduce phase *can* appear before a map phase, but not if the reduce
phase is implemented in Javascript and the inputs are bucket-key
pairs.  Unfortunately, bucket-key pairs are represented as tuples that
are not correctly converted to JSON, so evaluation of the phase fails.
The typical workaround is as you attempted in your second example:
using riak_kv_mapreduce:reduce_identity to alter the representation of
bucket-key pairs (or implementing the reduce phase in Erlang instead).

-Bryan

Re: Strange MapReduce behavior

Elias Levy
On Wed, Oct 26, 2011 at 1:18 PM, Bryan Fink <[hidden email]> wrote:
> I believe this one is explainable by a known Riak 1.0 MapReduce bug:
>
> https://issues.basho.com/show_bug.cgi?id=1185
>
> If you check your Riak log, I bet you'll see an error message, and if
> the final run of a reduce fails, then in Riak 1.0.[0,1] the reduce
> phase sends on its [possibly unreduced] inputs, instead of failing.  I
> hope to fix this soon.

Bryan, thanks for your answer.  It makes sense now. 

> This, I'm unable to explain, unless the syntax for calling
> riak_kv_mapreduce:reduce_identity is incorrect in your example (I'm
> unfamiliar with Ripple's API).

It was.  The client expects an array with the module and the function for Erlang phases, not a single string containing both.  Thanks for making me look into this.
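
As an illustration of the difference, the sketch below converts the "module:function" string from the earlier example into the two-element array form. The `erlang_phase` helper is hypothetical (it is not part of the Ruby client), and the client calls in the trailing comment are shown only as an assumption about how the corrected job would be written; they are not run here.

```ruby
# Hypothetical helper, not part of the Riak Ruby client: split a
# "module:function" string into the [module, function] pair.
def erlang_phase(spec)
  mod, fun = spec.split(":", 2)
  if mod.nil? || mod.empty? || fun.nil? || fun.empty?
    raise ArgumentError, "expected \"module:function\", got #{spec.inspect}"
  end
  [mod, fun]
end

phase = erlang_phase("riak_kv_mapreduce:reduce_identity")
p phase

# Assumed usage with the client from the original post (untested here):
#   mr.reduce(phase, :language => "erlang")
#   mr.reduce("function(v) { return [] }", :keep => true)
```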

Elias
