Distributed reduce phases via pre-reduce [Was "Secondary Indexes - Feedback?"]

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Distributed reduce phases via pre-reduce [Was "Secondary Indexes - Feedback?"]

bryan-basho
Administrator
On Thu, Nov 17, 2011 at 9:45 AM, Gordon Tillman <[hidden email]> wrote:
> I'm really interested in being able to implement distributed
> reduce phases (specifically to do a partial sort)  and then have that output
> handle by a final reduce phase that could perform an efficient merge sort
>  and stream results back to the client.  That would be really cool!

Hi, Gordon.  I just caught up on the 2i thread, and noticed your
comment at the end.  As of 1.0 and RiakPipe-based MapReduce, you are
able to do something like this with the "pre-reduce" tuning
functionality:

http://wiki.basho.com/MapReduce.html#Pre-Reduce

With pre-reduce enabled, your reduce function is run in two stages.
The first stage processes the outputs of the previous map stage in
parallel, on the vnode where each output was produced.  The second
stage is reduce as you know it, processing all of the results of that
pre-reduce stage in one place.

For example, if you had these vnodes producing these outputs:

   A: 1,2,3
   B: 4,5,6
   C: 7,8,9

enabling pre-reduce would cause three parallel reduces:

   x = reduce(1,2,3)
   y = reduce(4,5,6)
   z = reduce(7,8,9)

followed by a final all-together reduce of those results:

   reduce(x,y,z)

So, it's the same function evaluated in both stages, but if you've
written it to play well with re-reduce, it should "just work".

Hope that helps,
Bryan

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Distributed reduce phases via pre-reduce [Was "Secondary Indexes - Feedback?"]

Gordon Tillman
Bryan thanks very much for the info.  I am going to investigate this over the weekend.  What I am trying to achieve is the ability to sort a potentially very large dataset in the most efficient way possible.

Regards,

--g



On Nov 18, 2011, at 10:12 , Bryan Fink wrote:

> On Thu, Nov 17, 2011 at 9:45 AM, Gordon Tillman <[hidden email]> wrote:
>> I'm really interested in being able to implement distributed
>> reduce phases (specifically to do a partial sort)  and then have that output
>> handle by a final reduce phase that could perform an efficient merge sort
>>  and stream results back to the client.  That would be really cool!
>
> Hi, Gordon.  I just caught up on the 2i thread, and noticed your
> comment at the end.  As of 1.0 and RiakPipe-based MapReduce, you are
> able to do something like this with the "pre-reduce" tuning
> functionality:
>
> http://wiki.basho.com/MapReduce.html#Pre-Reduce
>
> With pre-reduce enabled, your reduce function is run in two stages.
> The first stage processes the outputs of the previous map stage in
> parallel, on the vnode where each output was produced.  The second
> stage is reduce as you know it, processing all of the results of that
> pre-reduce stage in one place.
>
> For example, if you had these vnodes producing these outputs:
>
>   A: 1,2,3
>   B: 4,5,6
>   C: 7,8,9
>
> enabling pre-reduce would cause three parallel reduces:
>
>   x = reduce(1,2,3)
>   y = reduce(4,5,6)
>   z = reduce(7,8,9)
>
> followed by a final all-together reduce of those results:
>
>   reduce(x,y,z)
>
> So, it's the same function evaluated in both stages, but if you've
> written it to play well with re-reduce, it should "just work".
>
> Hope that helps,
> Bryan


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com