problem with baseball MR example?

Nate Lawson
I was looking at the baseball MR example on the blog.

http://basho.com/blog/technical/2011/01/20/Baseball-Batting-Averages-Riak-Map-Reduce/

One thing I was wondering was how the file split mechanism is aware of record lengths. It doesn't look like the author is using any particular split function to identify record boundaries and make a clean cut. So some records likely span a 1 MB block boundary and reach the map job cut in two, with each half parsed as if it were a whole record.
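
To make the concern concrete, here is a minimal Python sketch of what a fixed-size split does to newline-delimited records. Everything in it is an assumption for illustration: the sample records are made up, the 16-byte block size stands in for the 1 MB blocks, and `split_fixed` and `naive_map` are hypothetical helpers, not Riak or Luwak APIs.

```python
# Illustration only: made-up records and a tiny block size standing in for 1 MB.
BLOCK_SIZE = 16

def split_fixed(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks with no regard for record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def naive_map(block: bytes) -> list[bytes]:
    """Treat every newline-separated piece of a block as a complete record."""
    return [piece for piece in block.split(b"\n") if piece]

records = [b"rec-one,10,3", b"rec-two,12,4", b"rec-three,9,2"]
data = b"\n".join(records) + b"\n"

blocks = split_fixed(data, BLOCK_SIZE)
mapped = [piece for block in blocks for piece in naive_map(block)]

# Pieces that still match an original record; the rest are fragments.
intact = [piece for piece in mapped if piece in records]
fragments = [piece for piece in mapped if piece not in records]

print(len(records), "records in,", len(mapped), "pieces out")
print("intact:", intact)
print("fragments:", fragments)
```

With this input, only the first record survives whole; every record that crosses a block edge shows up as two fragments, which is the kind of miscount the question is about.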

Perhaps this is the flaw the author hints at? If so, what's the proper way in Riak MR to specify a split function to be sure proper boundaries are applied to Luwak files?

-Nate


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: problem with baseball MR example?

bryan-basho
Administrator
On Fri, Oct 7, 2011 at 8:56 PM, Nate Lawson <[hidden email]> wrote:
> Perhaps this is the flaw the author hints at? If so, what's the proper way in Riak MR to specify a split function to be sure proper boundaries are applied to Luwak files?

Hi, Nate.  Indeed, you found the flaw.  If you haven't stumbled on it
already, I'd encourage you to read the followup, which tackles just
that issue:

http://basho.com/blog/technical/2011/01/26/Fixing-the-Count/
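
For readers who want the general idea without following the link, one common way to handle fixed-size blocks is to let each block own only the records that start inside it: skip a leading partial record, and read past the end of the block to finish the last one. The Python sketch below shows that convention under stated assumptions; it is not taken from the followup post and is not Riak or Luwak code. `records_for_block` is a hypothetical helper, the records are made up, and the whole file is held in memory, whereas a real map phase would have to fetch the neighboring block.

```python
def records_for_block(data: bytes, index: int, block_size: int) -> list[bytes]:
    """Return the newline-delimited records that START inside block `index`."""
    start = index * block_size
    end = min(start + block_size, len(data))

    # If the previous byte is not a newline, the block opens mid-record.
    # That record belongs to an earlier block, so skip past its end.
    if start > 0 and data[start - 1:start] != b"\n":
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        start = newline + 1

    found = []
    while start < end:
        newline = data.find(b"\n", start)
        if newline == -1:
            # Last record in the file has no trailing newline.
            record, start = data[start:], len(data)
        else:
            # May read beyond `end` to finish a record that began in this block.
            record, start = data[start:newline], newline + 1
        if record:
            found.append(record)
    return found

# Made-up sample data; 16 bytes stands in for the 1 MB block size.
records = [b"rec-one,10,3", b"rec-two,12,4", b"rec-three,9,2"]
data = b"\n".join(records) + b"\n"
block_size = 16
n_blocks = (len(data) + block_size - 1) // block_size

per_block = [records_for_block(data, i, block_size) for i in range(n_blocks)]
recovered = [record for block in per_block for record in block]
print(per_block)
```

Because every record starts in exactly one block, each record is emitted once and whole, regardless of where the block edges fall.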

-Bryan
