Secondary Index Map and reduce order and performance

Sajithkumar Kizhakkiniyil

Hello

My understanding of MapReduce may be wrong, but I am seeing a drastic performance difference when running a secondary index query over PB (Protocol Buffers) with the map and reduce phases in different orders.

As I understand it, a reduce phase with riak_kv_mapreduce.reduce_identity is needed for a secondary index query. I added one map phase to get the value instead of the key.

However, if I send the reduce phase before the map phase, as shown in the MapReduce payload JSON below, the values are returned much faster than the other way around: 251 ms vs. 700 ms in my test. Can anyone explain this behavior?

 

Reduce before map (Faster)

-------

{"inputs":{"index":"PERFTEST_INDEX_NAME_bin","bucket":"_ITEST_SI_BUCKET","key":"PERFTEST_INDEX_VALUE"},
 "query":[
  {"reduce":{"arg":"{reduce_phase_only_1, true}","module":"riak_kv_mapreduce","language":"erlang","keep":false,"function":"reduce_identity"}},
  {"map":{"source":"function(value,keyData,arg){ return [value.values[0].data]; }","language":"javascript","keep":true}}
 ]}

 

Map before reduce (Slower)

--------------

{"inputs":{"index":"PERFTEST_INDEX_NAME_bin","bucket":"_ITEST_SI_BUCKET","key":"PERFTEST_INDEX_VALUE"},
 "query":[
  {"map":{"source":"function(value,keyData,arg){ return [value.values[0].data]; }","language":"javascript","keep":true}},
  {"reduce":{"arg":"{reduce_phase_only_1, true}","module":"riak_kv_mapreduce","language":"erlang","keep":false,"function":"reduce_identity"}}
 ]}
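The two payloads above share the same 2i input and the same two phase specs; only the order of the "query" array differs. A small JavaScript sketch can make that explicit. The names buildIndexQuery, mapValuePhase, and reduceIdentityPhase are hypothetical helpers for illustration only; the field values are copied from the payloads above (JSON key order does not matter).

```javascript
// Hypothetical helper: builds a MapReduce payload whose input is a
// secondary index (2i) exact-match query, followed by the given phases.
function buildIndexQuery(phases) {
  return {
    inputs: {
      index: "PERFTEST_INDEX_NAME_bin",
      bucket: "_ITEST_SI_BUCKET",
      key: "PERFTEST_INDEX_VALUE",
    },
    query: phases,
  };
}

// JavaScript map phase from the post: emit the first sibling's value.
const mapValuePhase = {
  map: {
    language: "javascript",
    source: "function(value,keyData,arg){ return [value.values[0].data]; }",
    keep: true,
  },
};

// Erlang reduce phase from the post: riak_kv_mapreduce reduce_identity.
const reduceIdentityPhase = {
  reduce: {
    language: "erlang",
    module: "riak_kv_mapreduce",
    function: "reduce_identity",
    arg: "{reduce_phase_only_1, true}",
    keep: false,
  },
};

// Same input, same phases; only the order differs.
const reduceThenMapPayload = buildIndexQuery([reduceIdentityPhase, mapValuePhase]); // faster
const mapThenReducePayload = buildIndexQuery([mapValuePhase, reduceIdentityPhase]); // slower

console.log(JSON.stringify(reduceThenMapPayload));
console.log(JSON.stringify(mapThenReducePayload));
```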

 



This message is private and confidential. If you have received it in error, please notify the sender and remove it from your system.


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Secondary Index Map and reduce order and performance

Alexander Sicular
Do you get the results in both cases?

-Alexander Sicular

@siculars

In reply to Sajithkumar Kizhakkiniyil's post of Nov 30, 2011, 4:28 PM
Re: Secondary Index Map and reduce order and performance

Alexander Sicular
In reply to this post by Sajithkumar Kizhakkiniyil
Do you get the *same* results in both cases?

-Alexander Sicular

@siculars

RE: Secondary Index Map and reduce order and performance

Sajithkumar Kizhakkiniyil
In my testing I did get the same results. My scenario is simple: I created 200 keys with the same 2i key/value pair and retrieved them.

 

Regards

Sajith

 

 

From: Alexander Sicular [mailto:[hidden email]]
Sent: Wednesday, November 30, 2011 1:32 PM
To: Sajithkumar Kizhakkiniyil
Cc: [hidden email]
Subject: Re: Secondary Index Map and reduce order and performance

 

Do you get the *same* results in both cases?


-Alexander Sicular

 

Re: Secondary Index Map and reduce order and performance

bryan-basho
Administrator
In reply to this post by Sajithkumar Kizhakkiniyil
On Wed, Nov 30, 2011 at 4:28 PM, Sajithkumar Kizhakkiniyil
<[hidden email]> wrote:
> If my understanding is correct a reduce phase with
> riak_kv_mapreduce.reduce_identity is needed for secondary index query.

Hi, Sajithkumar.  The reduce_identity function is only needed if the
result you want is the list of bucket/key pairs.  If you are instead
extracting the value of the object in a map phase, you can leave the
reduce_identity call out.
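A sketch of the two payload shapes this implies, written in the same JavaScript used for the map phase: a reduce-only query when you want bucket/key pairs, and a map-only query when you want object values. The helper indexInputs is hypothetical; the index, bucket, key, reduce arg, and map source are copied from the original post.

```javascript
// Hypothetical helper: the 2i input copied from the original post.
function indexInputs() {
  return {
    index: "PERFTEST_INDEX_NAME_bin",
    bucket: "_ITEST_SI_BUCKET",
    key: "PERFTEST_INDEX_VALUE",
  };
}

// Want bucket/key pairs? A single reduce_identity phase is enough.
const keysOnly = {
  inputs: indexInputs(),
  query: [
    {
      reduce: {
        language: "erlang",
        module: "riak_kv_mapreduce",
        function: "reduce_identity",
        arg: "{reduce_phase_only_1, true}",
        keep: true,
      },
    },
  ],
};

// Want object values? Use only the map phase and leave reduce_identity out.
const valuesOnly = {
  inputs: indexInputs(),
  query: [
    {
      map: {
        language: "javascript",
        source: "function(value,keyData,arg){ return [value.values[0].data]; }",
        keep: true,
      },
    },
  ],
};

console.log(JSON.stringify(keysOnly));
console.log(JSON.stringify(valuesOnly));
```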

In fact, reduce_identity's proclivity for bucket/key pairs is the
reason your MapReduce query is so slow when reduce_identity is tacked
on the end.  That function throws an exception for any input it gets
that is *not* a bucket/key pair.  If you look in your Riak log, you'll
likely see hundreds of messages along the lines of
"throw:{unhandled_entry,…} reducing: …".  The reason you get your
expected result back instead of an error is a fluke caused by this
known issue: https://issues.basho.com/show_bug.cgi?id=1185
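The failure mode described above can be illustrated with a simplified JavaScript model. This is an assumption-laden sketch, not Riak's Erlang implementation: reduceIdentity passes through [bucket, key] or [bucket, key, keyData] entries and throws "unhandled_entry" for anything else, and mapValue stands in for the JavaScript map phase from the original post. The keys and values are invented sample data. In a real cluster, as described above, these exceptions end up in the Riak log while the expected result still comes back because of the linked bug; the model only shows where the exception is raised.

```javascript
// Simplified model (an assumption, not Riak's source): accept only
// [bucket, key] or [bucket, key, keyData] entries; throw otherwise.
function reduceIdentity(inputs) {
  return inputs.map((entry) => {
    const isPair =
      Array.isArray(entry) &&
      (entry.length === 2 || entry.length === 3) &&
      typeof entry[0] === "string" &&
      typeof entry[1] === "string";
    if (!isPair) {
      throw new Error("unhandled_entry: " + JSON.stringify(entry));
    }
    return entry;
  });
}

// Invented sample data: 2i matches and the stored values they point to.
const indexMatches = [
  ["_ITEST_SI_BUCKET", "k1"],
  ["_ITEST_SI_BUCKET", "k2"],
];
const store = { k1: "value-1", k2: "value-2" };

// Stand-in for the JavaScript map phase: look up each key's value.
function mapValue(pairs) {
  return pairs.map(([, key]) => store[key]);
}

// Reduce before map: reduce_identity sees bucket/key pairs, so it succeeds.
const reduceThenMap = mapValue(reduceIdentity(indexMatches));
console.log(reduceThenMap); // [ 'value-1', 'value-2' ]

// Map before reduce: reduce_identity sees plain values and throws.
let mapThenReduceError = null;
try {
  reduceIdentity(mapValue(indexMatches));
} catch (err) {
  mapThenReduceError = err.message;
}
console.log(mapThenReduceError); // unhandled_entry: "value-1"
```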

If you're mapping over the objects generated by a 2i query, leave
reduce_identity out, and your query will be much happier.

-Bryan
