problem counting items with MapReduce


Antonio Rohman Fernandez

Hello there,

I am having trouble counting keys with Riak 1.2.1 (it worked without problems in Riak 0.14). Can somebody point out what I am doing wrong?

The example should return all sales made by men ("sex_bin:male") who were born in 1981.

QUERY:
{
  "inputs":{"bucket":"sales","index":"sex_bin","key":"male"},
  "query":[
    {"map": {"language":"javascript","source":"function(v,k,a) { var m = v.values[0].data.match(\"1981\"); if (m != null) { return [v.key]; } else { return []; }}"}}
  ]
}

RESPONSE: ( correct, gives back an array of keys matching the 2i + 1981 condition )
["124462","93243","130181","137541","129031","123752","132741","124463","123750","18883","132794","141952","163773","115171","142828","13967","110816","107450","106411","104520","14428","119676","104154","124227","134760","17417","18358","113849","116257","68410","107121","101391","5435","24861","22296","19816","137953","160224","492","155590","162772","4504","56985","131733","36343","127297","124284","17527","100608","17376","87713","115255","67678","82488","45013","84145","71773","18034","17850","32026","20261","38170","18882","70859","30835","67477","150685","5354","2601","56982","37874","115994","111981","106408","34411","121848","140943","138893","31025","32493","90380","39753","154727","83721","71981","75070","71597","129397","81450","78752","64879","78502","48270","36548","979","39484","29808","90266","79364","24924","71416"]
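
For readers following along, here is the map function from the query above unpacked from its JSON string into plain JavaScript. This is only a sketch: the name `mapSalesBornIn1981` and the two sample objects are hypothetical, and the objects imitate only the fields the function reads (`key` and `values[0].data`).

```javascript
// Same logic as the "source" string in the query, unpacked for readability.
function mapSalesBornIn1981(v, keyData, arg) {
  // v.values[0].data is the stored value as a string.
  var m = v.values[0].data.match("1981");
  // Emit the object's key on a match, nothing otherwise.
  return m !== null ? [v.key] : [];
}

// Hypothetical objects shaped like what the map function reads.
var hit = { key: "124462", values: [{ data: '{"sex":"male","born":"1981-05-02"}' }] };
var miss = { key: "555", values: [{ data: '{"sex":"male","born":"1979-11-30"}' }] };

console.log(mapSalesBornIn1981(hit));  // one key: "124462"
console.log(mapSalesBornIn1981(miss)); // empty list
```

Note that `match("1981")` matches the substring anywhere in the stored value, not only in a birth-year field.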

QUERY:
{
  "inputs":{"bucket":"sales","index":"sex_bin","key":"male"},
  "query":[
    {"map": {"language":"javascript","source":"function(v,k,a) { var m = v.values[0].data.match(\"1981\"); if (m != null) { return [v.key]; } else { return []; }}"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return v; }"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return v; }"}}
  ]
}

RESPONSE: ( also correct, a second reduce that just returns 'v' gives back the same list )
["132741","131733","141952","124462","127297","123752","124284","107450","5435","93243","132794","142828","32026","13967","115994","67678","113849","150685","115255","34411","2601","137541","18034","17850","130181","129031","106408","111981","17527","124227","68410","138893","45013","124463","123750","119676","18883","137953","163773","110816","56982","106411","154727","104520","107121","121848","140943","17376","71597","20261","18882","14428","115171","38170","48270","81450","22296","78752","19816","492","4504","101391","24861","104154","31025","82488","36548","162772","979","71981","37874","39753","67477","84145","129397","160224","71773","155590","78502","18358","100608","5354","87713","70859","24924","134760","56985","116257","30835","17417","36343","29808","71416","64879","90266","90380","83721","75070","39484","32493","79364"]

QUERY:
{
  "inputs":{"bucket":"sales","index":"sex_bin","key":"male"},
  "query":[
    {"map": {"language":"javascript","source":"function(v,k,a) { var m = v.values[0].data.match(\"1981\"); if (m != null) { return [v.key]; } else { return []; }}"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return v; }"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return [v.length]; }"}}
  ]
}

RESPONSE: ( wrong... counting the length of the array gives [2] instead of [101] )
[2]

---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

JS:
<script type="text/javascript">
  var v = ["124462","93243","130181","137541","129031","123752","132741","124463","123750","18883","132794","141952","163773","115171","142828","13967","110816","107450","106411","104520","14428","119676","104154","124227","134760","17417","18358","113849","116257","68410","107121","101391","5435","24861","22296","19816","137953","160224","492","155590","162772","4504","56985","131733","36343","127297","124284","17527","100608","17376","87713","115255","67678","82488","45013","84145","71773","18034","17850","32026","20261","38170","18882","70859","30835","67477","150685","5354","2601","56982","37874","115994","111981","106408","34411","121848","140943","138893","31025","32493","90380","39753","154727","83721","71981","75070","71597","129397","81450","78752","64879","78502","48270","36548","979","39484","29808","90266","79364","24924","71416"];
  alert(v.length);
</script>

[101] <--- as it should be

Thanks,
Rohman

Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
[hidden email]
Projects: MaruBatsu.es, PupCloud.com, Wedding Album

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: problem counting items with MapReduce

Antonio Rohman Fernandez

If I return [v] instead of [v.length], I can see that the first reduce did not group all the keys into a one-dimensional array; the second reduce gets a multi-dimensional array instead. So returning a count of 2 was right, but that is not how I wanted the data to be grouped. Is it because I am using 2i as inputs? Before (Riak 0.14), the first reduce would group the output of all the map phases.

QUERY:

{
  "inputs":{"bucket":"sales","index":"sex_bin","key":"h"},
  "query":[
    {"map": {"language":"javascript","source":"function(v,k,a) { var m = v.values[0].data.match(\"1981\"); if (m != null) { return [v.key]; } else { return []; }}"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return v; }"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return [v]; }"}}
  ]
}

RESPONSE:
[[[[[[["132741","131733","141952","132794","130181","93243","124462","123752","119676","137953","142828","13967","107450","115994","5435","22296","19816","2601","67678","115171"],"111981","121848","115255","137541","129031","45013","34411","18034","17850","56982","127297","101391","71981","124284","24861","82488","17376","38170","67477","20261"],"162772","18882","104154","124227","140943","129397","113849","81450","31025","124463","123750","160224","100608","154727","155590","87713","32026","39753","110816","106411"],"104520","14428","150685","134760","18358","68410","70859","116257","78752","37874","64879","492","17417","4504","17527","56985","106408","36343","107121","30835"],"90380","39484","48270","18883","163773","32493","75070","79364","138893","36548","979","71597","78502","29808","84145","90266","83721","71773","5354","24924"],"71416"]]

Thanks,
Rohman

Re: problem counting items with MapReduce

bryan-basho
Administrator
In reply to this post by Antonio Rohman Fernandez
On Sat, Dec 8, 2012 at 6:11 AM, Antonio Rohman Fernandez <[hidden email]> wrote:

QUERY:

{
  "inputs":{"bucket":"sales","index":"sex_bin","key":"male"},
  "query":[
    {"map": {"language":"javascript","source":"function(v,k,a) { var m = v.values[0].data.match(\"1981\"); if (m != null) { return [v.key]; } else { return []; }}"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return v; }"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return [v.length]; }"}}
  ]
}

RESPONSE: ( wrong... counting the length of the array gives [2] instead of [101]
[2]


Hi, Rohman. This is working as expected. If it returned a different result on Riak 0.14, it was only by accident. The outputs from the first reduce phase do not arrive at the second reduce phase as one unit, but instead arrive a few at a time.
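
To make the batching concrete, the following is a simplified model written in plain JavaScript, not Riak's actual implementation. It assumes that each call to the reduce function receives the previous result followed by the next batch, and that batches hold 20 inputs (a guess based on the groups of 20 visible in the nested response earlier in this thread). The names `runReducePhase`, `countLength`, `wrap` and `depth` are made up for this sketch.

```javascript
// Simplified model of a reduce phase (an assumption, not Riak's code):
// inputs arrive in batches, and each call to the reduce function receives
// the previous result followed by the next batch.
function runReducePhase(reduceFn, inputs, batchSize) {
  var result = [];
  for (var i = 0; i < inputs.length; i += batchSize) {
    var batch = inputs.slice(i, i + batchSize);
    result = reduceFn(result.concat(batch));
  }
  return result;
}

// 101 stand-in keys (hypothetical values; only the count matters here).
var keys = [];
for (var n = 0; n < 101; n++) {
  keys.push("key" + n);
}

// The two second-phase reduce functions tried in this thread.
function countLength(v) { return [v.length]; }
function wrap(v) { return [v]; }

// Nesting depth of a result, following the first element at each level.
function depth(x) {
  var d = 0;
  while (x instanceof Array) { d++; x = x[0]; }
  return d;
}

var batched = runReducePhase(countLength, keys, 20);    // [2]
var allAtOnce = runReducePhase(countLength, keys, 101); // [101]
var nested = runReducePhase(wrap, keys, 20);            // 7 levels deep

console.log(batched, allAtOnce, depth(nested));
```

Under these assumptions the last call sees the previous count plus the one remaining key, which is where the [2] comes from. Handing over all 101 inputs in a single call gives [101].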

You can force that second reduce phase to wait for all inputs before processing by adding '"arg":{"reduce_phase_only_1":true}' to the phase's definition[1], like so:

    {"reduce":{"language":"javascript","source":"function(v,a) { return [v.length]; }","arg":{"reduce_phase_only_1":true}}}

Alternatively, you could replace both of those reduce phases with a single phase calling the built-in riak_kv_mapreduce:reduce_count_inputs/2, like so:

    {"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_count_inputs"}}

This builtin is able to handle reduce work as it becomes available, instead of waiting for the whole batch.
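
For comparison, a JavaScript reduce can also be written so that it gives the same count however the inputs are batched. The sketch below is not the built-in's implementation. It assumes the map phase emits only string keys (as in this thread), so any number in the input can be treated as a partial count from an earlier call. `countInputs` and `feedInBatches` are hypothetical names, and the driver models batching in a simplified way.

```javascript
// Counts inputs in a way that tolerates re-reduce: numbers are treated as
// partial counts from earlier calls, anything else counts as one item.
// Assumes the map phase never emits bare numbers.
function countInputs(v) {
  var total = 0;
  for (var i = 0; i < v.length; i++) {
    total += (typeof v[i] === "number") ? v[i] : 1;
  }
  return [total];
}

// Hypothetical driver: feeds inputs in batches, previous result first.
function feedInBatches(reduceFn, inputs, batchSize) {
  var result = [];
  for (var i = 0; i < inputs.length; i += batchSize) {
    result = reduceFn(result.concat(inputs.slice(i, i + batchSize)));
  }
  return result;
}

var sampleKeys = ["124462", "93243", "130181", "137541", "129031"];
var sizes = [1, 2, 5];
var counts = [];
for (var s = 0; s < sizes.length; s++) {
  counts.push(feedInBatches(countInputs, sampleKeys, sizes[s])[0]);
}

console.log(counts); // the same count for every batch size
```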

Hope that helps,
Bryan



Re: problem counting items with MapReduce

Antonio Rohman Fernandez

Hi Bryan,

Thanks for your reply! It is quite curious that in my past tests with Riak 0.14 the second reduce always received a single all-in-one array. Quite a coincidence.

Actually, I managed to work around the issue by merging the arrays in the first reduce, like this:

    {"reduce":{"language":"javascript","source":"function(v,a) { var nw = []; var isArray = true; while (isArray == true) { isArray = false; for (i in v) { if (v[i] instanceof Array) { var na = v[i]; isArray = true; } else { nw[nw.length] = v[i]; }} if (isArray == true) { v = na; }} return [nw]; }"}},
    {"reduce":{"language":"javascript","source":"function(v,a) { return [v[0].length]; }"}}
 
However, your {"reduce_phase_only_1":true} seems a much better way! I will give it a try.
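
The array-merging idea above can also be sketched as a recursive flatten, which may be easier to read than the one-line version. This is an illustration rather than a drop-in replacement: `flatten`, `mergeKeys` and `countMerged` are hypothetical names, and the sample input is a small stand-in for the nested response shown earlier in the thread.

```javascript
// Recursively collect every non-array element of v into out.
function flatten(v, out) {
  out = out || [];
  for (var i = 0; i < v.length; i++) {
    if (v[i] instanceof Array) {
      flatten(v[i], out);
    } else {
      out.push(v[i]);
    }
  }
  return out;
}

// First reduce: merge everything seen so far into one wrapped flat list.
function mergeKeys(v) { return [flatten(v)]; }

// Second reduce: count the elements of that single flat list.
function countMerged(v) { return [v[0].length]; }

// Small stand-in for the nested input a re-reduce call might receive.
var nestedInput = [[["a", "b"], "c"], "d"];
var merged = mergeKeys(nestedInput); // [["a", "b", "c", "d"]]
var total = countMerged(merged);     // [4]

console.log(merged, total);
```

Declaring the loop variable with `var` also avoids the implicit globals (`i`, `na`) of the one-line version.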
 
Thanks,
Rohman

On 10.12.2012 17:16, Bryan Fink wrote:
