MR Timeout

Yousuf Fauzan
Hello,

My Riak setup is as follows:
1. Riak 1.1.4
2. Ubuntu 12.04
3. 3 nodes (EC2 large instance)

I am using a secondary index as the input. The input is around 15k records.

The query fails with the following error:
<<"{\"phase\":1,\"error\":\"{{{badmatch,[]},[{riak_kv_js_manager,needs_reload,2},{riak_kv_js_manager,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]},{gen_server,call,[riak_kv_js_map,{reserve_vm,<0.9908.0>},infinity]}}\",\"input\":\"{ok,{r_object,<<\\\"username\\\">>,<<\\\"4SH57586280\\\">>,[{r_content,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\\\"Links\\\">>,{{<<\\\"contacts|20120723\\\">>,<<\\\"user5|user1|Contacted|20120724001854\\\">>},<<\\\"contacts|20120723\\\">>},{{<<\\\"contacts|20120723\\\">>,<<\\\"user8|user2|Contacted|20120724003218\\\">>},<<\\\"contacts|20120723\\\">>},{{<<\\\"contacts|20120729\\\">>,<<\\\"user10|user3|Declined|20120729093511\\\">>},<<\\\"contacts|20120729\\\">>},{{<<\\\"contact...\\\">>,...},...},...]],...}}},...}],...},...}\",\"type\":\"exit\",\"stack\":\"[{gen_server,call,3},{riak_kv_js_manager,blocking_dispatch,4},{riak_kv_mrc_map,map_js,3},{riak_kv_mrc_map,process,3},{riak_pipe_vnode_worker,process_input,3},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]\"}">>

However, if I re-run the query a couple of times, it eventually succeeds.

Some of the parameters of app.config that I changed:
    {map_js_vm_count, 128 },
    {reduce_js_vm_count, 32 },

    {js_max_vm_mem, 16},

Also, MR is quite slow. Is there a way to speed things up?

Please help.

--
Yousuf Fauzan



_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: MR Timeout

bryan-basho
Administrator
On Fri, Aug 3, 2012 at 10:50 AM, Yousuf Fauzan <[hidden email]> wrote:
> The query fails with the following error
> <<"{\"phase\":1,\"error\":\"{{{badmatch,[]},[{riak_kv_js_manager,needs_reload,2},{riak_kv_js_manager,handle_call,3},...

Hi, Yousuf. I think there may be a race between a Javascript VM
marking itself idle and the same VM getting the message that its
manager has died. Could you please check in your Riak logs for an
error similar to:

16:32:10.693 [error] Supervisor riak_kv_sup had child riak_kv_js_map
started with riak_kv_js_manager:start_link(riak_kv_js_map, 8) at
<0.284.0> exit with reason killed in context child_terminated

The important part to look for is the first bit about "Supervisor
riak_kv_sup had child riak_kv_js_map started with
riak_kv_js_manager:start_link". If that happened, then the error you're
seeing is an old VM trying to mark itself idle with a new manager that
doesn't know about it. I'll work up a patch to solve this issue.
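
As a rough way to run the log check described above, here is a small sketch that filters log text for the supervisor message. The regular expression is built from the quoted log line. The function name findManagerRestarts and the idea of passing the log contents in as a string are assumptions made for illustration, not part of any Riak tooling.

```javascript
// Hypothetical helper (not part of Riak): return the log lines that report
// riak_kv_sup restarting the riak_kv_js_map manager.
// The pattern is the "important part" of the log line quoted above.
const MANAGER_RESTART =
  /Supervisor riak_kv_sup had child riak_kv_js_map\s+started with\s+riak_kv_js_manager:start_link/;

function findManagerRestarts(logText) {
  return logText
    .split("\n")
    .filter((line) => MANAGER_RESTART.test(line));
}

// Sample input: the first line is the example from this thread joined onto
// one line; the second line is a made-up unrelated entry.
const sampleLog = [
  "16:32:10.693 [error] Supervisor riak_kv_sup had child riak_kv_js_map " +
    "started with riak_kv_js_manager:start_link(riak_kv_js_map, 8) at " +
    "<0.284.0> exit with reason killed in context child_terminated",
  "16:32:11.001 [info] some unrelated entry",
].join("\n");

console.log(findManagerRestarts(sampleLog).length);
```

If this returns any lines from a node's real log around the time of a failed query, that would be consistent with the race described above.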

*Why* that manager exited, if that is indeed what you find in your
logs, is another question, and may have to do with many JS VMs
crashing suddenly. Such situations are often linked to high memory
pressure, in my experience.

> Also, MR is quite slow. Is there a way to speed things up?

The biggest speedup in MR performance often comes from rewriting
Javascript phases in Erlang. It's annoying, yes, but it avoids the
costs of JSON (de-)serialization and inter-VM communication. Reduce
phases, in particular, also benefit from ensuring that your function
is actually *reducing* (not accumulating an ever-growing result on
each invocation), and from tuning reduce_phase_batch_size or even
using reduce_phase_only_1.
(http://wiki.basho.com/MapReduce-Implementation.html#Configuration-Tuning-for-Reduce-Phases)
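
To illustrate what "actually reducing" means, here is a minimal sketch in JavaScript. It assumes the mapped values are plain numbers. The names sumReduce, collectReduce, and runInBatches are hypothetical, and runInBatches is only a local stand-in for the way a reduce phase is re-run on its previous result plus a new batch of inputs; it is not Riak code. The reducePhase fragment at the end shows the phase argument as I understand it from the wiki page linked above, with 1000 as an arbitrary example value; please check it against that page.

```javascript
// A reduce function that really reduces: whatever mix of new inputs and
// earlier partial results it receives, it returns a one-element list.
function sumReduce(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return [total];
}

// Anti-pattern for comparison: the result grows on every invocation.
function collectReduce(values) {
  return values;
}

// Local stand-in for batched re-reduce (an assumption for illustration):
// each call sees the previous result followed by the next batch of inputs.
function runInBatches(reduceFn, inputs, batchSize) {
  var carry = [];
  for (var i = 0; i < inputs.length; i += batchSize) {
    carry = reduceFn(carry.concat(inputs.slice(i, i + batchSize)));
  }
  return carry;
}

// Reduce phase fragment of a job spec. Riak.reduceSum is one of the built-in
// JavaScript reduce functions; the batch size value is only an example.
var reducePhase = {
  reduce: {
    language: "javascript",
    name: "Riak.reduceSum",
    arg: { reduce_phase_batch_size: 1000 }
  }
};

console.log(runInBatches(sumReduce, [1, 2, 3, 4, 5], 2));            // [ 15 ]
console.log(runInBatches(collectReduce, [1, 2, 3, 4, 5], 2).length); // 5
```

With sumReduce the carried result stays at one element however many inputs arrive, while collectReduce carries every input forward, so each later invocation has more data to serialize and process.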

Hope that helps,
Bryan
