Performance of link walking versus map/reduce

Nicolas Petton
Hi!

First of all, I'm new to Riak, so maybe my question doesn't make
sense :)

While testing map/reduce, I noticed a huge performance difference
between link walking and map/reduce.

I run Riak on a single box for testing. I have 2 buckets: 'artists' has
500 objects, and 'albums' has nearly 3000 objects. All objects are very
small, with just a short string as data.

curl http://localhost:8098/buckets/artists/keys/pink_floyd/albums,author,_

was nearly immediate: it took 0.156s

while:

curl -X POST -H "content-type:application/json" \
  http://localhost:8098/mapred --data @-
{"inputs":[["artists","pink_floyd"]],"query":[{"link":{"bucket":"albums","tag":"author"}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}

Took nearly 2 seconds.

Is there a reason for such a speed difference?
Is the map/reduce performance expected?

Thanks,
Nico


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Performance of link walking versus map/reduce

bryan-basho
Administrator
On Fri, Feb 3, 2012 at 12:31 PM, Nicolas Petton
<[hidden email]> wrote:

> curl http://localhost:8098/buckets/artists/keys/pink_floyd/albums,author,_
> was nearly immediate, it took 0.156s
>
> while:
>
> curl -X POST -H "content-type:application/json" \
>  http://localhost:8098/mapred --data @-
> {"inputs":[["artists","pink_floyd"]],"query":[{"link":{"bucket":"albums","tag":"author"}},{"map":{"language":"javascript","source":"function(v) { return [v]; }"}}]}
>
> Took nearly 2 seconds.

The biggest difference I see is that the link-walk uses an Erlang
function, whereas your MapReduce query uses a JavaScript function
(link-walking is implemented as a MapReduce query internally).
Serializing to and deserializing from JSON, as well as contention for
JavaScript VMs, likely accounts for the lost time.

Unfortunately, you can't use exactly the same Erlang function here
(riak_kv_mapreduce:map_identity), since the /mapred resource doesn't
know how to encode its output to JSON.
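
To check where the time goes on your own data, a minimal sketch along
these lines might help. It is an illustration rather than something
proposed in this thread. It assumes Python 3 and the same local endpoint
as the curl commands above; build_job, time_job, JS_MAP and ERLANG_MAP
are hypothetical names. It also assumes that the built-in Erlang function
riak_kv_mapreduce:map_object_value, which returns only each object's
value rather than the whole object, can be used where map_identity
cannot, since plain values can be encoded to JSON.

```python
import json
import time
import urllib.request

# Assumed endpoint: same host and port as the curl commands in this thread.
MAPRED_URL = "http://localhost:8098/mapred"

# The JavaScript map phase from the original post.
JS_MAP = {"language": "javascript", "source": "function(v) { return [v]; }"}

# Assumption: an Erlang built-in that returns only each object's value,
# so its output differs from the JavaScript phase (no metadata, no links).
ERLANG_MAP = {
    "language": "erlang",
    "module": "riak_kv_mapreduce",
    "function": "map_object_value",
}


def build_job(map_phase):
    """Return a /mapred job: follow 'author' links into 'albums', then map."""
    return {
        "inputs": [["artists", "pink_floyd"]],
        "query": [
            {"link": {"bucket": "albums", "tag": "author"}},
            {"map": map_phase},
        ],
    }


def time_job(job, url=MAPRED_URL):
    """POST the job to Riak and return (elapsed_seconds, decoded_response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return time.perf_counter() - start, json.loads(body)
```

Calling time_job(build_job(JS_MAP)) and time_job(build_job(ERLANG_MAP))
against a running node, a few times each, should indicate whether the
JavaScript phase is responsible for the difference. No timings are
claimed here.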

-Bryan
