map function for link-walking

nfo
In the "one-to-very-many link associations" thread, Sean Cribbs talks
about a map function that does link-walking from links stored in the
object's contents: http://bit.ly/cKguqQ

"Another way to cope with large numbers of links is to
encapsulate them in the object itself, rather than in the headers.  This removes
the header-length/count limitation, but would require you to have a map function
that understands the internals of the object.  Also, you would need to deal with
the larger size of the object, which could potentially slow down your request."

Is there any chance someone could share the code of a map function that
does this (custom) link-walking?

The only example I found is in the "Practical Map-Reduce: Forwarding
and Collecting" blog article:
http://blog.basho.com/2010/04/14/practical-map-reduce:-forwarding-and-collecting/
It gathers links from objects and calls the "map" function on them.
So, as Sean says, a Link object has to be built before the "map"
function can be called on it.

By the way, is the result of the "map" function cached like that of any
standard Map phase?

This is the last open question before we go with Riak in our project :)

Thanks!

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: map function for link-walking

bryan-basho
Administrator
On Sat, Jul 10, 2010 at 4:45 PM, Nicolas Fouché <[hidden email]> wrote:

> Is there any chance someone could share the code of a map function that
> does this (custom) link-walking?

Hi, Nicolas.  Any function that returns a list of bucket-key
pairs, in the same format as the "inputs" list of a map/reduce
query, will work, because that list becomes the input of the next
map phase.  For example, if you stored your object's links in a
"mylinks" field in its value, like so:

$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/foo --data @-
{"mylinks":[["example","bar"],["example","baz"]],"myval":1}
^D
$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/bar --data @-
{"mylinks":[["example","baz"]],"myval":2}
^D
$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/baz --data @-
{"mylinks":[["example","foo"]],"myval":3}
^D

Then you could use a very simple map function like:
   function(v) {
      return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
   }
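The function's output is just a list of [bucket, key] pairs. A minimal Node.js sketch that exercises it outside Riak: the mockRiakObject helper is hypothetical, and the object shape it builds (bucket, key, and a values array whose first entry holds the stored JSON string in data, or a not_found marker for a missing input) is an assumption about what Riak's JavaScript map phase receives.

```javascript
// The map function from the message above, unchanged.
function followLinks(v) {
  return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
}

// Hypothetical helper: builds an object shaped like the input a JavaScript
// map function is assumed to receive (bucket, key, values[0].data).
function mockRiakObject(bucket, key, value) {
  return { bucket: bucket, key: key, values: [{ data: JSON.stringify(value) }] };
}

const foo = mockRiakObject("example", "foo", {
  mylinks: [["example", "bar"], ["example", "baz"]],
  myval: 1,
});

// The result has the same shape as the "inputs" list of a query.
console.log(JSON.stringify(followLinks(foo)));
// prints [["example","bar"],["example","baz"]]

// A missing input (assumed not_found marker) yields no links at all.
console.log(JSON.stringify(followLinks({ not_found: { bucket: "example", key: "missing" } })));
// prints []
```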

And then the link-walking is simple:

carboy:riak bryan$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["example","foo"]],
 "query":[{"map":{"language":"javascript",
                  "source":"function(v) { return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks; }"}},
          {"map":{"language":"javascript",
                  "source":"function(v) { return [JSON.parse(v.values[0].data).myval]; }"}}]}
^D
[2,3]
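Hand-wrapping function source inside a JSON string is error-prone: a raw line break inside a JSON string makes the whole request body invalid. A minimal Node.js sketch, assuming you would rather generate the same body programmatically: Function.prototype.toString() yields each function's source text and JSON.stringify escapes it. The names followLinks, extractMyval and buildQuery are illustrative only.

```javascript
// Map phase 1: emit the [bucket, key] pairs stored in the object's "mylinks".
const followLinks = function (v) {
  return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
};

// Map phase 2: emit the "myval" field of each linked object.
const extractMyval = function (v) {
  return [JSON.parse(v.values[0].data).myval];
};

// Build the /mapred request body. toString() returns the anonymous
// function's source, and JSON.stringify escapes its quotes and newlines.
function buildQuery(inputs) {
  return JSON.stringify({
    inputs: inputs,
    query: [
      { map: { language: "javascript", source: followLinks.toString() } },
      { map: { language: "javascript", source: extractMyval.toString() } },
    ],
  });
}

console.log(buildQuery([["example", "foo"]]));
```

The printed body could then be piped into the same `curl -X POST ... --data @-` command.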

That query uses two map phases.  The first starts at the example/foo
object I created above and follows its links to example/bar and
example/baz; the second extracts the "myval" field from the values of
those objects.
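To make the phase chaining concrete, the sketch below replays the same query locally in Node.js against an in-memory copy of the three objects stored above. The store, fetchInput and runMapPhases are hypothetical stand-ins for what Riak does internally, and the object shape handed to the map functions is an assumption. A real cluster runs phases in parallel, so the order of results is not guaranteed.

```javascript
// In-memory stand-in for the three objects stored with curl above,
// keyed by "bucket/key". Values are kept as JSON strings.
const store = {
  "example/foo": '{"mylinks":[["example","bar"],["example","baz"]],"myval":1}',
  "example/bar": '{"mylinks":[["example","baz"]],"myval":2}',
  "example/baz": '{"mylinks":[["example","foo"]],"myval":3}',
};

// Hypothetical fetch step: turn a [bucket, key] input into the (assumed)
// object shape a map function receives, or a not_found marker.
function fetchInput([bucket, key]) {
  const data = store[bucket + "/" + key];
  return data === undefined
    ? { not_found: { bucket: bucket, key: key } }
    : { bucket: bucket, key: key, values: [{ data: data }] };
}

const followLinks = function (v) {
  return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
};
const extractMyval = function (v) {
  return [JSON.parse(v.values[0].data).myval];
};

// Apply each map phase to every input and concatenate the results; the
// output of one phase becomes the input list of the next.
function runMapPhases(inputs, phases) {
  return phases.reduce(
    (acc, phase) => acc.flatMap((input) => phase(fetchInput(input))),
    inputs
  );
}

console.log(JSON.stringify(runMapPhases([["example", "foo"]], [followLinks, extractMyval])));
// prints [2,3]
```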

I'd recommend adding a little defensive programming to make sure
that "mylinks" is defined, and that it's a list of the proper shape.
It would also be a good idea to define these functions in a file that
Riak would preload, instead of specifying them dynamically in the
query (for performance).  But you could also take it in another
direction: if you knew that all of your links were going to point to
objects in a certain bucket, you could store just the keys in the
object, and produce bucket-key pairs with a quick map function (e.g.
mykeys.map(function(k) { return ["otherbucket", k]; })).
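One way to add the defensive checks suggested above, as a sketch rather than a canonical implementation: followLinksSafely and its inner isList helper are illustrative names. The function drops missing objects, unparsable values, an absent "mylinks" field, and any entry that is not a two-string [bucket, key] pair. It sticks to ES3-style JavaScript (var, function expressions, no Array.isArray) on the assumption that the JavaScript engine embedded in Riak may predate newer syntax; the helper lives inside the function so the whole thing can be shipped as a single "source" string.

```javascript
var followLinksSafely = function (v) {
  // True for real arrays only (avoids Array.isArray for older engines).
  function isList(x) {
    return Object.prototype.toString.call(x) === "[object Array]";
  }
  // Missing inputs arrive with a not_found marker (assumed shape).
  if (v.not_found || !isList(v.values) || v.values.length === 0) {
    return [];
  }
  var doc;
  try {
    doc = JSON.parse(v.values[0].data);
  } catch (e) {
    return []; // the stored value is not valid JSON
  }
  if (!doc || !isList(doc.mylinks)) {
    return []; // no "mylinks" field, or it is not a list
  }
  // Keep only well-formed [bucket, key] pairs of strings.
  return doc.mylinks.filter(function (link) {
    return isList(link) && link.length === 2 &&
      typeof link[0] === "string" && typeof link[1] === "string";
  });
};

console.log(JSON.stringify(followLinksSafely({
  values: [{ data: '{"mylinks":[["example","bar"],["bad"],42,["example","baz"]]}' }]
})));
// prints [["example","bar"],["example","baz"]]
```

Returning an empty list for bad input means a malformed object simply contributes nothing to the next phase instead of raising an error inside the map phase.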

Hope that helps.

-Bryan

Re: map function for link-walking

nfo
I was expecting some really tricky code, but it's simple and
clean. Thanks a lot.
My links point to objects in the same bucket, so I'll store an array
of object keys and add the bucket name in the map function.
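That plan can be sketched as follows, assuming the keys live in a "mykeys" field (the name is borrowed from Bryan's example) and that the object passed to the map function carries its own bucket name in a bucket field. Taking the bucket from v.bucket rather than hard-coding it keeps the function reusable across buckets; keysToLinks is an illustrative name, and the mock object below only approximates Riak's real input.

```javascript
// Turn bare keys stored in "mykeys" into [bucket, key] pairs that point
// into the same bucket as the object being mapped (assumed v.bucket field).
var keysToLinks = function (v) {
  if (v.not_found) {
    return [];
  }
  var doc = JSON.parse(v.values[0].data);
  return (doc.mykeys || []).map(function (k) {
    return [v.bucket, k];
  });
};

// Local check with a mock object (assumed shape of Riak's map input).
var sample = {
  bucket: "example",
  key: "foo",
  values: [{ data: '{"mykeys":["bar","baz"],"myval":1}' }]
};
console.log(JSON.stringify(keysToLinks(sample)));
// prints [["example","bar"],["example","baz"]]
```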

I did not find any docs about preloading JavaScript functions. Is it
the same as storing JS files in a bucket and loading them via the
"bucket" and "key" fields, as described in the "Map" paragraph of the
Fast Track? https://wiki.basho.com/display/RIAK/Loading+Data+and+Running+MapReduce+Queries

-Nicolas

On Sun, Jul 11, 2010 at 12:20 AM, Bryan Fink <[hidden email]> wrote:


Re: map function for link-walking

bryan-basho
Administrator
On Sun, Jul 11, 2010 at 9:19 AM, Nicolas Fouché <[hidden email]> wrote:
> I did not find any docs about preloading JavaScript functions. Is it
> the same as storing JS files in a bucket and loading them via the
> "bucket" and "key" fields, as described in the "Map" paragraph of the
> Fast Track? https://wiki.basho.com/display/RIAK/Loading+Data+and+Running+MapReduce+Queries

Oops.  You're right: we haven't documented this feature well.  The
best reference I've found is Kevin Smith's reply to another thread on
this list:

http://markmail.org/message/bc7ufl2z42yu6dmg

It involves modifying your app.config file to set the js_source_dir
variable for the riak_kv app.  For a good example of how to structure
a preloaded JS file, check out the one that ships with Riak:

http://bitbucket.org/basho/riak/src/tip/apps/riak_kv/priv/mapred_builtins.js

All of the functions in that file are available to JavaScript
map/reduce functions as Riak.<function name>, and they remain
available even after you set js_source_dir.  You might consider
wrapping your application's useful functions in a MyApp class or
some such.
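A sketch of what such a preloaded file might contain, following the suggestion to group application functions under one name. MyApp and its members are hypothetical, and the file would go in whatever directory js_source_dir points to. The second half shows a query that refers to the functions by name (the "name" field, as used for builtins such as Riak.mapValuesJson) instead of sending their source; in practice that part lives on the client, not in the preloaded file, and is included here only so the block runs on its own.

```javascript
// --- preloaded file (hypothetical contents) ---
// Grouping functions under one object keeps them apart from the
// Riak.* builtins.
var MyApp = {
  // Phase 1: follow the [bucket, key] pairs stored in "mylinks".
  followLinks: function (v) {
    return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
  },
  // Phase 2: emit the "myval" field of each linked object.
  extractMyval: function (v) {
    return v.not_found ? [] : [JSON.parse(v.values[0].data).myval];
  }
};

// --- client side: a query naming the preloaded functions ---
var query = {
  inputs: [["example", "foo"]],
  query: [
    { map: { language: "javascript", name: "MyApp.followLinks" } },
    { map: { language: "javascript", name: "MyApp.extractMyval" } }
  ]
};
console.log(JSON.stringify(query));
```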

It's also useful to know about "bin/riak-admin js_reload" if you're
doing this: that command re-reads all of your preloaded JavaScript
files.

-Bryan


Re: map function for link-walking

nfo
After discussing this with danoyoung and seancribbs on IRC, I created
two Ruby gists that implement this "custom-link"-walking solution:

http://gist.github.com/472547
http://gist.github.com/472565

-Nicolas

On Sun, Jul 11, 2010 at 5:03 PM, Bryan Fink <[hidden email]> wrote:
