A zillion tiny newbie questions

Will Schenk
Good morning all!  I've a ton of overview-type questions, because I'm
slowly struggling along.

1) Documentation
- Are there any general usage tutorials/documented patterns of riak?
  - for example, is there a sample application that uses riak in
rails/php/whatever?  If not, how about erlang?
  - I found js-mapreduce.org in the docs directly, which took me a
while because I expected to find stuff on http://riak.basho.com/  And
it's not.  One presumes that it's an oversight and should be linked in
the erlang-api overview file.
- Are there any tutorials?

1a) In his talk at NoSQL East, Justin Sheehy talks about a streaming
or event interface, where you can listen for things as they come into
the system.  I'm sure that I'm getting that wrong, but where could I
read about such a thing?

2) Speaking of, how do you debug javascript map reduce stuff?
- Personally I think that printf (or io:format) is the best "how can I
reason about what is going on" tool ever invented.  I'm trying to
reason about what's happening when I submit these queries to figure
out what's going on.

2a) My preferred way of accessing riak is through ruby and I'm up and
running with ripple.  But, just to make it exciting, an email from
Kevin Smith from Mar 2 talks about a javascript map, javascript
reduce, and an erlang reduce function (for storing the results in
riak).  How would you string together such a thing in real life?  Not
using the raw interface surely.  (I've written the standard chat
server in erlang and that's more or less my knowledge.)

3) Modeling/usage question.  Let's say that I'm building a web site
and I want to store my data in riak.  I'm trying to store users, and
I want to be able to query them with the following attributes:
- login
- email
- fb_userid
- auth_token

login & email would each be used about equally often.

- Would I pick one as the "key" and use it, or should I just let riak
pick the key for me?

- And if I wanted to load the user based upon fb_userid, could I
expect to use a m/r function like that in real-time?

- And, finally, I'd just want to pass the bucket name to do this,
because obviously I don't know the keys beforehand.  It seems like
when this happens the first thing it does is to get the key list and
then resubmit the job, and there are ominous sounds in the
documentation that getting the key list is a potentially expensive
operation.  So is the answer then to create, say, 4 buckets,
user_login, user_email, user_fb_userid, user_auth_token which just are
the key and a link back to the user bucket/key ?  And if so, do I need
to maintain this index on the application side?


thanks!

--
Will Schenk
http://www.sublimeguile.com

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: A zillion tiny newbie questions

Sean Cribbs-2
On Mar 9, 2010, at 6:15 AM, Will Schenk wrote:

> Good morning all!  I've a ton of overview-type questions, because I'm
> slowly struggling along.
>
> 1) Documentation
> - Are there any general usage tutorials/documented patterns of riak?
>  - for example, is there a sample application that uses riak in
> rails/php/whatever?  If not, how about erlang?

The best example right now is Rusty Klophaus' SlideBlast, which is built in Erlang with Nitrogen.  http://github.com/rklophaus/SlideBlast  We're also working on sample apps in other languages, as well as some internal projects that use Riak.

>  - I found js-mapreduce.org in the docs directly, which took me a
> while because I expected to find stuff on http://riak.basho.com/  And
> it's not.  One presumes that it's an oversight and should be linked in
> the erlang-api overview file.

There is a disconnect there, yes.  We'll be unveiling some new documentation resources later this week that should help in this regard.

> - Are there any tutorials?
>

For the Javascript MapReduce, try Kevin Smith's screencast on the blog (http://blog.basho.com/2010/02/03/the-release-riak-0.8-and-javascript-map/reduce/).  For general Riak usage (although things have changed since then), see the code-cast with Bryan Fink - http://videocodechat.com/post/219711761/intro-to-riak-with-bryan-fink

> 1a) In his talk at NoSQL East, Justin Sheehy talks about a streaming
> or event interface, where you can listen for things as they come into
> the system.  I'm sure that I'm getting that wrong, but where could I
> read about such a thing?
>

The event interface was recently removed because it was too resource-intensive (and an SPOF).  The status interface ( {riak_stat, true} in your app.config), plus SNMP monitoring in the Enterprise product, replaces a lot of what that event interface provided.  The dev team also plans to add pre-/post- "triggers".

> 2) Speaking of, how do you debug javascript map reduce stuff?
> - Personally I think that printf (or io:format) is the best "how can I
> reason about what is going on" tool ever invented.  I'm trying to
> reason about what's happening when I submit these queries to figure
> out what's going on.
>

The best advice I have is to follow the guidelines in the js-mapreduce.org file, keep your functions simple, and try to test them in isolation if you can.  Also, since they're just Javascript functions, you could test them with something like QUnit or another testing framework, outside of Riak.

> 2a) My preferred way of accessing riak is through ruby and I'm up and
> running with ripple.  But, just to make it exciting, an email from
> Kevin Smith from Mar 2 talks about a javascript map, javascript
> reduce, and an erlang reduce function (for storing the results in
> riak).  How would you string together such a thing in real life?  Not
> using the raw interface surely.  (I've written the standard chat
> server in erlang and that's more or less my knowledge.)
>

Map-Reduce jobs have their own interface.  In Ripple, you can use Riak::MapReduce to create a job that has an Erlang reduce phase by specifying the module and function name:

Riak::MapReduce.new(client).reduce("my_module", "my_reduce_function")

See http://seancribbs.github.com/ripple/Riak/MapReduce.html#reduce-instance_method and http://seancribbs.github.com/ripple/Riak/MapReduce/Phase.html#initialize-instance_method .

> 3) Modeling/usage question.  Let's say that I'm building a web site
> and I want to store my data in riak.  I'm trying to store users, and
> I want to be able to query them with the following attributes:
> - login
> - email
> - fb_userid
> - auth_token
>
> login & email would each be used about equally often.
>
> - Would I pick one as the "key" and use it, or should I just let riak
> pick the key for me?
>

Either strategy is valid.  You should probably do some analysis to determine which field is the most common query criterion and then let that be the key.  My guess would be email or login.

> - And if I wanted to load the user based upon fb_userid, could I
> expect to use a m/r function like that in real-time?
>

It depends largely on the size of your dataset and the frequency with which your application queries on that field.  Full bucket-scans would be pretty inefficient for this, however.

> - And, finally, I'd just want to pass the bucket name to do this,
> because obviously I don't know the keys beforehand.  It seems like
> when this happens the first thing it does is to get the key list and
> then resubmit the job, and there are ominous sounds in the
> documentation that getting the key list is a potentially expensive
> operation.  So is the answer then to create, say, 4 buckets,
> user_login, user_email, user_fb_userid, user_auth_token which just are
> the key and a link back to the user bucket/key ?  And if so, do I need
> to maintain this index on the application side?
>

If querying on those fields is frequent enough, then yes, I would suggest you build out other buckets with objects that link back to the users bucket -- this is analogous to adding indices to tables in your relational database.  In Ripple, this can be done with a lifecycle callback, like after_save. (See also http://github.com/seancribbs/ripple/issues#issue/6)  The primary question is whether you want to take the hit when writing or when reading.  For most applications, the former is preferable because reads far outnumber the writes.
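
As a rough sketch of the index-bucket idea, here is the write path and the lookup path in plain Ruby.  An in-memory hash stands in for Riak, so nothing here is Ripple or Riak API: FakeStore, save_user, find_user_by and the sample user are all made up for illustration, and using login as the primary key is just one of the options discussed above.

```ruby
# In-memory stand-in for a set of buckets: bucket name => { key => value }.
# Hypothetical; a real application would go through a Riak client instead.
class FakeStore
  def initialize
    @buckets = Hash.new { |hash, name| hash[name] = {} }
  end

  def put(bucket, key, value)
    @buckets[bucket][key] = value
  end

  def get(bucket, key)
    @buckets[bucket][key]
  end
end

# Fields that get their own index bucket (login is the primary key here).
INDEXED_FIELDS = [:email, :fb_userid, :auth_token].freeze

# Store the user under its login, then write one small index entry per
# secondary field.  Each entry only records which user key it points to.
def save_user(store, user)
  store.put("users", user[:login], user)
  INDEXED_FIELDS.each do |field|
    store.put("user_#{field}", user[field].to_s, user[:login])
  end
end

# Two key lookups (index entry, then user) instead of scanning the bucket.
def find_user_by(store, field, value)
  login = store.get("user_#{field}", value.to_s)
  login && store.get("users", login)
end

store = FakeStore.new
save_user(store, { :login => "will", :email => "will@example.com",
                   :fb_userid => 1234, :auth_token => "abc123" })

found = find_user_by(store, :fb_userid, 1234)
missing = find_user_by(store, :email, "nobody@example.com")
puts found[:login]
```

The extra puts on every save are the "hit when writing"; in exchange, each read by a secondary field costs two fetches by key rather than a scan.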

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

Re: A zillion tiny newbie questions

Will Schenk
Thanks for the quick response!

> - Are there any tutorials?

The screen casts were helpful, even if they are a bit out of date.
(And the audio is a bit rough on videocodechat...)  But I was thinking
something more like
http://code.google.com/p/redis/wiki/TwitterAlikeExample where it goes
more into usage scenarios rather than just straightforward gets and
sets like the SlideBlast code.  Like "if you want to query on
different fields you can use an after_save to create your own index"
etc...

(Nitrogen looks pretty nifty too!)

>> 2) Speaking of, how do you debug javascript map reduce stuff?

> The best advice I have is to follow the guidelines in the js-mapreduce.org file, keep your functions simple, and try to test them in isolation if you can.  Also, since they're just Javascript functions, you could test them with something like QUnit or another testing framework, outside of Riak.

Well, ok, but is there any way to see what's going on?  This produces
a response that I want:

mr = Riak::MapReduce.new(client)
mr.add("users")
mr.map("function(v) {
  var user = JSON.parse(v.values[0].data);
  if (user.email == '[hidden email]') return [v.key];
  else return [];
}", :keep => true)
mr.run

...but I have no idea what the :keep => true thing means ("whether to
return the results of this phase": In what situation would I ever
query a database and not want the results?)

[...]

> If querying on those fields is frequent enough, then yes, I would suggest you build out other buckets with objects that link back to the users bucket -- this is analogous to adding indices to tables in your relational database.  In Ripple, this can be done with a lifecycle callback, like after_save. (See also http://github.com/seancribbs/ripple/issues#issue/6)  The primary question is whether you want to take the hit when writing or when reading.  For most applications, the former is preferable because reads far outnumber the writes.

Issue 6 seems much more like a feature request than an issue... :)  I
don't see after_save documented anywhere in ripple -- or a list of any
of the callback methods for that matter.  But ripple is ActiveModel
right so I should just assume that it's all there...?  If so
- Would I have access to the old object to remove the previous link?
(i.e. if they changed their login id, I'd need to remove the old and
then insert the new)
- how do I get a reference to the Riak::Client ?

Thanks again.


Re: A zillion tiny newbie questions

Sean Cribbs-2

> Well, ok, but is there any way to see what's going on?  This produces
> a response that I want:
>
> mr = Riak::MapReduce.new( client )
> mr.add( "users" )
> mr.map( "function(v) {   var user = JSON.parse(v.values[0].data);
> if( user.email == '[hidden email]' )     return [v.key];   else
> return []; } ", :keep => true)
> mr.run
>
> ...but I have no idea what the :keep => true thing means ("whether to
> return the results of this phase": In what situation would I ever
> query a database and not want the results?)


Of course you'll want to return results from your final phase.  However, :keep => true lets you return results from intermediate phases as well.  So in a sense, :keep => true will let you see the output of each phase, if you like.
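
To make the effect of the keep flag concrete, here is a plain-Ruby simulation of a pipeline of phases.  It is only an illustration of the idea, not the Riak client API: Phase, run_phases and the sample data are invented, and the exact shape Riak returns may differ.

```ruby
# Plain-Ruby simulation of a phase pipeline; not the Riak client API.
# Every phase feeds its output to the next phase, but only phases marked
# keep contribute to the results handed back to the caller.
Phase = Struct.new(:name, :function, :keep)

def run_phases(inputs, phases)
  kept = []
  phases.each do |phase|
    inputs = phase.function.call(inputs)
    kept << inputs if phase.keep
  end
  kept
end

users = [
  { "key" => "will", "email" => "will@example.com" },
  { "key" => "sean", "email" => "sean@example.com" }
]

select = lambda { |list| list.select { |u| u["email"].end_with?("@example.com") } }
keys   = lambda { |list| list.map { |u| u["key"] } }
count  = lambda { |list| [list.size] }

# Only the final phase is kept: the caller sees just the count.
final_only = run_phases(users, [
  Phase.new("select", select, false),
  Phase.new("keys", keys, false),
  Phase.new("count", count, true)
])

# Keeping the middle phase as well exposes its output for inspection.
with_intermediate = run_phases(users, [
  Phase.new("select", select, false),
  Phase.new("keys", keys, true),
  Phase.new("count", count, true)
])

p final_only         # [[2]]
p with_intermediate  # [["will", "sean"], [2]]
```

Turning keep on for an intermediate phase is therefore a cheap way to look at what that phase produced while debugging a longer job.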



> Issue 6 seems much more like a feature request than an issue... :)  I
> don't see after_save documented anywhere in ripple -- or a list of any
> of the callback methods for that matter.  But ripple is ActiveModel
> right so I should just assume that it's all there...?  If so

Yes, there are feature requests in there as well as "issues".  It helps me keep track of what needs doing.  Ripple has ActiveModel callbacks, so you can add those at the class level of your Document.

> - Would I have access to the old object to remove the previous link?
> (i.e. if they changed their login id, I'd need to remove the old and
> then insert the new)

Ripple::Document has "dirty" attributes, so you can see what changed before you save.  Perhaps a before_save to capture the needed changes, plus an after_save to update/delete the objects?
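
One way the before_save/after_save pairing could look, sketched in plain Ruby with hand-rolled change tracking and hooks.  This is not Ripple or ActiveModel code: the User class, its hook methods and the INDEX hash (standing in for a "user_login" bucket) are assumptions made for illustration.

```ruby
# Hand-rolled sketch; not Ripple or ActiveModel.  INDEX stands in for a
# "user_login" bucket that maps login => user id.
INDEX = {}

class User
  attr_reader :id
  attr_accessor :login

  def initialize(id, login)
    @id = id
    @login = login
    @saved_login = nil   # value at the time of the last save
  end

  def login_changed?
    @login != @saved_login
  end

  def save
    stale = before_save
    # ... the user object itself would be written here ...
    after_save(stale)
    @saved_login = @login
    true
  end

  private

  # Capture the old key while it is still known.
  def before_save
    login_changed? ? @saved_login : nil
  end

  # Drop the stale index entry (if any), then write the current one.
  def after_save(stale_login)
    INDEX.delete(stale_login) if stale_login
    INDEX[@login] = @id
  end
end

user = User.new(1, "will")
user.save
user.login = "wschenk"
user.save
p INDEX   # only the new login remains in the index
```

The old login has to be captured before the save because, once the object is written, the previous value is no longer available to clean up after.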

> - how do I get a reference to the Riak::Client ?


You can use Ripple.client, which is a thread-local instance of Riak::Client.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.



Re: A zillion tiny newbie questions

Will Schenk
> - Would I have access to the old object to remove the previous link?

> (i.e. if they changed their login id, I'd need to remove the old and
> then insert the new)
>
> Ripple::Document has "dirty" attributes, so you can see what changed before
> you save.  Perhaps a before_save to capture the needed changes, plus an
> after_save to update/delete the objects?
>
> - how do I get a reference to the Riak::Client ?
>
>
> You can use Ripple.client, which is a thread-local instance of Riak::Client.

Can you take a look at the attached file to see if I'm doing this
"manual indexing" correctly?  It seems like I should be able to do the
query directly instead of getting the "user_email" object and then
calling "walk" on that.  Also, it returns a nested array.


Attachment: riak_test.rb (1K)

Re: A zillion tiny newbie questions

Sean Cribbs-2

> Can you take a look at the attached file to see if I'm doing this
> "manual indexing" correctly?  It seems like I should be able to do the
> query directly instead of getting the "user_email" object and then
> calling "walk" on that.  Also, it returns a nested array.


Your script seems fine, except that you don't have to explicitly load the object that links back in order to link-walk; you just need the key. For example:

user_list = users_email.new(email_address).walk(:bucket => 'users', :keep => true)

Link walking operations always return nested arrays, where each inner array is one phase of link following.  If you don't 'keep' the phase, you'll get an empty array.
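
Given that shape, unwrapping the result is a matter of ordinary array handling.  A small sketch, using made-up strings in place of the objects a real walk would return (walk_result, multi_phase and unkept are hypothetical sample values):

```ruby
# Hypothetical result in the shape described above: one inner array per
# link phase.  Strings stand in for the objects a real walk would return.
walk_result = [["user:will"]]

# Taking the last phase and its first element unwraps a single match.
user = walk_result.last.first

# flatten(1) removes one level of nesting when several phases were kept.
multi_phase = [["index:will@example.com"], ["user:will"]]
all_objects = multi_phase.flatten(1)

# A phase that was not kept shows up as an empty array, so guard for nil.
unkept = [[]]
nothing = unkept.last.first
```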

Sean
