luwak questions


luwak questions

francisco treacy-2
Hi all,

We are using Luwak to store assets, ranging from 2k CSS files up to
30+ MB video files. Every key in Luwak is prefixed with an id (e.g.
e276814e96e0616eb7c07d3bb744d333-216.jpg).

In order to manage these assets, I'd like to query Luwak for keys
starting with some id. Having found the 'luwak_tld' bucket, I tried
two approaches: list all keys and filter on the client, or use
key filters.

> db.keys('luwak_tld')
GET /riak/luwak_tld?keys=true

Listing all keys works fine, except when there are too many. I then
tried to filter on Riak, like so:

> db.add({ bucket: 'luwak_tld', key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']]}).map(function(v) { return [1] }).run()
POST /mapred
> { message: 'HTTP error 500: {"error":"bad_json"}'
, stack: [Getter/Setter]
, statusCode: 500
, notFound: false
}

but I get "bad_json" errors... Any other bucket works just fine. Why
won't this work like a regular bucket?


Additionally, I'm concerned about key explosion. Tens of thousands of
assets would end up stored there. Would it make sense to organize the
data differently?

I was thinking along the lines of one bucket per book (for instance a
bucket would be e276814e96e0616eb7c07d3bb744d333), with its assets
contained in that bucket. But I would lose the advantages of Luwak.
How would you go about this?

Thanks,
Francisco

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: luwak questions

bryan-basho
Administrator
On Tue, Feb 22, 2011 at 12:17 PM, francisco treacy
<[hidden email]> wrote:

>> db.add({ bucket: 'luwak_tld', key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']]}).map(function(v) { return [1] }).run()
> POST /mapred
>> { message: 'HTTP error 500: {"error":"bad_json"}'
> , stack: [Getter/Setter]
> , statusCode: 500
> , notFound: false
> }
>
> but I get "bad_json" errors... Any other bucket works just fine. Why
> won't this work like a regular bucket?

The data stored in the luwak_tld Riak objects is an Erlang term that
is not directly convertible to JSON.  Riak is trying to convert that
object to JSON in order to hand it to the Javascript map function you
specified.

I recommend using riak_kv_mapreduce:reduce_identity/2 in a reduce
phase, instead of that Javascript map phase to assemble your keylist.
That will avoid the JSON conversion in two ways: it's Erlang native
(so it doesn't need the conversion), and reduce phases don't bother with
reading the object from storage anyway.  If you really do need to look
at the object before deciding whether or not its key qualifies, you'll
need to code your logic in an Erlang function instead of Javascript.
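
As a rough sketch of what that request could look like against the HTTP
interface (assuming the standard /mapred request shape with "inputs" and
"query" sections; the bucket and id prefix are the ones from Francisco's
message), the body might be built like this:

```javascript
// Sketch, not a tested call: builds a /mapred request body that uses an
// Erlang reduce_identity phase instead of a Javascript map phase, so
// Riak never has to JSON-convert the Luwak objects.
var request = {
  inputs: {
    bucket: 'luwak_tld',
    key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']]
  },
  query: [{
    reduce: {
      language: 'erlang',
      module: 'riak_kv_mapreduce',
      'function': 'reduce_identity',
      keep: true
    }
  }]
};

// POST the serialized body as application/json to /mapred.
var body = JSON.stringify(request);
console.log(body);
```

The result should be the bucket/key pairs that passed the key filter,
without any of the objects being fetched or converted.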

If you need something closer to the Javascript function in your
example (which could be used to produce a count of the filtered keys,
instead of the keys themselves), I have a 'reduce_count_inputs'
function waiting in a pull request:
https://github.com/basho/riak_kv/pull/32
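
If that pull request lands, the count variant would presumably just swap
the function name in the same request shape (a sketch under that
assumption — reduce_count_inputs is only available once the PR is merged):

```javascript
// Sketch only: same request as the keylist version, but naming the
// not-yet-merged reduce_count_inputs function, which would return a
// count of the filtered keys rather than the keys themselves.
var countRequest = {
  inputs: {
    bucket: 'luwak_tld',
    key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']]
  },
  query: [{
    reduce: {
      language: 'erlang',
      module: 'riak_kv_mapreduce',
      'function': 'reduce_count_inputs',
      keep: true
    }
  }]
};
var countBody = JSON.stringify(countRequest);
console.log(countBody);
```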

-Bryan


Re: luwak questions

francisco treacy-2
2011/2/23 Bryan Fink <[hidden email]>:

> On Tue, Feb 22, 2011 at 12:17 PM, francisco treacy
> <[hidden email]> wrote:
>>> db.add({ bucket: 'luwak_tld', key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']]}).map(function(v) { return [1] }).run()
>> POST /mapred
>>> { message: 'HTTP error 500: {"error":"bad_json"}'
>> , stack: [Getter/Setter]
>> , statusCode: 500
>> , notFound: false
>> }
>>
>> but I get "bad_json" errors... Any other bucket works just fine. Why
>> won't this work like a regular bucket?
>
> The data stored in the luwak_tld Riak objects is an Erlang term that
> is not directly convertible to JSON.  Riak is trying to convert that
> object to JSON in order to hand it to the Javascript map function you
> specified.

Oh, I think I see. In this case not even the metadata can be
translated to JSON.  The JSON errors I'd usually seen were about the
data itself, when attempting to parse it via Riak.mapValuesJson or
similar. That's why the supplied function should be written in Erlang.


> I recommend using riak_kv_mapreduce:reduce_identity/2 in a reduce
> phase, instead of that Javascript map phase to assemble your keylist.
> That will avoid the JSON conversion in two ways: it's Erlang native
> (so it doesn't need the conversion), and reduce phases don't bother with
> reading the object from storage anyway.  If you really do need to look
> at the object before deciding whether or not its key qualifies, you'll
> need to code your logic in an Erlang function instead of Javascript.

You're right, map actually fetches the object. The use-case here is to
get a bunch of keys to delete their values in a subsequent operation.
This works (in riak-js):

db.add({ bucket: 'luwak_tld',
         key_filters: [['starts_with', 'e276814e96e0616eb7c07d3bb744d333']] })
  .reduce({ language: 'erlang',
            module: 'riak_kv_mapreduce',
            function: 'reduce_identity',
            keep: true })
  .run(/* remove values in this callback */)

Thanks, Bryan.

Francisco
