Listing keys in a bucket

Listing keys in a bucket

John Axel Eriksson
Listing keys in a bucket has been described as "bad" and something you use in development but
not in production. I'm just starting out on Riak so I'm a newbie...

I'm thinking of building an application using Riak as file storage and possibly much more than that, but it would
at least store lots of files with, perhaps, metadata attached. How would I then list files for display in a web app if
I don't use key listing?

I've been thinking of perhaps maintaining a separate index of all the files, but this would get VERY large when
there are many files in Riak, so updating this index would (I assume) involve getting the index, inserting something,
and then putting it back into Riak.

For example, in a Rails app I would get the whole index (as JSON), modify it, and then put it back into Riak,
but what if there are tens of thousands of files? Should I perhaps use a separate database for storing the index, like
MySQL or Mongo? Is anyone already doing something like this?


J




_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Listing keys in a bucket

Matthew Scott

I'm still very much a Riak newb myself, but I'll take a shot at answering this one by suggesting the use of links.

From what I understand, links have some quantity limits when you attach a large number, but adding and removing them is an inexpensive operation.  (Someone fact check me on that please :)

- Create a key called 'listing', perhaps even in its own bucket to prevent namespace collision.

- Create links from that 'listing' key to metadata keys.  Remember that you can attach a tag to each link to differentiate different types of links, such as "metadata".

- The metadata keys' values would contain file metadata in JSON form, and in turn have a link to the file contents key, tagged "contents".

- Remember to attach the proper mime type to the key containing your file contents.

- To get a file listing, do a map/reduce starting at "listing", following its "metadata" links and grabbing those values, then reduce to sort by a key.

- To get file contents, follow the "contents" link from the metadata key.
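
As a rough illustration of the steps above, here is a minimal Python sketch that only builds the pieces such a request would need: a Link header entry and a map/reduce job description. The header format, the link-phase fields, and the built-in `Riak.mapValuesJson` reflect the Riak HTTP interface as I understand it and should be checked against the documentation for your version; the `name` metadata field used for sorting and the bucket names are assumptions made for the example.

```python
import json

# Hypothetical reduce function: sorts metadata objects by an assumed "name" field.
SORT_BY_NAME = (
    "function(values) { return values.sort(function(a, b) {"
    " return a.name < b.name ? -1 : (a.name > b.name ? 1 : 0); }); }"
)


def link_header(bucket, key, tag):
    """Format one Link header entry pointing at bucket/key with the given tag."""
    return '</riak/{}/{}>; riaktag="{}"'.format(bucket, key, tag)


def listing_job(listing_bucket, listing_key):
    """Describe a job that starts at the listing key, follows its links
    tagged "metadata", parses each linked value as JSON, then sorts."""
    return {
        "inputs": [[listing_bucket, listing_key]],
        "query": [
            {"link": {"bucket": "_", "tag": "metadata", "keep": False}},
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}},
            {"reduce": {"language": "javascript", "source": SORT_BY_NAME}},
        ],
    }


if __name__ == "__main__":
    print(link_header("metadata", "file1", "metadata"))
    print(json.dumps(listing_job("listing", "listing"), indent=2))
```

The resulting job would be sent as a JSON body to the cluster's map/reduce resource; nothing in this sketch talks to a server.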

--
Matthew Scott
ElevenCraft, Inc.
http://11craft.com/
+1 360 389-2512

 


Re: Listing keys in a bucket

John Axel Eriksson
Yes, I've thought of this, but as I understand it there is a bit of a problem with attaching a large number of links to a key, which
would be necessary here. Am I right? If I had 10 000 files in Riak, that would mean 10 000 links attached to the "listing" key.


Re: Listing keys in a bucket

Sean Cribbs-2
You wouldn't put 10,000 files in a single directory on your computer, so there's no reason you would have to in Riak either.  What about representing your directories as objects, too? The only thing you would require would be a known key that's the root of the hierarchy. For example:

/riak/files/__root, which has links to:
-> "/riak/files/images" (directory)
-> "/riak/files/text" (directory)
-> "/riak/files/index.html" (file)

"/riak/files/images" which has links to:
-> "/riak/files/images%2Ffavicon.ico" (file)
-> "/riak/files/images%2Flogo.jpg" (file)

(note I escape the slashes in the key names)
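
A small Python sketch of the hierarchy above, kept entirely in memory. It shows how a key containing slashes can be percent-escaped so it fits in a single path segment, and how a directory object's links could be listed. The `tree` dictionary and the helper names are hypothetical; only the `files` bucket, the `__root` key, and the example paths come from the message.

```python
from urllib.parse import quote


def riak_path(bucket, key):
    """Build the object path, escaping every reserved character in the key
    (including "/") so the key stays one path segment."""
    return "/riak/{}/{}".format(bucket, quote(key, safe=""))


# Hypothetical stand-in for directory objects: key -> (child key, kind) links.
tree = {
    "__root": [("images", "directory"), ("text", "directory"), ("index.html", "file")],
    "images": [("images/favicon.ico", "file"), ("images/logo.jpg", "file")],
}


def list_directory(tree, key, bucket="files"):
    """Return (path, kind) pairs for everything a directory object links to."""
    return [(riak_path(bucket, child), kind) for child, kind in tree.get(key, [])]


if __name__ == "__main__":
    for path, kind in list_directory(tree, "images"):
        print(path, "(" + kind + ")")
```

Each directory object only carries links for its own children, so no single object needs to hold every file in the store.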

With proper use of links and tags, this is not a difficult problem.  If you just want to mount Riak like a file system, you might look at Artur Bergman's riakfuse project.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.


Re: Listing keys in a bucket

Matthew Scott
In reply to this post by John Axel Eriksson
You may want to consider how often you will be accessing all 10,000 files in one query.

It is my understanding that with a key/value store such as Riak, it's a good idea to analyze what your common queries will be as well as your frequency of reading vs writing, and then to structure your data so that when you do a write, you precompute and denormalize as needed to satisfy those queries.

For less frequent queries, you use map/reduce as needed to get those results, then perhaps determine whether and for how long to cache those results.

This is of course different than the managed indexes given by other styles of data stores, but is part of the tradeoff involved.
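
To make the "precompute on write" idea concrete, here is a minimal Python sketch. A plain dictionary stands in for the key/value store, and the common query is assumed to be "list files of a given content type"; the bucket names, the `content_type` field, and the helper names are all made up for the example.

```python
def put_file(store, key, metadata):
    """Write the metadata and, in the same step, update the precomputed
    listing that answers the assumed common query (files by content type)."""
    store[("metadata", key)] = metadata
    listing_key = ("listings", "by_type:" + metadata["content_type"])
    listing = list(store.get(listing_key, []))
    if key not in listing:
        listing.append(key)
    store[listing_key] = sorted(listing)


def list_by_type(store, content_type):
    """Answer the common query with a single read of the precomputed listing."""
    return store.get(("listings", "by_type:" + content_type), [])


if __name__ == "__main__":
    store = {}
    put_file(store, "logo.jpg", {"content_type": "image/jpeg"})
    put_file(store, "index.html", {"content_type": "text/html"})
    print(list_by_type(store, "image/jpeg"))
```

The write does a little more work so that the read needs no scan. Each precomputed listing is itself an object that grows with the number of matching files, so the same size considerations apply to it.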

--
Matthew Scott
ElevenCraft, Inc.
http://11craft.com/
+1 360 389-2512



Re: Listing keys in a bucket

John Axel Eriksson
Well... perhaps my use case isn't well suited for Riak, not sure, but my reasoning was that using Riak as file storage with attached
metadata would be a good way of ensuring file availability. I've also considered using S3 for this, which also seems like a pretty
good choice, but I would then need to store metadata somewhere else.
I really liked the way I could query Riak through mapred etc., and of course the automatic replication, which seemed like a good fit
for ensuring file availability and storage. I guess what I would like to be able to do is a file listing, of course not of ALL the files at
once, but I would still need to run a query against ALL the files to get a listing of the ones I want, and the only way this seems possible
is to create an index of the files to query, which of course leads to the trouble of maintaining that index (i.e. updating an index which is very
large). Using links might be a good choice, but as I understand it, it isn't very wise to attach so many links to a key.

I was thinking of doing something like this when adding a new file:

/riak/metadata/somekey LINKS to /riak/files/somefile

PUT file10001 content in /riak/files/file10001 (adding it to another 10 000 file keys)
PUT file10001 metadata in /riak/metadata/file10001 LINKING it to /riak/files/file10001 (adding it to another 10 000 metadata keys)

what I don't know how to do is:

either:
data = GET /riak/listing/index (which would contain 10 000 keys - i.e a large dataset)
data.push(file10001-link) (or something like that - updating the json document)

PUT /riak/listing/index (getting the index back into riak)

or, using LINKS, just
add the link to /riak/listing/index

then I should be able to run mapred against the index to get links to files out of it...

the problem is that it is not recommended to run queries against an entire bucket which I otherwise could
by querying the metadata bucket

adding 10 000 LINKS to a key isn't recommended

and updating an index as large as mine would get, would mean GET large dataset to application, update
large dataset in application, PUT large dataset back into riak - seems like an expensive and foolish way
of doing it.
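
The cost described here can be sketched in a few lines of Python. A dictionary stands in for the store, the index is a JSON array of keys, and the function reports how many characters travel in each direction for a single append. The names are hypothetical and nothing here talks to a real server.

```python
import json


def add_to_index(store, index_key, new_entry):
    """Append one entry to a JSON index using read-modify-write.
    Returns the number of characters read plus the number written back."""
    raw = store.get(index_key, "[]")   # GET: the whole index comes back
    index = json.loads(raw)
    index.append(new_entry)            # the actual change is one small entry
    updated = json.dumps(index)
    store[index_key] = updated         # PUT: the whole index goes back
    return len(raw) + len(updated)


if __name__ == "__main__":
    small = {}
    print("empty index:", add_to_index(small, "index", "file1"))

    large = {"index": json.dumps(["file%d" % i for i in range(1, 10001)])}
    print("10 000 entries:", add_to_index(large, "index", "file10001"))
```

The data moved per update grows with the size of the index rather than with the size of the change, which is the expense the message is worried about.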

If I could run mapred against an entire bucket I'd be fine I suppose, but that is discouraged since it would mean
listing all the keys which is an expensive operation.

So the only solution to this would be to store the index in some other kind of database, like MongoDB or MySQL, which
I could do, of course, but I would much rather use Riak for it if that's reasonable.

J



Re: Listing keys in a bucket

John Axel Eriksson
In reply to this post by Sean Cribbs-2
Yeah, OK, I see it now... guess I'm still too much in some SQL mindset. We will store lots of files of the same type,
so I guess I would need some way of separating them, perhaps by name or something, into "directories". You're
right of course about not putting 10 000 files into a directory.

Actually, in some ways we do want to use riak as a distributed file system, but would also like to run queries against
the file metadata.
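
One way to separate files "by name" into directories can be sketched in Python as follows. Using a fixed-width, lowercased name prefix as the directory key is purely an assumption for illustration; names that cluster under the same prefix would still produce uneven directories, so the rule would need tuning for real data.

```python
from collections import defaultdict


def directory_for(filename, width=2):
    """Pick a directory key from the first `width` characters of the name."""
    return filename[:width].lower()


def group_into_directories(filenames, width=2):
    """Group file names under their directory keys, sorted within each one."""
    directories = defaultdict(list)
    for name in sorted(filenames):
        directories[directory_for(name, width)].append(name)
    return dict(directories)


if __name__ == "__main__":
    names = ["logo.jpg", "logo-small.jpg", "index.html", "invoice.pdf"]
    for directory, members in group_into_directories(names).items():
        print(directory, members)
```

Each directory would then be its own object with links to its members, so adding a file only touches one small directory object.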
