Future roadmap for indexed queries?


John Lynch
I am preparing to give a talk on Riak next week to a local Ruby user group here in San Diego, and wanted to get your thoughts on the future of Riak.  While in its current form it is awesome for loads of use cases, it falls short in the whole querying department, at least as it relates to building typical web applications. I get the map/reduce features, and the link features, but are there any plans to build in indexed query capabilities, a la MongoDB?  Mongo obviously has an easier time of this since it knows exactly what format your data is in (BSON). Or, is this something you would leave to third-parties to build as part of an ORM framework like Ripple, for example, which would have a better idea of the shape of the data and could create/maintain indexes accordingly...


Regards,

John Lynch, CTO
Rigel Group, LLC
[hidden email]


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Future roadmap for indexed queries?

Preston Marshall
John, Riak allows you to cache mapreduce queries if you create them as named queries.  In a way, this kind of creates an index, because I think it works a bit like CouchDB where the caches are added to when there is new data.  I think Riak also has a similar query paradigm to CouchDB, which is dynamic data, static queries.  I may be completely wrong here, so feel free to correct me.

Thanks,
Preston

On Feb 26, 2010, at 3:53 PM, John Lynch wrote:

I am preparing to give a talk on Riak next week to a local Ruby user group here in San Diego, and wanted to get your thoughts on the future of Riak.  While in its current form it is awesome for loads of use cases, it falls short in the whole querying department, at least as it relates to building typical web applications. I get the map/reduce features, and the link features, but are there any plans to build in indexed query capabilities, a la MongoDB?  Mongo obviously has an easier time of this since it knows exactly what format your data is in (BSON). Or, is this something you would leave to third-parties to build as part of an ORM framework like Ripple, for example, which would have a better idea of the shape of the data and could create/maintain indexes accordingly...



Re: Future roadmap for indexed queries?

Doug Tangren
How do clients cache/register custom named map/reduce functions with the Riak server?
 
-Doug Tangren
http://lessis.me


On Fri, Feb 26, 2010 at 4:57 PM, Preston Marshall <[hidden email]> wrote:
John, Riak allows you to cache mapreduce queries if you create them as named queries.  In a way, this kind of creates an index, because I think it works a bit like CouchDB where the caches are added to when there is new data.  I think Riak also has a similar query paradigm to CouchDB, which is dynamic data, static queries.  I may be completely wrong here, so feel free to correct me.


Re: Future roadmap for indexed queries?

Preston Marshall
Look in the Riak config file for {js_source_dir, "/tmp/js_source"}.  Modify this to your liking, and Riak will parse each function in the file(s).
On Feb 26, 2010, at 4:26 PM, Doug Tangren wrote:

How do clients cache/register custom named map/reduce functions with the Riak server?
 

Re: Future roadmap for indexed queries?

Rusty Klophaus
In reply to this post by John Lynch
Hi John,

In the near future, we are planning to add a pre-commit "hook", specified at a bucket level. This would provide the building blocks necessary to keep an index up to date when an object is stored in Riak. Eventually, I expect to see frameworks and other tools use the hook to allow easy indexing. 
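
To make the idea concrete, here is a minimal in-memory sketch of a bucket-level pre-commit hook that keeps an index object current on every write. It is an illustration only, not Riak's API: `ToyStore`, `add_precommit_hook`, `index_by_email`, and the `indexes` bucket are all hypothetical names invented for this example.

```python
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a key/value store with bucket-level pre-commit hooks."""

    def __init__(self):
        self.data = {}                   # (bucket, key) -> value
        self.hooks = defaultdict(list)   # bucket -> list of hook callables

    def add_precommit_hook(self, bucket, hook):
        self.hooks[bucket].append(hook)

    def put(self, bucket, key, value):
        # Run every hook registered for this bucket before the write lands.
        for hook in self.hooks[bucket]:
            hook(self, bucket, key, value)
        self.data[(bucket, key)] = value

    def get(self, bucket, key, default=None):
        return self.data.get((bucket, key), default)


def index_by_email(store, bucket, key, value):
    """Hypothetical hook: maintain an email -> user-key index in a separate object."""
    index = dict(store.get("indexes", "users_by_email", {}))
    old = store.get(bucket, key)
    if old is not None and old.get("email") != value.get("email"):
        index.pop(old.get("email"), None)   # drop the stale entry on update
    index[value["email"]] = key
    store.data[("indexes", "users_by_email")] = index


store = ToyStore()
store.add_precommit_hook("users", index_by_email)
store.put("users", "u1", {"name": "Ann", "email": "ann@example.com"})
store.put("users", "u1", {"name": "Ann", "email": "ann@example.org"})
print(store.get("indexes", "users_by_email"))   # {'ann@example.org': 'u1'}
```

Because the hook runs as part of the write, the index never lags behind the data, at the cost of extra work on every store.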

Until that is in place, the best approach for querying depends on the shape of your data:

If you are searching for data through relations in a hierarchy, and you know the starting point of that hierarchy, then you should add tagged links to your objects and use linkwalking. If you need to search in a hierarchy, but need more flexibility than links and tags can provide, then you can use map/reduce functionality. By "starting point", I mean that you know the exact object or objects under which you would like to query.
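
A rough sketch of what walking tagged links from a known starting point looks like, using a plain dictionary in place of a cluster. The keys, tags, and the `walk` helper are made up for illustration; a real link walk would be issued through Riak rather than run in application code like this.

```python
# Each object carries tagged links as (target_key, tag) pairs.
objects = {
    "artist/a1": {"value": "The Band",
                  "links": [("album/b1", "album"), ("album/b2", "album")]},
    "album/b1": {"value": "First LP", "links": [("track/t1", "track")]},
    "album/b2": {"value": "Second LP",
                 "links": [("track/t2", "track"), ("artist/a1", "artist")]},
    "track/t1": {"value": "Opening Song", "links": []},
    "track/t2": {"value": "Closing Song", "links": []},
}


def walk(start_keys, tags):
    """Follow one link step per tag, starting from keys that are already known."""
    current = list(start_keys)
    for tag in tags:
        next_keys = []
        for key in current:
            for target, link_tag in objects[key]["links"]:
                if link_tag == tag and target not in next_keys:
                    next_keys.append(target)
        current = next_keys
    return current


# Starting point is known: artist a1. Walk to its albums, then to their tracks.
tracks = walk(["artist/a1"], ["album", "track"])
print([objects[k]["value"] for k in tracks])   # ['Opening Song', 'Closing Song']
```

The query only ever touches objects reachable from the starting key, which is why this approach needs a known starting point.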

If your queries are not relational/hierarchical and you don't know the starting point in advance, then your best approach would be to mimic the hook feature described above, and build up your index by hand in a separate Riak object. You could do this in your application when an object is stored (which requires extra hops to Riak), or you can use a background process to do this using list-keys, which means there will be some lag between when your data is stored and when the index is updated. (Keep in mind that list-keys can be an expensive operation, which is why it should be a background process.) 
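
The background-process variant can be sketched as follows, again with dictionaries standing in for Riak. `list_keys` and `rebuild_index` are hypothetical helpers, and the example is only meant to show the lag between a write and the next index pass.

```python
def list_keys(bucket_data):
    """Stand-in for a list-keys call; on a real cluster this is the expensive part."""
    return list(bucket_data.keys())


def rebuild_index(bucket_data, field):
    """Build a field-value -> sorted keys index by scanning every object in the bucket."""
    index = {}
    for key in list_keys(bucket_data):
        value = bucket_data[key].get(field)
        if value is not None:
            index.setdefault(value, []).append(key)
    return {value: sorted(keys) for value, keys in index.items()}


users = {
    "u1": {"name": "Ann", "city": "San Diego"},
    "u2": {"name": "Bob", "city": "Boston"},
}
# The index lives in its own object, separate from the data it describes.
indexes = {"users_by_city": rebuild_index(users, "city")}

# A write that lands between two background passes is not in the index yet.
users["u3"] = {"name": "Cy", "city": "San Diego"}
stale = indexes["users_by_city"]["San Diego"]    # ['u1']

# The next background pass catches up.
indexes["users_by_city"] = rebuild_index(users, "city")
fresh = indexes["users_by_city"]["San Diego"]    # ['u1', 'u3']
print(stale, fresh)
```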

Best,
Rusty




Re: Future roadmap for indexed queries?

Rusty Klophaus
In reply to this post by Preston Marshall
Hi Preston,

Map/Reduce in Riak works a bit differently than Map/Reduce in Couch. In Riak, you can think of Map/Reduce as a mini-Hadoop job, where you pass in a set of input keys, and then define a chain of Map and Reduce phases that operate on that data.

The Map phase runs a map function once for each input key/object. This happens in parallel, and the work is distributed across your cluster, so the function actually runs on the node where your data lives. The results of this Map function are cached in memory at an object level until the object is changed or deleted. The Map phase can return data or another list of keys. (As you correctly mentioned, only Map phases with "named" functions, not anonymous functions, are cached.)

The Reduce phase gathers the output from a Map phase and can either aggregate data in some way, or produce a new list of keys.
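
The shape of such a job can be sketched in a few lines. This is a single-process toy, assuming a dictionary as the bucket; `run_job`, `map_tags`, and `reduce_count` are hypothetical names, and in a real cluster the Map phase would run in parallel on the nodes that hold the data rather than in one loop.

```python
from collections import Counter

bucket = {
    "p1": {"author": "ann", "tags": ["riak", "erlang"]},
    "p2": {"author": "bob", "tags": ["riak"]},
    "p3": {"author": "ann", "tags": ["ruby"]},
}


def map_tags(key, obj):
    """Map function: runs once per input key/object and emits a list of results."""
    return obj["tags"]


def reduce_count(values):
    """Reduce function: aggregates everything the previous phase emitted."""
    return [dict(Counter(values))]


def run_job(input_keys, phases):
    """Feed a set of input keys through a chain of map and reduce phases."""
    results = list(input_keys)
    for kind, fn in phases:
        if kind == "map":
            emitted = []
            for key in results:
                emitted.extend(fn(key, bucket[key]))
            results = emitted
        else:
            results = fn(results)
    return results


job = [("map", map_tags), ("reduce", reduce_count)]
print(run_job(["p1", "p2", "p3"], job))   # [{'riak': 2, 'erlang': 1, 'ruby': 1}]
```

Note that the job starts from an explicit list of keys; nothing in it discovers which objects to look at.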

So to get back to your email, data is cached in Map/Reduce not when you have added new data, but rather after you query on that data. At that point, new queries that would touch the same data can use the cached results instead.
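
A small sketch of that caching behaviour, under the assumption that results are keyed by function name and object key. `MapCache` is a hypothetical class written for this example and does not reflect how Riak implements its cache internally.

```python
class MapCache:
    """Caches map results per (function name, key); filled on query, cleared on change."""

    def __init__(self, store):
        self.store = store    # key -> object
        self.cache = {}       # (function name, key) -> cached map result
        self.calls = 0        # how many times a map function actually ran

    def run_map(self, fn, key):
        cache_key = (fn.__name__, key)
        if cache_key not in self.cache:
            self.calls += 1
            self.cache[cache_key] = fn(self.store[key])
        return self.cache[cache_key]

    def put(self, key, obj):
        self.store[key] = obj
        # Changing the object drops every cached result for that key.
        for cache_key in [ck for ck in self.cache if ck[1] == key]:
            del self.cache[cache_key]


def word_count(obj):
    return len(obj["text"].split())


mc = MapCache({"doc1": {"text": "hello riak users"}})
first = mc.run_map(word_count, "doc1")     # computed: nothing is cached before a query
second = mc.run_map(word_count, "doc1")    # served from the cache
mc.put("doc1", {"text": "hello"})          # the write invalidates the cached result
third = mc.run_map(word_count, "doc1")     # computed again
print(first, second, third, mc.calls)      # 3 3 1 2
```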

I sent another email out on this thread a few minutes ago that talks about pre-commit hooks. These would allow you to "pre-cache" query results when you add new data.

Best,
Rusty


Re: Future roadmap for indexed queries?

Kevin Smith-5
In reply to this post by Preston Marshall
Preloading JavaScript functions will speed up their execution for two reasons:

1. Anonymous functions are slow because Riak has to a) check to see if they're defined and b) define them if they don't exist.

2. Preloading occurs at JavaScript VM startup, thus moving all the parsing/compiling for the functions to before any queries are run.

What this _does not_ do is provide any data caching. Internally, Riak will cache data fetched during a map phase until either the data is updated, thus invalidating the cache, or the cache ejection algorithm runs (which is, I think, once per minute, but that's off the top of my head). We are investigating replacing the map cache with a fixed-size LRU cache, which would provide better cache effects for commonly queried data. That doesn't exist now, so the cache will get flushed on a regular interval.
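
For readers unfamiliar with the term, a fixed-size LRU cache can be sketched like this. It is a generic illustration built on Python's `OrderedDict`, not the design under investigation for Riak; the `LRUCache` class and its capacity are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now more recent than "b"
cache.put("c", 3)    # over capacity, so "b" is evicted
print(list(cache.items))   # ['a', 'c']
```

Unlike a flush on a timer, entries that keep getting read stay in the cache, which is the benefit for commonly queried data.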

--Kevin
On Feb 26, 2010, at 7:11 PM, Preston Marshall wrote:

> Look in the Riak config file for {js_source_dir, "/tmp/js_source"}.  Modify this to your liking, and Riak will parse each function in the file(s).