Limit to the number of links an object can have?

Magnus Enarsson
Hi,

Besides the scaling and availability I find the linkage feature of
Riak to be one of the most exciting ones. To me it is a real killer
feature as it makes it possible to model relations, and fetch objects
based on those relations in one single query!

However, the number of links an object can have seems to be somewhat
limited, and that makes the feature much less useful:

At https://wiki.basho.com/display/RIAK/Riak+compared+to+Neo4J I read
"In a pinch, this can substitute as a lightweight graph database, as
long as the number of links is kept reasonably low; think dozens, not
thousands."

On the other hand, at https://wiki.basho.com/display/RIAK/Links I read
"There is no artificial limit to the number of links an object can
have. But, as adding links to an object does increase that object's
size, the same guidelines that apply to your data should also apply to
your links: strike a balance between size and usability."

When I try to create an object with many links using the Ripple Ruby gem,
I get a 400 error after adding a little more than 70 to 250 links,
depending on key sizes. The limit seems to be reached when the Link header
goes beyond 8200 or so characters.

I understand that many links will make it more expensive to handle the
object, but it is still better to follow many links from one object to
another than to have to do a full bucket scan with map/reduce. I would
not expect a tight limit on links on a Big Data database like Riak.

So what are the actual limits on the number of links you can have
for one object? Are the limits that I encounter related to Ripple
rather than Riak? How can I work around this limit?

Kind regards,
Magnus

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Limit to the number of links an object can have?

John Lynch
Magnus,

I too am excited by Riak's link capabilities, and have spoken with a few folks at Riak and
discussed some changes that would greatly enhance the practicality of the link feature.
For one thing, being able to run map/reduce jobs over objects using just their metadata,
without having to load and parse each object, would be very useful for walking links and
finding keys in large data sets.

Also, having some kind of write analog to the HEAD command, to be able to alter the
metadata on an object without having to PUT the entire object's data, would be nice.

We can of course simulate some of this behavior ourselves by having a separate object
that stores just the metadata about our primary object, but it would be great to have the
functionality baked in to Riak.


Regards,

John Lynch, CTO
Rigel Group, LLC
[hidden email]


On Tue, Mar 23, 2010 at 9:04 AM, Magnus Enarsson <[hidden email]> wrote:

Re: Limit to the number of links an object can have?

Magnus Enarsson
Sean,

I can confirm that it works when sending multiple Link headers. I
hacked a script together and created an object with 1000 links. On my
MacBook with the default setup and curl, it takes 0.02 seconds to fetch a
simple object without links, 0.2 seconds to fetch the object with 1000
links, and 0.4 seconds to fetch the 1000 objects using a link walk
from the center object (using /_,_,_).

I guess those numbers would be greatly improved when running on a
properly set up cluster.

So, I guess the conclusion is that Riak does handle thousands of links
after all, but the Ripple gem currently doesn't.

/ Magnus


On Tue, Mar 23, 2010 at 6:02 PM, Sean Cribbs <[hidden email]> wrote:

> Magnus,
>
> This goes down deep into the way mochiweb is implemented, in that it opens the TCP socket with a receive buffer of 8192 bytes.  It looks like if the header doesn't fit into that size, it will fail to parse the request.  This _might_ be solved by sending multiple Link headers instead of a single one.
>
> I don't have time today to test this theory, but it should be easy to do so.
>
> Sean Cribbs <[hidden email]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/

Re: Limit to the number of links an object can have?

Sean Cribbs
Magnus,

Thanks for testing this out! I'll add an issue to Ripple's tracker, and we'll fix it for the next patch release.

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 23, 2010, at 2:14 PM, Magnus Enarsson wrote:


Re: Limit to the number of links an object can have?

Magnus Enarsson
Just for the fun of it, I tried to add even more links to an object,
but I was unable to go much higher. It seems that the HTTP interface
also has a limit of around 1000 headers. When I store 1000 links, it will
not accept 1000 separate Link headers, but if I store 10 links in each
Link header (100 of them), it works all right.
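
The grouping described above (10 links per Link header, 100 headers for 1000 links) can be sketched in Python roughly as follows. This is only an illustration, not the script used in this thread: the helper names, the `/riak/` URL prefix, the `riaktag` link format, and the bucket and key names are all assumptions, and `put_with_links` needs a running Riak node to do anything useful.

```python
import http.client


def link_value(bucket, key, tag):
    """Format one link (assumed form: </riak/bucket/key>; riaktag="tag")."""
    return f'</riak/{bucket}/{key}>; riaktag="{tag}"'


def chunk_links(links, per_header=10):
    """Join links into comma-separated header values, per_header links each."""
    return [
        ", ".join(links[i:i + per_header])
        for i in range(0, len(links), per_header)
    ]


def put_with_links(host, port, path, body, header_values):
    """PUT body (bytes) with one Link header line per entry in header_values."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.putrequest("PUT", path)
        conn.putheader("Content-Type", "text/plain")
        conn.putheader("Content-Length", str(len(body)))
        for value in header_values:
            # Each call emits a separate "Link: ..." line in the request.
            conn.putheader("Link", value)
        conn.endheaders(body)
        return conn.getresponse().status
    finally:
        conn.close()


# Hypothetical data: 1000 links from one object to keys friend0..friend999.
links = [link_value("people", f"friend{i}", "friend") for i in range(1000)]
headers = chunk_links(links, per_header=10)
```

With a local node, the request would then be something like `put_with_links("127.0.0.1", 8098, "/riak/people/magnus", b"center", headers)`, where the host, port and path are again placeholders.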

At 5000 links it took 15 seconds to link-walk to all 5000 objects the
first time, and on subsequent requests it takes about 2 seconds to
fetch all 5000. I guess there is some caching going on.

When I go past 5000 links it doesn't work at all anymore. I am unable
to get curl to make any request. Perhaps I have gone past the length
of a command in the shell or something.

I think Riak tackles the challenge quite well, but I guess that most
libraries for HTTP communication will have problems handling headers
that are this big and this numerous.

/ Magnus


On Tue, Mar 23, 2010 at 7:18 PM, Sean Cribbs <[hidden email]> wrote:


Re: Limit to the number of links an object can have?

Sean Cribbs
There is indeed a 1000 header-count limit in mochiweb.  Ideally you would split the links across headers so that each header is large enough to nearly fill the buffer but does not get clipped: (8192 - 6) / around 40 chars per link = around 200 links per header.
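
As a rough sketch of that packing idea, the Python below fills each Link header greedily up to a byte budget and checks the header count afterwards. The 8192 and 1000 figures come from this thread; the function name, the safety margin, and the assumption that the budget applies to one whole header line are mine, so treat the numbers as illustrative rather than as mochiweb's exact behaviour.

```python
BUFFER_BYTES = 8192   # receive buffer size quoted in this thread
MAX_HEADERS = 1000    # header-count limit quoted in this thread
HEADER_NAME = "Link"


def pack_links(links, budget=BUFFER_BYTES, safety_margin=16):
    """Greedily pack links into as few header values as fit the budget.

    safety_margin is an assumed allowance for the line ending and any
    parsing overhead; it is not a documented mochiweb constant.
    """
    prefix_len = len(HEADER_NAME) + 2          # "Link: "
    limit = budget - prefix_len - safety_margin
    headers, current, current_len = [], [], 0
    for link in links:
        link_len = len(link.encode("utf-8"))
        if link_len > limit:
            raise ValueError("a single link exceeds the header budget")
        extra = link_len + (2 if current else 0)   # ", " separator
        if current and current_len + extra > limit:
            # Current header is full: close it and start a new one.
            headers.append(", ".join(current))
            current, current_len = [link], link_len
        else:
            current.append(link)
            current_len += extra
    if current:
        headers.append(", ".join(current))
    if len(headers) > MAX_HEADERS:
        raise ValueError("too many headers for one request")
    return headers


# Hypothetical 5000 links with fixed-width keys (about 43 characters each).
many_links = [
    f'</riak/people/friend{i:04d}>; riaktag="friend"' for i in range(5000)
]
packed = pack_links(many_links)
```

Packing by bytes rather than by a fixed link count keeps the number of headers low when keys are short, and still stays under the budget when keys are long.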

Yes, there is a cache for map phases so that recently accessed objects will be available faster.  This is essential for jobs that have multiple map phases.

I believe that at the point where you have that many links it becomes necessary to consider other options, including intermediary objects or alternative ways of representing the relationship.  Even loading the original object will get slower simply because of the increased size.
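
One way to picture the intermediary-object option is a two-level layout: the central object links to a handful of index objects, and each index object carries a bounded slice of the real links. The Python below only plans such a layout in memory. The key naming scheme and the default of 200 links per object are assumptions for illustration, not a Riak or Ripple convention.

```python
def plan_fanout(center_key, target_keys, links_per_object=200):
    """Return {object_key: [keys it should link to]} for a two-level layout.

    The center object links to index objects named
    "<center_key>-links-<n>" (a hypothetical scheme), and each index
    object links to at most links_per_object of the real targets.
    """
    if links_per_object < 1:
        raise ValueError("links_per_object must be at least 1")
    plan = {}
    index_keys = []
    starts = range(0, len(target_keys), links_per_object)
    for n, start in enumerate(starts):
        index_key = f"{center_key}-links-{n}"
        plan[index_key] = target_keys[start:start + links_per_object]
        index_keys.append(index_key)
    plan[center_key] = index_keys
    return plan


# Hypothetical example: 5000 targets spread over index objects.
targets = [f"friend{i}" for i in range(5000)]
plan = plan_fanout("magnus", targets)
```

Reaching the targets then takes one extra hop through the index objects, and the central object stays small, since here it carries only one link per index object.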

Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Mar 24, 2010, at 4:09 AM, Magnus Enarsson wrote:
