to escape or not to escape?


francisco treacy-2
I have a question about links and URI-escaping in Riak.

Say I create document 'test/first' with two links (note that %40 is
the URI-escaped @ character):
 => 'test/second%40basho.com'
 => 'test/[hidden email]'

$> curl -X PUT -H "content-type:text/plain" \
  -H "Link: </riak/test/second%40basho.com>; riaktag=\"foo\", </riak/test/[hidden email]>; riaktag=\"bar\"" \
  http://localhost:8098/riak/test/first --data "first"

$> curl -X PUT -H "content-type:text/plain" \
  http://localhost:8098/riak/test/second%40basho.com --data "second"

$> curl -X PUT -H "content-type:text/plain" \
  http://localhost:8098/riak/test/third@... --data "third"

When I walk first's links, I only obtain 'test/[hidden email]', not
'test/second%40basho.com'.

$> curl http://localhost:8098/riak/test/first/_,_,_

--IgVE1zwvyzEQq9FQ1HtOaFS8EBN
Content-Type: multipart/mixed; boundary=WOfmvASHvjIAAXdVoDNMnLViS2B

--WOfmvASHvjIAAXdVoDNMnLViS2B
X-Riak-Vclock: a85hYGBgzGDKBVIsTJsW1mcwJTLmsTJUZ10/ypcFAA==
Location: /riak/test/third%40basho.com
Content-Type: text/plain
Link: </riak/test>; rel="up"
Etag: 4gw3mCKxcnuyMAzLYMpJbl
Last-Modified: Wed, 11 Aug 2010 18:39:55 GMT

third
--WOfmvASHvjIAAXdVoDNMnLViS2B--

--IgVE1zwvyzEQq9FQ1HtOaFS8EBN--


If I update first's links, now with third being 'test/third%40basho.com':

$> curl -X PUT -H "content-type:text/plain" \
  -H "Link: </riak/test/second%40basho.com>; riaktag=\"foo\", </riak/test/third%40basho.com>; riaktag=\"bar\"" \
  http://localhost:8098/riak/test/first --data "first"

I get the same results when I do the link-walking.
'test/second%40basho.com' is never retrieved. Doing an equivalent
map/reduce returns a {not_found: {  'test/[hidden email]' }}

This shows that, because of the escaping/unescaping, there are cases
in which link-walking (and map/reduce, for that matter) doesn't work.
Riak should probably never unescape, and should assume all keys are
already escaped.
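
The mismatch described above can be reproduced without a Riak node. The
following is only a rough sketch: the dictionary stands in for a bucket,
the keys are hypothetical, and it assumes (consistent with the results
above) that keys are stored exactly as written in the PUT URL while link
targets are URL-decoded before lookup.

```python
from urllib.parse import unquote

# Hypothetical stand-in for a bucket: keys are kept exactly as they were
# written in the PUT URL path (an assumption, not Riak's actual storage code).
bucket = {
    "a%40example.com": "second",  # stored via .../a%40example.com
    "b@example.com": "third",     # stored via .../b@example.com
}

def walk(link_keys, decode):
    """Resolve link targets, optionally URL-decoding each key first."""
    found = []
    for key in link_keys:
        lookup = unquote(key) if decode else key
        if lookup in bucket:
            found.append(bucket[lookup])
    return found

links = ["a%40example.com", "b@example.com"]
decoded_walk = walk(links, decode=True)    # server unescapes link keys
verbatim_walk = walk(links, decode=False)  # server leaves link keys alone
print(decoded_walk, verbatim_walk)
```

With decoding enabled, the escaped link is looked up under its unescaped
form and misses; with keys taken verbatim, both targets resolve.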

I've seen some work on Ripple in this direction, and that's the reason
my code is breaking after I updated. I'm also wondering how to deal
with this in riak-js.

Thanks,
Francisco

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: to escape or not to escape?

Dan Reverri
Hi Francisco,

You are correct; Riak is URL-decoding Link headers in riak_kv_wm_raw:get_link_heads/2. I've opened bug 617 to address this issue:
https://issues.basho.com/show_bug.cgi?id=617

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
[hidden email]





Re: to escape or not to escape?

bryan-basho
On Wed, Aug 11, 2010 at 3:31 PM, francisco treacy
<[hidden email]> wrote:
> I have a doubt regarding links and URI-escaping in Riak.
…snip…
> This shows that in the process of escaping/unescaping there are cases
> in which link-walking (and map/reduce for that matter) don't work.
> Riak should probably never unescape, and assume all keys are already
> escaped.

Hi, Francisco.  I've run into this myself in developing a recent
application.  I've been loath to change the Riak behavior, for
fear of breaking existing client expectations.  The original
escaping was put in to deal with objects that had been created by the
native Erlang client, and therefore might have non-url-safe keys.  I'm
fairly certain I now agree that Riak should never mess with escaping,
though (and that non-url-safe keys should be treated as errors for the
HTTP interface).

In the meantime, a good workaround is to base64-encode keys used over
HTTP.  As long as you use alternate characters for + and / (like - and
_, called "base64url" on http://en.wikipedia.org/wiki/Base64), you end
up with urlencode(base64encode(Key)) == urldecode(base64encode(Key))
== base64encode(Key).
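
A minimal sketch of that workaround, using Python's standard library. The
helper names are made up for illustration, and stripping the '=' padding
is an added assumption: '=' is a reserved URL character, so leaving it in
would break the identity under a strict urlencode.

```python
import base64
from urllib.parse import quote, unquote

def b64url_key(key: str) -> str:
    """Encode a key with the URL-safe Base64 alphabet, dropping '=' padding."""
    raw = base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")
    return raw.rstrip("=")

def b64url_key_decode(encoded: str) -> str:
    """Restore the padding and decode back to the original key."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

encoded = b64url_key("second@basho.com")

# Escaping and unescaping are both no-ops on the encoded key,
# so it no longer matters whether the server decodes it.
assert quote(encoded, safe="") == encoded
assert unquote(encoded) == encoded
assert b64url_key_decode(encoded) == "second@basho.com"
```

The cost is that the stored keys are no longer human-readable.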

-Bryan


Re: to escape or not to escape?

francisco treacy-2
2010/8/12 Bryan Fink <[hidden email]>:
> Hi, Francisco.  I've run into this myself in developing a recent
> application.  I've been loath to change the Riak behavior, for
> fear of breaking existing client expectations.  The original
> escaping was put in to deal with objects that had been created by the
> native Erlang client, and therefore might have non-url-safe keys.  I'm
> fairly certain I now agree that Riak should never mess with escaping,
> though (and that non-url-safe keys should be treated as errors for the
> HTTP interface).

Yes, we agree that Riak should never mess with escaping *and* that
non-url-safe keys should be treated as errors for the HTTP interface
(or for all interfaces: what if you save a document via Erlang or the
protobuf interface and then want to retrieve it with HTTP GET?).

I think it's important not only to fix
https://issues.basho.com/show_bug.cgi?id=617 , but also to clearly
state how client library implementors should deal with this (basically,
that they should take care of all escaping in a uniform manner).

> In the meantime, a good workaround is to base64-encode keys used over
> HTTP.  As long as you use alternate characters for + and / (like - and
> _, called "base64url" on http://en.wikipedia.org/wiki/Base64), you end
> up with urlencode(base64encode(Key)) == urldecode(base64encode(Key))
> == base64encode(Key).

If I understand correctly, this would involve changing my keys, and at
the moment that's not really feasible in our app. As a workaround until
this is fixed, I patched Ripple so that it doesn't escape:
http://github.com/frank06/ripple/commit/baa326f0f6041f9ab2401e0606cd35533e878cf0
(works ok so far).

Thanks,
Francisco


Re: to escape or not to escape?

francisco treacy-2
I recently incorporated a pull request into riak-js that involved URI
escaping for the HTTP client.

Shallow tests show that everything works fine, but once you start
messing with links...  you get bitten by this bug (#617). And I don't
want to resort to the hack of double-escaping links.
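
For reference, the double-escaping hack would look roughly like the
sketch below. It is not riak-js code: the function names are invented,
and it assumes the server unescapes Link targets exactly once while
storing URL-path keys verbatim.

```python
from urllib.parse import quote, unquote

def key_for_url(key):
    """Escape a key once for use in the request path."""
    return quote(key, safe="")

def key_for_link_header(key):
    """Escape a key twice, so one server-side unescape yields the stored form."""
    return quote(key_for_url(key), safe="")

stored = key_for_url("second@basho.com")        # form used in the PUT URL
link = key_for_link_header("second@basho.com")  # form sent in the Link header

# After the server's single unescape, the link points at the stored key again.
assert unquote(link) == stored
```

The client then has to track two different encodings of the same key,
which is exactly what makes this a hack.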

I think `mochiweb_util:unquote` should not be called there:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_wm_raw.erl#L1217-1219

(Imagine having a Protobuf (or native Erlang) client adding a link to
['bucket', 'r4nd0m-%40b0e-key'], to later stumble onto not_founds
because ['bucket', 'r4nd0m-@b0e-key'] does not exist.)

Base64url-encoding is not an option for us, because we heavily rely on
meaningful keys.

Thanks,
Francisco


