Getting a value: get vs map

Mikhail Sobolev
Hi,

(I looked in various places for this information, but I could not
find anything that answers the question.  It is quite possible that I
did not check everywhere, though :))

I use the PB (protocol buffers) Erlang interface to access the database.
Given a bucket name and a key, the value can easily be extracted using:

    {ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
    Value = riakc_obj:get_value(Object)

Alternatively, a mapred (actually, just map) request could be issued:

    {ok, [{_, [Value]}]} = riakc_pb_socket:mapred(Conn, [
        {Bucket, Key}
    ], [
        {map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}
    ])

I would expect the result to be the same, while in the second case the
amount of data transferred to the client is smaller (which might be good
in certain situations).

So the [open] question is: are there any reasons for using the first
approach over the second?

--
Misha

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Getting a value: get vs map

Antonio Rohman Fernandez

MapReduce (or simply a Map) gets really slow when the database holds a significant amount of data (or is distributed over several servers). A Get, by contrast, is always faster, as Riak doesn't have to search for the key (you tell Riak exactly where to GET the data in your URL).

Rohman

On Thu, 28 Jul 2011 23:43:06 +0400, [hidden email] wrote:

[...]

--
Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
[hidden email]
Projects: MaruBatsu.es, PupCloud.com, Wedding Album

Re: Getting a value: get vs map

Jeremiah Peschka
I would have suspected that an MR job where you supply a Bucket, Key pair would be just as fast as a Get request. Shows what I know.
---
Jeremiah Peschka
Founder, Brent Ozar PLF, LLC

On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:

> [...]


Re: Getting a value: get vs map

Jonathan Langevin
And it's a bit ironic that having data spread over more servers results in slower performance. Usually more servers = greater performance.
...black is white, up is down...


Jonathan Langevin
Systems Administrator

Loom Inc.
Wilmington, NC: (910) 241-0433 - [hidden email] - www.loomlearning.com - Skype: intel352



On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka <[hidden email]> wrote:
[...]


Re: Getting a value: get vs map

Justin Sheehy
In reply to this post by Jeremiah Peschka
Jeremiah,

You were essentially correct. A "targeted" MR does not have to search
for the data, and does not slow down with database size. It is a
bucket-sweeping MR that currently has that behavior.

-Justin
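
The distinction between a "targeted" and a "bucket-sweeping" MapReduce comes down to what is passed as the input. The following is only a minimal sketch of the two forms using the Erlang PB client from the original post. The module name `mr_inputs_sketch` and the function names are hypothetical, `Conn` is assumed to be an already open connection, and the map function is assumed to live in the `riak_kv_mapreduce` module:

```erlang
%% Sketch only. Assumes the riakc client library is on the code path and
%% Conn is a connection pid obtained from riakc_pb_socket:start_link/2.
-module(mr_inputs_sketch).            % hypothetical module name
-export([targeted/3, whole_bucket/2]).

%% Targeted MapReduce: the inputs are explicit {Bucket, Key} pairs, so
%% Riak does not have to search for the data.
targeted(Conn, Bucket, Key) ->
    riakc_pb_socket:mapred(Conn, [{Bucket, Key}], [value_phase()]).

%% Bucket-sweeping MapReduce: the input is the whole bucket. This is the
%% form that slows down as the amount of stored data grows.
whole_bucket(Conn, Bucket) ->
    riakc_pb_socket:mapred_bucket(Conn, Bucket, [value_phase()]).

%% A single map phase that returns each object's value to the client.
%% Assumption: map_object_value is exported by riak_kv_mapreduce.
value_phase() ->
    {map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}.
```

Both functions return whatever `riakc_pb_socket` returns, so the caller still has to match on `{ok, Results}` or `{error, Reason}`.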



Re: Getting a value: get vs map

Sean Cribbs
A few things that should be mentioned as well:

1) MapReduce amounts to N=1, that is, it reads only one replica. If you have divergent replicas (e.g. siblings) on different nodes, they might not appear in your MapReduce results.
2) MapReduce does not invoke read-repair, so divergent replicas will not converge.
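
When divergent replicas matter, these two points argue for a plain get with explicit sibling handling. The following is a hedged sketch rather than a recommended implementation. The module name `get_siblings_sketch` and the function `fetch_values/3` are hypothetical, and `Conn` is assumed to be an open `riakc_pb_socket` connection:

```erlang
%% Sketch only. Assumes the riakc client library is on the code path.
-module(get_siblings_sketch).         % hypothetical module name
-export([fetch_values/3]).

%% Fetch an object with a regular get and return all of its values.
%% Returns {ok, [Value]} with one element per sibling, or {error, Reason}.
fetch_values(Conn, Bucket, Key) ->
    case riakc_pb_socket:get(Conn, Bucket, Key) of
        {ok, Object} ->
            %% get_values/1 returns every sibling value. get_value/1 is
            %% only safe when exactly one value is present; it throws
            %% `siblings` when there is more than one.
            {ok, riakc_obj:get_values(Object)};
        {error, Reason} ->
            {error, Reason}
    end.
```

A caller might open a connection with `riakc_pb_socket:start_link("127.0.0.1", 8087)` (host and port are assumptions here) and then call `get_siblings_sketch:fetch_values(Conn, Bucket, Key)`, resolving the returned list itself if it has more than one element.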

On Fri, Jul 29, 2011 at 1:30 PM, Justin Sheehy <[hidden email]> wrote:
[...]



--
Sean Cribbs <[hidden email]>
Developer Advocate
Basho Technologies, Inc.


Re: Getting a value: get vs map

Jonathan Langevin
In reply to this post by Justin Sheehy
That's reassuring to know then, thanks


Jonathan Langevin
Systems Administrator

Loom Inc.
Wilmington, NC: (910) 241-0433 - [hidden email] - www.loomlearning.com - Skype: intel352


