Maps with multiple buckets


Bryce Verdier
Hi All,

I have a question concerning map-reduce. I have two buckets with
counters enabled that have similar keys to track two different metrics.
At the moment in order to combine these two datasets together I have to
make 2 different map-reduce queries and combine the data within the
client. I'm wondering if/how it might be possible to combine both of
these queries into one. I'm thinking that Links are a possibility, but
I'm not sure whether they would work or how viable a solution they would be.

Any and all advice is welcomed.

Thanks in advance,
Bryce

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Maps with multiple buckets

Jeremiah Peschka

The allowable inputs to an MR map phase include a list of bucket key pairs. If you know your keys in advance the problem is solved.

Can you describe a bit more about how you're using MR? Is this an ad hoc query? A predictable report? Time based?
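
As a rough illustration of the bucket/key-pair input form described above, here is a minimal sketch of a job body for Riak's HTTP MapReduce endpoint (/mapred). The bucket names are placeholders, and the module and function names (my_counter_mr, map_counter_value) are hypothetical stand-ins for a custom Erlang map function; the sketch only builds the JSON and does not send it.

```python
import json


def build_mapred_job(pairs, module, function):
    """Build a MapReduce job whose inputs are explicit [bucket, key] pairs."""
    return {
        # Each input names one object, so a single job can span several buckets.
        "inputs": [[bucket, key] for bucket, key in pairs],
        "query": [
            {"map": {"language": "erlang", "module": module, "function": function}}
        ],
    }


# The same key looked up in two different buckets within one job.
pairs = [("initial", "Bob"), ("interact", "Bob")]

# Hypothetical module/function names; substitute your own map function.
job = build_mapred_job(pairs, "my_counter_mr", "map_counter_value")
body = json.dumps(job)
print(body)
```

This only helps when the keys are known ahead of time, since every bucket/key pair has to be listed explicitly.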

---
sent from a tiny portion of the hive mind...
in this case, a phone

On Dec 17, 2013 4:51 PM, "Bryce Verdier" <[hidden email]> wrote:
Hi All,

I have a question concerning map-reduce. I have two buckets with counters enabled that have similar keys to track two different metrics. At the moment in order to combine these two datasets together I have to make 2 different map-reduce queries and combine the data within the client. I'm wondering if/how it might be possible to combine both of these queries into one. I'm thinking that Links are a possibility, but I'm not sure whether they would work or how viable a solution they would be.

Any and all advice is welcomed.

Thanks in advance,
Bryce


Re: Maps with multiple buckets

Bryce Verdier
Thank you for the quick response Jeremiah.

I didn't know that inputs could accept a list of buckets. I do believe that will solve my problem.

I'm currently using MR to grab a list of all keys/counters and use the reduce phase to sort the keys by highest count. Because I'm using counters, the backend MR queries are in Erlang. The query would be ad hoc, I guess, because the client would be retrieving the data at frequent but not known or consistent times.

Thanks again for the answer!
Bryce


On 12/17/2013 05:09 PM, Jeremiah Peschka wrote:

The allowable inputs to an MR map phase include a list of bucket key pairs. If you know your keys in advance the problem is solved.

Can you describe a bit more about how you're using MR? Is this an ad hoc query? A predictable report? Time based?


Re: Maps with multiple buckets

Bryce Verdier
In reply to this post by Jeremiah Peschka
Upon thinking about things a little more, if anyone has information on how to do time series with counters (current, 5 seconds ago, 10 seconds ago, and 30 seconds ago), that would be a great thing to have for the project I'm doing.
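
To make the question concrete, one possible layout (an assumption for illustration, not something confirmed in this thread) is to keep one counter per customer per fixed time window and derive the keys to read from the current time. The key format and the 5-second window width below are hypothetical.

```python
def window_key(customer, ts, width=5):
    """Key of the counter covering the `width`-second window that contains ts."""
    start = int(ts) - int(ts) % width
    return f"{customer}:{start}"


def lookback_keys(customer, now, offsets=(0, 5, 10, 30), width=5):
    """Keys for the current window and the windows 5, 10 and 30 seconds back."""
    return [window_key(customer, now - off, width) for off in offsets]


keys = lookback_keys("Bob", 103)
print(keys)  # ['Bob:100', 'Bob:95', 'Bob:90', 'Bob:70']
```

With keys that can be computed like this, each reading becomes a plain key lookup rather than a query.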

Thanks again for the help!

Bryce



Re: Maps with multiple buckets

Bryce Verdier
In reply to this post by Jeremiah Peschka
So in playing with things a little bit, I don't think that this list of bucket-key pairs is going to work for me.

I'm using Riak counters to keep tabs on various customer IDs as they travel through our system. When Bob first shows up, he's seen by one set of servers, which adds 1 to the counter for Bob within some bucket. When he interacts with us, we'll see Bob again in another service, which adds 1 to another counter for Bob within another bucket.

so:
buckets/initial/counters/Bob => 1
and:
buckets/interact/counters/Bob => 1

Currently I'm using 2 MR queries to get the list of counts for all customers from both buckets and combine these data sets within the client. I'm trying to see if it's possible to do this within 1 query. Maybe return something like:
{"Bob": [1,1]}
in JSON.
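
For reference, the grouping step itself can be sketched as below, whether it runs in the client as it does today or is ported into a reduce phase. This is a minimal sketch that assumes each row carries its bucket name alongside the key and count; the function name and the "Alice" row are made up for illustration.

```python
import json
from collections import defaultdict


def merge_counts(buckets, rows):
    """Group (bucket, key, count) rows into {key: [count per bucket, in order]}."""
    index = {name: i for i, name in enumerate(buckets)}
    merged = defaultdict(lambda: [0] * len(buckets))
    for bucket, key, count in rows:
        merged[key][index[bucket]] += count
    return dict(merged)


rows = [("initial", "Bob", 1), ("interact", "Bob", 1), ("initial", "Alice", 3)]
result = merge_counts(["initial", "interact"], rows)
print(json.dumps(result))  # {"Bob": [1, 1], "Alice": [3, 0]}
```

A key that is missing from one bucket shows up with 0 in that position, which keeps every list the same length.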

I know that riak_kv_counter:value() requires a RiakObject to get the data. In the case of an MR job, I know the key and that it's in another bucket. Is it possible to get the RiakObject based on those two items?





Re: Maps with multiple buckets

John Daily
Alex Moore and I provided some general time series advice and links on StackOverflow recently: http://stackoverflow.com/questions/19384686/what-is-the-most-efficient-way-to-store-time-series-in-riak-with-heavy-reads

Broadly speaking, issuing dynamic queries via MapReduce is going to be less desirable than building responses to the questions you’re going to ask later, as the data arrives. As you’ve already seen, writing MapReduce queries is rather painful, and Riak’s max performance/availability/scalability is achieved when serving key/value requests.
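
A minimal sketch of what building the answer as the data arrives could look like for the two-counter case in this thread: each event updates a per-customer summary record, so the later read is a single key/value get. The InMemoryKV class is a stand-in for a real client, the "summary" bucket name is hypothetical, and the read-modify-write below ignores concurrent writers and sibling resolution, which a real deployment would have to handle.

```python
class InMemoryKV:
    """Tiny stand-in for a key/value store; not a Riak client."""

    def __init__(self):
        self._data = {}

    def get(self, bucket, key):
        return self._data.get((bucket, key))

    def put(self, bucket, key, value):
        self._data[(bucket, key)] = value


def record_event(store, customer, metric, metrics=("initial", "interact")):
    """Bump one metric in the customer's precomputed summary record."""
    current = store.get("summary", customer) or {m: 0 for m in metrics}
    summary = dict(current)  # copy so the stored value is only changed by put()
    summary[metric] += 1
    store.put("summary", customer, summary)
    return summary


store = InMemoryKV()
record_event(store, "Bob", "initial")
record_event(store, "Bob", "interact")
print(store.get("summary", "Bob"))  # {'initial': 1, 'interact': 1}
```

The cost moves to write time, but the read path no longer needs MapReduce at all.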

-John


On Dec 18, 2013, at 5:02 PM, Bryce Verdier <[hidden email]> wrote:

So in playing with things a little bit, I don't think that this list of bucket-key pairs is going to work for me.

I'm using Riak counters to keep tabs on various customer IDs as they travel through our system. When Bob first shows up, he's seen by one set of servers, which adds 1 to the counter for Bob within some bucket. When he interacts with us, we'll see Bob again in another service, which adds 1 to another counter for Bob within another bucket.

so:
buckets/initial/counters/Bob => 1
and:
buckets/interact/counters/Bob => 1

Currently I'm using 2 MR queries to get the list of counts for all customers from both buckets and combine these data sets within the client. I'm trying to see if it's possible to do this within 1 query. Maybe return something like:
{"Bob": [1,1]}
in JSON.

I know that riak_kv_counter:value() requires a RiakObject to get the data. In the case of an MR job, I know the key and that it's in another bucket. Is it possible to get the RiakObject based on those two items?



