python map reduce and secondary indexes

David Montgomery
Hi,

Given that map reduce is the primary way of getting data out of riak, and I use the Python API, I am hard pressed to find any simple examples, not even for the officially supported Riak Python API.

Below is how I add a record to riak:

    id = """%s:%s:%s:%s:%s""" % (str(uuid4()),campaign_id,aid,da,country)
    worker_bucket = impression_bucket.new(id, data=qs)
    worker_bucket.add_index('field1_bin', campaign_id)
    worker_bucket.add_index('field2_bin', aid)
    worker_bucket.add_index('field3_bin', country)
    worker_bucket.add_index('field4_bin', da)

So....

If I want to get all records whose date falls in a date range and sum them up by country, how do I do it?  I am OK with the reduce portion but not clear on the map portion.  How do I add a index for country=US and da>201207 and da<201212?

    import riak

    client = riak.RiakClient(host='103.4.112.103')
    query = client.add('impressions')
    # JavaScript map phase: emit one {hw_ssp: 1} object per record
    query.map('''
    function(value, keyData, arg) {
        var data = Riak.mapValuesJson(value)[0];
        var alt_key = data['hw'] + '_' + data['ssp'];
        var obj = {};
        obj[alt_key] = 1;
        return [ obj ];
    }''')



Thanks

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: python map reduce and secondary indexes

bryan-basho
Administrator
On Sat, Dec 8, 2012 at 2:12 AM, David Montgomery
<[hidden email]> wrote:
> Given that map reduce is the primary way of getting data out of riak

Hi, David. MapReduce is not the primary way of getting data out of
Riak. Primary-key reads serve that purpose. Even for reading indexes,
there are built-in interfaces (/bucket/B/index/I/... on HTTP, and the
RpbIndexReq message on Protocol Buffers as of Riak 1.2). MapReduce is
an auxiliary interface that can be used effectively in some situations
to augment these other facilities.
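
For illustration, here is a minimal sketch of how such an HTTP index
URL could be put together. It only builds strings and sends no request.
The /buckets/<bucket>/index/<index>/<value> layout and port 8098 are
assumptions based on the Riak 1.x HTTP interface, so check them against
your version; index_url is a hypothetical helper, not part of any client.

```python
from urllib.parse import quote


def index_url(host, bucket, index, start, end=None, port=8098):
    """Build a secondary-index query URL (exact match or range).

    Hypothetical helper: the path layout and default port are assumptions.
    """
    parts = ["buckets", bucket, "index", index, start]
    if end is not None:
        parts.append(end)  # a second value turns it into a range query
    path = "/".join(quote(p, safe="") for p in parts)
    return "http://%s:%d/%s" % (host, port, path)


exact = index_url("localhost", "impressions", "field3_bin", "US")
ranged = index_url("localhost", "impressions", "field4_bin", "201207", "201212")
print(exact)
print(ranged)
```

The bucket and index names are the ones from the original post; either
URL would be fetched with a plain GET.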

> How do I add a index for country=US and da>201207 and da<201212?

This is a limitation of Riak's secondary indexes: you're only able to
query one index at a time. To perform the query you describe, you have
two options: create a unified index, or query by one index and filter
by the other.

To create a unified index, you would add another field, 'countryda',
and then set values there that are concatenations of the other fields,
like 'US-201209'. Then your query would be "countryda >= US-201207 and
countryda <= US-201212". Using the Python client:

    client.get_index('my_bucket', 'countryda_bin', 'US-201207', 'US-201212')
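
As a rough illustration of why the concatenated value works, here is a
pure-Python sketch that does not touch Riak at all. It assumes the _bin
index compares values as plain strings and that the range is inclusive
on both ends; composite_value and range_query are hypothetical helper
names, and the entries are made up.

```python
from bisect import bisect_left, bisect_right


def composite_value(country, da):
    """Build the value for the hypothetical 'countryda_bin' index."""
    return "%s-%s" % (country, da)


def range_query(sorted_values, start, end):
    """Return every value v with start <= v <= end (string comparison)."""
    lo = bisect_left(sorted_values, start)
    hi = bisect_right(sorted_values, end)
    return sorted_values[lo:hi]


# Made-up index entries, kept sorted the way an index would keep them.
entries = sorted(
    composite_value(c, d)
    for c, d in [("US", "201206"), ("US", "201207"), ("US", "201209"),
                 ("US", "201212"), ("US", "201301"), ("SG", "201209")]
)

matches = range_query(entries, "US-201207", "US-201212")
print(matches)
```

Because the comparison is on strings, this relies on every da value
having the same width. If da is always a six-digit YYYYMM value, the
strict bounds from the original question (da>201207 and da<201212)
correspond to querying 'US-201208' through 'US-201211'.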

To query by one and filter by the other, you would look through all
the results of, for example, country=US, and then discard any entries
where da does not fall within your range. In MapReduce terms, that
might look like:

    js_filter = """
    function(v, kd, arg) {
        // find the index value stored in the object's metadata
        var da = v.values[0].metadata.index.da_bin;

        // we passed the range in the arg so we can
        // reuse this same function for any range
        if (da > arg.min && da < arg.max)
            return [v.key]; // in range - keep result
        else
            return []; // out of range - discard result
    }"""

    # parentheses let the chained calls span several lines
    results = (client.index('my_bucket', 'country_bin', 'US')
                     .map(js_filter,
                          {"arg": {"min": "201207", "max": "201212"}})
                     .run())
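
To see what the map phase above does to each object, the same test can
be sketched in plain Python over made-up records. This is only an
illustration: the dictionaries stand in for Riak objects, and in_range
and filter_keys are hypothetical names.

```python
def in_range(da, min_da, max_da):
    """Mirror the JavaScript test: strict string comparison on both ends."""
    return min_da < da < max_da


def filter_keys(records, min_da, max_da):
    """Keep the keys of records whose 'da' falls inside the range."""
    return [r["key"] for r in records if in_range(r["da"], min_da, max_da)]


# Stand-ins for the objects an index query on country=US might return.
us_records = [
    {"key": "a", "da": "201207"},
    {"key": "b", "da": "201208"},
    {"key": "c", "da": "201211"},
    {"key": "d", "da": "201212"},
]

kept = filter_keys(us_records, "201207", "201212")
print(kept)
```

Note that the map phase still has to read every object returned for
country=US, so the cost of this approach grows with the size of that
index result rather than with the size of the date range.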

Hope that helps,
Bryan
