Slow performance on getting via 2i

Slow performance on getting via 2i

Daniel Iwan
Hi,
I'm using the Riak Java client in a 3-node configuration over protocol buffers.
I have 3000 keys in one bucket; the other buckets are almost empty.
When I grab all the keys in that bucket (listing keys, which I know is discouraged) like this:
            Bucket b = _iclient.fetchBucket(BUCKET_NAME).execute();
            Iterator<String> iterator = b.keys().iterator();

I can get them in under 300ms, which is good.

But when I do this:
            final Bucket b = _iclient.createBucket(BUCKET_NAME).execute();
            List<String> keys = b.fetchIndex(BinIndex.named("idx_key")).withValue(filter.toString()).execute();

it is roughly 10 times slower, taking about 2700ms.
All entries are tagged with the same index value, so I'm getting all 3000 keys either way. Why is the index query so much slower? Am I doing something wrong? Is there some MapReduce going on somewhere that kills the performance?
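
A rough way to make a comparison like the one above reproducible is to time each call several times after a warm-up and compare medians. The sketch below is only an illustration: `QueryTimer` and `medianMillis` are hypothetical names, not part of the Riak client, and the task in `main` is a stand-in for the real key listing or `fetchIndex` call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for timing a call; not part of the Riak Java client. */
final class QueryTimer {

    /**
     * Runs the task `warmups` times untimed, then `runs` times timed,
     * and returns the median elapsed time in milliseconds
     * (the upper median when `runs` is even).
     */
    static long medianMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload: replace the body with the key listing or the 2i query.
        Runnable standIn = () -> {
            List<String> keys = new ArrayList<>();
            for (int i = 0; i < 3000; i++) {
                keys.add("key" + i);
            }
        };
        System.out.println("median ms: " + medianMillis(standIn, 3, 10));
    }
}
```

Timing both calls this way separates one-off connection and JIT warm-up costs from the cost of the query itself.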


Thanks
Daniel



Re: Slow performance on getting via 2i

Daniel Iwan
Excuse my double-posting and replying to my own thread.
I found this in the fetchIndex method of the PBClientAdapter class:

        final MapReduce mr = new IndexMapReduce(this, indexQuery);

        mr.addReducePhase(NamedErlangFunction.REDUCE_IDENTITY, Args.REDUCE_PHASE_ONLY_1);
        // only return the key, to match the http rest api
        mr.addReducePhase(new JSSourceFunction("function(v) { return v.map(function(e) { return e[1]; }); }"), Args.REDUCE_PHASE_ONLY_1);

This means it uses MapReduce and JavaScript when going over protocol buffers.
Would that be the reason for the slow performance with thousands of keys?
In my case it is faster to list all the keys than to do a 2i query.

I'm using Java client 1.0.4 and a Riak 1.1.1 cluster.
Any feedback appreciated.

Daniel

Re: Slow performance on getting via 2i

Russell Brown-2
Hi Daniel,
Sorry for the slow reply.

On 12 Mar 2012, at 11:36, ivenhov wrote:

> Excuse my double-posting and replying to my own thread.
> I found this in PBClientAdapter class in fetchIndex method
>
>        final MapReduce mr = new IndexMapReduce(this, indexQuery);
>
>        mr.addReducePhase(NamedErlangFunction.REDUCE_IDENTITY,
> Args.REDUCE_PHASE_ONLY_1);
>        // only return the key, to match the http rest api
>        mr.addReducePhase(new JSSourceFunction("function(v) { return
> v.map(function(e) { return e[1]; }); }"), Args.REDUCE_PHASE_ONLY_1);
>
> This means it uses map reduce and JavaScript for protocol buffers.
> Would that be the reason of slow performance with thousands of keys?
> In my case it more performant to list all the keys than do 2i query.

You can measure whether this *is* the problem if you are using Riak 1.1 or later, since an index MapReduce no longer requires a reduce identity call from Riak 1.1 onwards. Using the Java client you can create an index MapReduce without the reduce phases:

    IndexQuery indexQuery = new BinIndexQuery(BinIndex.named("idx_key"), BUCKET_NAME, filter.toString());
    client.mapReduce(indexQuery).execute();

Your result will be a list of bucket/key pairs (different from the HTTP API's list of keys). If that executes faster, the problem is with the reduce phases. You could even try with just the first reduce phase, and then with the second, to discover whether the JS phase is the issue.

The extra JS reduce phase was a (perhaps poor?) choice I made. To provide parity between the HTTP and PB APIs, I had the PB call return a list of keys only (like the HTTP API). I guess I could have iterated on the client to reformat the result, or had Jackson do it, or even just returned different output between the APIs. It is a hard choice, and really the problem is that Riak has a special HTTP API that returns results in one format, but requires a MapReduce to get at indexes over protocol buffers.
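
The "iterate on the client" alternative mentioned above could look roughly like the following sketch. It assumes the bucket/key pairs have already been decoded into nested lists; `KeyExtractor` and `keysOf` are hypothetical names, not part of the Riak Java client, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical client-side stand-in for the JS reduce phase. */
final class KeyExtractor {

    /**
     * Keeps only the key of each [bucket, key] pair, mirroring
     * function(v) { return v.map(function(e) { return e[1]; }); }
     */
    static List<String> keysOf(List<List<String>> bucketKeyPairs) {
        List<String> keys = new ArrayList<>(bucketKeyPairs.size());
        for (List<String> pair : bucketKeyPairs) {
            if (pair.size() != 2) {
                throw new IllegalArgumentException("expected [bucket, key] but got " + pair);
            }
            keys.add(pair.get(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up sample of what an index MapReduce without phases returns.
        List<List<String>> pairs = Arrays.asList(
                Arrays.asList("mybucket", "key1"),
                Arrays.asList("mybucket", "key2"));
        System.out.println(keysOf(pairs)); // prints [key1, key2]
    }
}
```

Doing this on the client trades a little extra data on the wire (the repeated bucket name) for not running a JavaScript phase on the cluster.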

Please let me know if executing an index MapReduce without reduce phases performs more in line with your expectations.

Cheers

Russell

>
> I'm using java client 1.0.4 and riak cluster 1.1.1
> Any feedback appreciated.
>
> Daniel
>
> --
> View this message in context: http://riak-users.197444.n3.nabble.com/Slow-performance-on-getting-via-2i-tp3812664p3818984.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Slow performance on getting via 2i

Daniel Iwan
Hi Russell

Thanks for the reply.
I tried to use your example but I could not find the class BinIndexQuery in the Riak client (I'm using 1.0.4; is there a newer version?)
I gave it a try with:

IndexQuery indexQuery = new BinValueQuery(BinIndex.named(BUCKET_NAME), BUCKET_NAME, bucketFilter.toString());
MapReduceResult res = _iclient.mapReduce(indexQuery).execute();          
Collection<String> resS = res.getResult(String.class);

which is probably something completely different since I got:
com.basho.riak.client.query.NoPhasesException
        at com.basho.riak.client.query.MapReduce.validate(MapReduce.java:90)
        at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:74)

I checked the GitHub repository but could not find anything about BinIndexQuery.
Daniel.

Re: Slow performance on getting via 2i

Russell Brown-2

On 19 Mar 2012, at 10:34, ivenhov wrote:

> Hi Russell
>
> Thanks for reply.
> I tried to use your example but I could not find class BinIndexQuery in riak
> client (I'm using 1.0.4, is there any newer version?)

No, you are correct, I replied from memory rather than checking the API first, sorry.

> I gave it a try with:
>
> IndexQuery indexQuery = new BinValueQuery(BinIndex.named(BUCKET_NAME),
> BUCKET_NAME, bucketFilter.toString());
> MapReduceResult res = _iclient.mapReduce(indexQuery).execute();
> Collection<String> resS = res.getResult(String.class);
>
> which is probably something completely different since I got:
> com.basho.riak.client.query.NoPhasesException
> at com.basho.riak.client.query.MapReduce.validate(MapReduce.java:90)
> at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:74)

Hmmm. I believe that validation should have been removed, since from Riak 1.1 a MapReduce can be executed without phases. I've raised an issue for that[1]. Until it is fixed you can't try the example I suggested.

Well, actually you can: just use the raw client and give it a JSON string:

    String jsonString = "{\"inputs\":{\"bucket\":\"mybucket\", \"index\":\"myindex_bin\", \"key\":\"mykey\"}, \"query\": []}";
    rawClient.mapReduce(new MapReduceSpec(jsonString));

If that is much faster then the reduce phases are to blame. Try adding back one, then the other. I have a suspicion that the JS reduce is the problem.
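
The "add back one, then the other" experiment can be sketched as three job descriptions for the raw client. This assumes the JSON phase format of Riak's MapReduce interface as I understand it; `IndexMapReduceSpecs` and its members are hypothetical helper names, the bucket, index and key values are the placeholders from the snippet above, and the `reduce_phase_only_1` argument the client normally passes is left out for brevity. No JSON escaping is done, so the names must not contain quotes or backslashes.

```java
/** Hypothetical helper that builds MapReduce job JSON for an exact-match 2i query. */
final class IndexMapReduceSpecs {

    /** Erlang identity reduce phase (riak_kv_mapreduce:reduce_identity). */
    static final String ERLANG_IDENTITY =
            "{\"reduce\":{\"language\":\"erlang\",\"module\":\"riak_kv_mapreduce\","
            + "\"function\":\"reduce_identity\"}}";

    /** The JavaScript phase from PBClientAdapter that keeps only the key. */
    static final String JS_KEYS_ONLY =
            "{\"reduce\":{\"language\":\"javascript\",\"source\":"
            + "\"function(v) { return v.map(function(e) { return e[1]; }); }\"}}";

    /** Builds the "inputs" object for an exact-match index lookup. */
    static String inputs(String bucket, String index, String key) {
        return "{\"bucket\":\"" + bucket + "\",\"index\":\"" + index
                + "\",\"key\":\"" + key + "\"}";
    }

    /** Joins the inputs and zero or more phases into a full job description. */
    static String spec(String inputs, String... phases) {
        StringBuilder sb = new StringBuilder("{\"inputs\":").append(inputs).append(",\"query\":[");
        for (int i = 0; i < phases.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(phases[i]);
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String in = inputs("mybucket", "myindex_bin", "mykey");
        System.out.println(spec(in));                                // no phases
        System.out.println(spec(in, ERLANG_IDENTITY));               // identity only
        System.out.println(spec(in, ERLANG_IDENTITY, JS_KEYS_ONLY)); // both phases
    }
}
```

Each string can be wrapped in `new MapReduceSpec(...)` and timed separately; if the jump in latency appears only with the third one, the JS phase is the likely culprit.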

Cheers

Russell



[1] https://github.com/basho/riak-java-client/issues/113

> I checked the github repository but could not find anything about
> BinIndexQuery.
> Daniel.


--
View this message in context: http://riak-users.197444.n3.nabble.com/Slow-performance-on-getting-via-2i-tp3812664p3838662.html
Sent from the Riak Users mailing list archive at Nabble.com.


Re: Slow performance on getting via 2i

Paul Barry
One workaround for the no-phases exception is to use the Erlang identity reducer, which is much quicker than the JS equivalent. For example:

    IndexQuery query=new IntValueQuery(IntIndex.named(resetIndexName),next,0);
    MapReduceResult result=source.mapReduce(query)
            .addReducePhase(new NamedErlangFunction("riak_kv_mapreduce","reduce_identity"))
            .timeout(240000)
            .execute();

This returns bucket/key pairs.

On 19 Mar 2012, at 11:00, Russell Brown wrote:

>
> On 19 Mar 2012, at 10:34, ivenhov wrote:
>
>> Hi Russell
>>
>> Thanks for reply.
>> I tried to use your example but I could not find class BinIndexQuery in riak
>> client (I'm using 1.0.4, is there any newer version?)
> No, you are correct, I replied from memory rather than checking the API first, sorry.
>
>> I gave it a try with:
>>
>> IndexQuery indexQuery = new BinValueQuery(BinIndex.named(BUCKET_NAME),
>> BUCKET_NAME, bucketFilter.toString());
>> MapReduceResult res = _iclient.mapReduce(indexQuery).execute();          
>> Collection<String> resS = res.getResult(String.class);
>>
>> which is probably something completely different since I got:
>> com.basho.riak.client.query.NoPhasesException
>> at com.basho.riak.client.query.MapReduce.validate(MapReduce.java:90)
>> at com.basho.riak.client.query.MapReduce.execute(MapReduce.java:74)
>
> Hmmm. I believe that validation should have been removed since Riak 1.1 MapReduce can be executed without phases. I've raised an issue for that[1]. Until it is fixed you can't try the example I suggested.
>
> Well you can. Just use the raw client and provide a JSON string to it
>
>     String jsonString = "{\"inputs\":{\"bucket\":\"mybucket\", \"index\":\"myindex_bin\", \"key\":\"mykey\"}, \"query\": []}";
>     rawClient.mapReduce(new MapReduceSpec(jsonString));
>
> If that is much faster then the reduce phases are to blame. Try adding back one, then the other. I have a suspicion that the JS reduce is the problem.
>
> Cheers
>
> Russell
>
> [1] https://github.com/basho/riak-java-client/issues/113
>
>>
>> I checked the github repository but could not find anything about
>> BinIndexQuery.
>> Daniel.