Simple performance question

Victor 'Zverok' Shepelev
Hi all.

To test Riak performance, I stored 10,000 values (JSON-encoded
objects) in one bucket and then ran a map-reduce request against
this bucket.

map phase is just "Riak.mapValuesJson"

reduce phase is like
---
    function(values, a){
        // Track the smallest "scheduled" value seen so far and its task.
        var minKey = 'ZZZZ', minTask = null;
        for(var i = 0; i < values.length; ++i){
            var val = values[i];
            if(val.scheduled < minKey){
                minKey = val.scheduled;
                minTask = val;
            }
        }
        return [minTask];
    }
---

In other words, it just finds the task with the minimal "scheduled" field.

On the bucket with 10,000 values, this request takes ~1 min
(measured with the Unix time command) on a Celeron 2.6 GHz with 1 GB of RAM.
Is this result expected, or am I doing something wrong?

Also, sometimes I get just {"error":"timeout"} instead of a result.
Is this expected?

Thanks.

V.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Simple performance question

Kevin Smith-5
Victor -

You're running into the slow performance of anonymous JavaScript functions in the current release of Riak. For now, anonymous functions should only be used for prototyping and development on smallish amounts of data. You can make your job run faster by converting the anonymous function to a named one. The conversion process is pretty painless:

1. Create a named function for your reduce phase and store it in a file ending in ".js". For example:

function my_reduce (values, a) {
  // Track the smallest "scheduled" value seen so far and its task.
  var minKey = 'ZZZZ', minTask = null;
  for(var i = 0; i < values.length; ++i) {
    var val = values[i];
    if(val.scheduled < minKey){
      minKey = val.scheduled;
      minTask = val;
    }
  }
  return [minTask];
}

2. Uncomment the js_source_dir configuration entry and point it at the directory where you saved the file from step #1.

3. Restart Riak so it picks up the configuration change.

4. Modify your job description to use the named function.

5. If you need to edit the function or add/remove others you can use the riak-admin tool to reload the Javascript by issuing the command 'riak-admin reload_js'.
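
For step 4, the job description might look roughly like the sketch below. It assumes the JSON job format in which each phase names its language and function; the bucket name "tasks" is made up for illustration, and the exact keys should be checked against the Riak release you are running.

```javascript
// Sketch of a job description that refers to the reduce function by name
// instead of embedding its source.
// Assumptions: "tasks" is a hypothetical bucket name; the phase keys
// ("language", "name") should be verified for your Riak release.
var job = {
  inputs: "tasks",
  query: [
    { map:    { language: "javascript", name: "Riak.mapValuesJson" } },
    { reduce: { language: "javascript", name: "my_reduce" } }
  ]
};

// Serialized request body to send to the map-reduce HTTP endpoint.
var body = JSON.stringify(job);
```

The only change compared with an anonymous-function job is that the reduce phase carries a "name" rather than inline source.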

--Kevin

Re: Simple performance question

Victor 'Zverok' Shepelev
Thanks Kevin,

But this doesn't seem to help (still ~54 sec of real time).

V.

Re: Simple performance question

Kevin Smith-5
Another issue in 0.8 is that reduce phases are bottlenecks, since they are executed serially. You can work around this, to a certain degree, by moving more work into the map phases, which execute in parallel.

For example, you could modify your map phase to return [val.scheduled] directly instead of extracting it inside a loop in the reduce phase. If your data is sortable, you could then replace your for loop with:

var sortedValues = values.sort();
return [sortedValues[0]];
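
Put together, the refactored phases could look something like the sketch below. The function names map_scheduled and reduce_min_scheduled are made up, and the shape of the map input (a "values" array whose entries carry the raw JSON string in "data") is an assumption that should be checked against your Riak release. Note that with this approach the job returns only the minimal "scheduled" value, not the whole task object.

```javascript
// Map phase: emit only the "scheduled" field of each stored object.
// Assumption: the map input looks like { values: [ { data: "<raw JSON>" } ] }.
function map_scheduled(value, keyData, arg) {
  var task = JSON.parse(value.values[0].data);
  return [task.scheduled];
}

// Reduce phase: the inputs are now plain values, so sorting is enough.
// The default sort() compares elements as strings, which matches the
// string comparison used in the original loop.
function reduce_min_scheduled(values, arg) {
  var sortedValues = values.sort();
  return [sortedValues[0]];
}
```

Because the reduce output has the same form as its input (a list of "scheduled" values), it stays correct if the reduce function is applied again to its own results.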


--Kevin

P.S. Reduce phases will be parallelized in certain use cases starting in the next release.

Re: Simple performance question

Preston Marshall
So would his query be fast if it didn't do a reduce? Based on just playing around on my laptop, it seems kind of slow.

Re: Simple performance question

Victor 'Zverok' Shepelev
In reply to this post by Kevin Smith-5
OK, I understand.

BTW, I just tried playing around, and now ALL of my queries seem to be
dead slow (even against a small "toy" bucket with 10 objects). Could it be
because of the existence of 2 large buckets? (One of them has 10,000 and
the other 100,000 objects.)

V.

Re: Simple performance question

Kevin Smith-5
In reply to this post by Preston Marshall
A map-only query should be significantly faster since it runs in parallel. Using either named JavaScript functions or Module:FunctionName Erlang function refs, aka modfuns, will be the absolute fastest.
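
As a rough illustration, map-only jobs in both styles might be described as below. This is a sketch under assumptions: the bucket name "tasks" is made up, "my_module" and "my_map" are placeholders for a Module:FunctionName pair that actually exists on your nodes, and the phase keys should be checked against your Riak release.

```javascript
// Map-only job using a named JavaScript function (no reduce phase).
// "tasks" is a hypothetical bucket name.
var jsMapOnlyJob = {
  inputs: "tasks",
  query: [
    { map: { language: "javascript", name: "Riak.mapValuesJson" } }
  ]
};

// The same idea with an Erlang modfun. "my_module" and "my_map" are
// placeholders, not real Riak functions.
var erlangMapOnlyJob = {
  inputs: "tasks",
  query: [
    { map: { language: "erlang", module: "my_module", "function": "my_map" } }
  ]
};
```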

--Kevin

Re: Simple performance question

Kevin Smith-5
In reply to this post by Victor 'Zverok' Shepelev
Victor --

The presence of one or two large buckets shouldn't impact overall query performance at all. There are a number of other factors which can:

1) Object size -- Each object has to be translated to and from JSON during map/reduce processing. Obviously larger objects will take longer to convert than smaller ones.

2) Backend selection -- Riak defaults to using dets as the backend store. It is "good enough" for development purposes but otherwise kind of pokey. Innostore (http://hg.basho.com/innostore/src/tip/README) is superior to the dets backend. I highly recommend using Innostore if performance is a concern. Unfortunately, due to licensing issues, we can't bundle Innostore with Riak, hence the separate download. The README does a good job of explaining how to build and use it, if you decide to try it out.

3) Preferring named functions (Erlang and JavaScript) to unnamed functions -- For the reasons I've already stated.

4) Minimizing work performed in reduce phases -- This is a result of the serial nature of reduce phases and is addressed in the next release.

You said you've already converted to a named function and saw no real improvement. That seems to indicate the problem is not the slow invocation of anonymous functions. Have you tried refactoring your map and reduce phases as I suggested? If you have and are still seeing slow performance, I'd seriously recommend giving Innostore a try.

--Kevin