Simple performance question

Kevin Smith ksmith at basho.com
Tue Feb 23 11:15:28 EST 2010


Another issue in 0.8 is that reduce phases are bottlenecks, since they are executed serially. You can work around this, to a certain degree, by moving more work into the map phases, which execute in parallel.

For example, you could modify your map phase to return [val.scheduled] directly instead of extracting it inside a loop in the reduce phase. If your data is sortable, you could then replace your for loop with:

// note: sort() compares elements as strings by default
var sortedValues = values.sort();
return [sortedValues[0]];
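A minimal local sketch of that split, in plain JavaScript that runs outside Riak. The names mapScheduled and reduceMinScheduled are hypothetical. The map function assumes the object layout that Riak.mapValuesJson reads (value.values[0].data holding a JSON string), and the fake objects below only stand in for stored tasks. This version returns the earliest "scheduled" value, not the task itself. Because the reduce emits values of the same type it consumes, it can also be applied to its own partial results, which the last step simulates.

```javascript
// Hypothetical map function: parses the stored JSON like Riak.mapValuesJson,
// but emits only the "scheduled" field instead of the whole task object.
function mapScheduled(value, keyData, arg) {
  var task = JSON.parse(value.values[0].data);
  return [task.scheduled];
}

// Hypothetical reduce function following the suggestion above: sort and keep
// the first element. Default sort() compares values as strings, which suits
// ISO-8601-style date strings but not plain numbers.
function reduceMinScheduled(values, arg) {
  // filter() returns a copy, so sorting it does not mutate the input
  var sortedValues = values.filter(function (v) { return v != null; }).sort();
  return sortedValues.length > 0 ? [sortedValues[0]] : [];
}

// Local simulation (not Riak): fake stored objects, a map over each one,
// two partial reduces, then a reduce over the partial results.
var objects = ["2010-02-25", "2010-02-23", "2010-02-24"].map(function (d, i) {
  return { values: [{ data: JSON.stringify({ id: i, scheduled: d }) }] };
});
var mapped = [];
objects.forEach(function (obj) {
  mapped = mapped.concat(mapScheduled(obj, null, null));
});
var partial = reduceMinScheduled(mapped.slice(0, 2), null)
  .concat(reduceMinScheduled(mapped.slice(2), null));
console.log(reduceMinScheduled(partial, null));
```

Sorting costs O(n log n), while a single pass like the original loop finds the minimum in O(n); the gain here comes from shrinking what the serial reduce phase has to process, not from the sort itself.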


--Kevin

P.S. Reduce phases will be parallelized in certain use cases starting in the next release.

On Feb 23, 2010, at 11:04 AM, Victor 'Zverok' Shepelev wrote:

> Thanks Kevin,
> 
> But this does not seem to help (still ~54 sec of real time).
> 
> V.
> 
> 2010/2/23 Kevin Smith <ksmith at basho.com>:
>> Victor -
>> 
>> You're running into the slow performance of anonymous JavaScript functions in the current release of Riak. For now, anonymous functions should only be used for prototyping and development on smallish amounts of data. You can make your job run faster by converting the anonymous function to a named one. The conversion process is pretty painless:
>> 
>> 1. Create a named function for your reduce phase and store it in a file ending in ".js". For example:
>> 
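>> function my_reduce(values, a) {
>>   // declare locals with var so they do not leak into the global scope
>>   var minTask = null;
>>   for (var i = 0; i < values.length; ++i) {
>>     var val = values[i];
>>     // skip nulls and keep the task with the smallest "scheduled" value
>>     if (val && (minTask === null || val.scheduled < minTask.scheduled)) {
>>       minTask = val;
>>     }
>>   }
>>   return [minTask];
>> }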
>> 
>> 2. Uncomment the js_source_dir configuration entry and point it at the directory where you saved the file from step #1.
>> 
>> 3. Restart Riak so it picks up the configuration change.
>> 
>> 4. Modify your job description to use the named function.
>> 
>> 5. If you need to edit the function or add/remove others, you can use the riak-admin tool to reload the JavaScript by issuing the command 'riak-admin reload_js'.
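For step 4, a minimal sketch of a job description that refers to the named function. It assumes the JSON job format used by Riak's HTTP map/reduce interface: an "inputs" bucket plus a "query" list of phases, each phase naming its "language" and, for a named function, its "name" (the anonymous version carries a "source" string instead). The bucket name "tasks" is hypothetical; check the field names against the documentation for your Riak release.

```javascript
// Sketch of a map/reduce job description that calls named functions.
// "tasks" is a hypothetical bucket name; "my_reduce" must match the
// function saved in the js_source_dir file from step 1.
var job = {
  inputs: "tasks",
  query: [
    // built-in named function, as in the original job
    { map: { language: "javascript", name: "Riak.mapValuesJson" } },
    // named reduce: "name" takes the place of the "source" string
    // that held the anonymous function
    { reduce: { language: "javascript", name: "my_reduce" } }
  ]
};

// JSON body to send to Riak's HTTP map/reduce resource
var body = JSON.stringify(job);
console.log(body);
```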
>> 
>> --Kevin
>> 
>> On Feb 23, 2010, at 10:05 AM, Victor 'Zverok' Shepelev wrote:
>> 
>>> Hi all.
>>> 
>>> To test Riak performance, I've stored 10,000 values
>>> (JSON-encoded objects) in one bucket and then run a map/reduce request
>>> against this bucket.
>>> 
>>> The map phase is just "Riak.mapValuesJson".
>>> 
>>> The reduce phase is:
>>> ---
>>>    function(values, a){
>>>        var minKey = 'ZZZZ', minTask = null;
>>>        for(var i = 0; i < values.length; ++i){
>>>            var val = values[i];
>>>            if(val.scheduled < minKey){
>>>                minKey = val.scheduled;
>>>                minTask = val;
>>>            }
>>>        }
>>>        return [minTask];
>>>    }
>>> ---
>>> 
>>> In short, it finds the task with the minimal "scheduled" field.
>>> 
>>> On a bucket with 10,000 values, this request takes
>>> ~1 min (measured with Unix 'time') on a 2.6 GHz Celeron with 1 GB RAM.
>>> Is this result expected, or am I doing something wrong?
>>> 
>>> Also, I sometimes get just {"error":"timeout"} instead of a result.
>>> Is this expected?
>>> 
>>> Thanks.
>>> 
>>> V.
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 



