leveldb Hot Threads in 1.4.9?

Matthew Von-Maszewski matthewv at basho.com
Sat Jul 5 13:34:51 EDT 2014


Tom,

Basho prides itself on quickly responding to all user queries.  I have failed that tradition in this case.  Please accept my apologies.


The LOG data suggests leveldb is not stalling, especially not for 4 hours.  Therefore the problem is related to disk utilization.

You appear to have large values.  I see .sst files where the average value size is between 100 Kbytes and 1 Mbyte.  Is this intentional, or might you have a sibling problem?


My assessment is that your lower levels are full and therefore cascading regularly.  "Cascading" is like the champagne glass pyramid you see at weddings.  Once all the glasses are full, new champagne poured at the top causes each layer to overflow into the one below it.  You have the same problem, but with data.

Your large values have filled each of the lower levels and regularly cause data to cascade across multiple levels.  The cascading turns each 100K value write into the equivalent of a 300K or 500K write as levels overflow.  This cascading is chewing up your hard disk performance (by reducing the amount of time the hard drive has available for read requests).
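A toy model of the champagne-glass cascade makes the arithmetic concrete.  This is a hedged sketch, not leveldb code: it assumes each level has a fixed byte capacity and that overflow is rewritten exactly once into the next level.  It ignores real compaction details such as merging with overlapping files, which only add more I/O.  The class name CascadeModel is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy "champagne pyramid": level i holds at most capacity[i] bytes and
// anything beyond that spills into level i + 1. Every spilled byte is
// written to disk again, so one user write costs more once levels are full.
// Simplified on purpose: real leveldb compactions also rewrite overlapping
// data, so this understates actual write amplification.
class CascadeModel {
public:
    explicit CascadeModel(std::vector<std::uint64_t> capacity)
        : capacity_(std::move(capacity)), used_(capacity_.size(), 0) {
        assert(!capacity_.empty());
    }

    // Adds `bytes` at level 0 and returns total bytes written to disk:
    // the original write plus one rewrite per level the data overflows.
    std::uint64_t Write(std::uint64_t bytes) {
        std::uint64_t disk_bytes = bytes;
        std::uint64_t incoming = bytes;
        const std::size_t last = capacity_.size() - 1;
        for (std::size_t level = 0; level <= last && incoming > 0; ++level) {
            used_[level] += incoming;
            if (level == last || used_[level] <= capacity_[level]) {
                incoming = 0;  // fits here; the last level absorbs everything
            } else {
                incoming = used_[level] - capacity_[level];  // overflow amount
                used_[level] = capacity_[level];
                disk_bytes += incoming;  // rewritten one level lower
            }
        }
        return disk_bytes;
    }

private:
    std::vector<std::uint64_t> capacity_;  // declared first: initialized first
    std::vector<std::uint64_t> used_;
};
```

With units read as Kbytes and capacities {1000, 2000, 4000, 8000, 1000000}, a 100K write into an empty store costs 100K.  Once levels 0 and 1 are full, the same write costs 300K, and once levels 0 through 3 are full it costs 500K, which mirrors the effect described above.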

The leveldb code for Riak 2.0 has increased the size of all the levels.  The table of sizes is found at the top of leveldb's db/version_set.cc.  You could patch your current code if desired with this table from 2.0:

{                                                                                                                                 
    {10485760,  262144000,  57671680,      209715200,                 0,     420000000, true},                                   
    {10485760,   82914560,  57671680,      419430400,                 0,     209715200, true},                                   
    {10485760,  314572800,  57671680,     3082813440,         200000000,     314572800, false},                                   
    {10485760,  419430400,  57671680,     6442450944ULL,     4294967296ULL,  419430400, false},                                   
    {10485760,  524288000,  57671680,   128849018880ULL,    85899345920ULL,  524288000, false},                                   
    {10485760,  629145600,  57671680,  2576980377600ULL,  1717986918400ULL,  629145600, false},                                   
    {10485760,  734003200,  57671680, 51539607552000ULL, 34359738368000ULL,  734003200, false}                                   
};                                                                                                                

You cannot take the entire 2.0 leveldb into your 1.4 code base due to various option changes.
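To see how large the 2.0 levels are, the sketch below copies the fourth column of the table and converts it.  It assumes that column is the per-level size limit in bytes; that meaning is an assumption here, so confirm it against the field comments in your copy of db/version_set.cc.  The names kLevelLimitBytes, LimitGiB and GrowthFactor are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fourth column of the Riak 2.0 table, one entry per level 0..6.
// ASSUMPTION: this column is the per-level size limit in bytes.
constexpr std::array<std::uint64_t, 7> kLevelLimitBytes = {
    209715200ULL,    419430400ULL,     3082813440ULL,     6442450944ULL,
    128849018880ULL, 2576980377600ULL, 51539607552000ULL};

constexpr double kGiB = 1024.0 * 1024.0 * 1024.0;

// Size limit of `level` expressed in GiB.
double LimitGiB(std::size_t level) {
    assert(level < kLevelLimitBytes.size());
    return static_cast<double>(kLevelLimitBytes[level]) / kGiB;
}

// How many times larger `level` may grow than the level above it.
double GrowthFactor(std::size_t level) {
    assert(level > 0 && level < kLevelLimitBytes.size());
    return static_cast<double>(kLevelLimitBytes[level]) /
           static_cast<double>(kLevelLimitBytes[level - 1]);
}
```

Under that assumption, levels 0 and 1 are 200 MiB and 400 MiB, level 3 is 6 GiB, and levels 4 through 6 each allow 20 times the level above (120 GiB, 2400 GiB and 48000 GiB).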


Let me know if this helps.  I have previously hypothesized that "grooming" compactions should be limited to one thread total.  However, my test datasets never demonstrated a benefit.  Your dataset might be the case that proves the benefit.  I will go find the grooming patch to hot_threads for you if the above table proves insufficient.
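The "one grooming thread total" idea can be pictured as a process-wide gate that at most one grooming compaction holds at a time.  This is a hypothetical sketch under that assumption: GroomingGate, TryAcquire and Release are invented names, and this is not the hot_threads grooming patch mentioned above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical gate: at most one "grooming" compaction may run at a time
// across the whole process. In this sketch only compactions classed as
// grooming consult the gate; other compactions are not limited by it.
class GroomingGate {
public:
    // Returns true if the caller may start a grooming compaction now;
    // false means another grooming compaction already holds the gate.
    bool TryAcquire() {
        bool expected = false;
        return busy_.compare_exchange_strong(expected, true);
    }

    // Call exactly once after a successful TryAcquire, when the grooming
    // compaction finishes.
    void Release() {
        bool was_busy = busy_.exchange(false);
        assert(was_busy);
        (void)was_busy;  // avoid an unused-variable warning under NDEBUG
    }

private:
    std::atomic<bool> busy_{false};
};
```

A caller that gets false would skip the optional grooming pass and try again later rather than block, so the gate never delays compactions that are actually required.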

Matthew




On Jul 2, 2014, at 9:20 PM, Tom Lanyon <tom+riak at oneshoeco.com> wrote:

> Hi Matthew, 
> 
> Just thought I'd see whether you were back from your travels and had had a chance to take a look at the log file provided?
> 
> There's no rush if you haven't had a chance!
> 
> Regards,
> Tom
> 
> 
> On Tuesday, 24 June 2014 at 10:45, Tom Lanyon wrote:
> 
>> No problem, Matthew. 
>> 
>> Appreciate you taking a look when you have time.
>> 
>> Regards,
>> Tom
>> 
>> 
>> On Tuesday, 24 June 2014 at 9:45, Matthew Von-Maszewski wrote:
>> 
>>> Tom,
>>> 
>>> I have been distracted today and on a plane tomorrow. I apologize for the delayed response. It may be late tomorrow before I can share further thoughts. 
>>> 
>>> Again my apologies.
>>> 
>>> Matthew Von-Maszewski
>>> 
>>> 
>>> On Jun 23, 2014, at 8:58, Tom Lanyon <tom+riak at oneshoeco.com> wrote:
>>> 
>>>> Thanks; the combined_log for our Riak node 3 is here:
>>>> 
>>>> https://www.dropbox.com/s/krhhwnplpeyhl0c/riak3-combined_log-20140623.log.gz
>>>> 
>>>> Let me know if you can't retrieve/view it.
>>>> 
>>>> With timestamps relative to this log file, at 2014/06/23-05:35 our monitoring detected node3's Riak as "down"; it wasn't serving any client protobuf requests, "riak ping" didn't respond and all of the other nodes marked node 3 as unreachable. We watched the process and it was busy doing leveldb compactions so we left it alone and it eventually recovered at 2014/06/23-09:32 (so ~4 hours unresponsive).
>>>> 
>>>> Yes - this cluster started at 1.2.1 and then I believe it went to 1.3.1, 1.4.2 and now 1.4.8. However, we went from 1.3.1-->1.4.2 in September 2013 and 1.4.2-->1.4.8 in May, so we've been running 1.4.x for many months - does this fit with the 'one time cost of upgrading' you mentioned?
>>>> 
>>>> Regards,
>>>> Tom
>>>> 
>>>> 
>>>> On Monday, 23 June 2014 at 19:29, Matthew Von-Maszewski wrote:
>>>> 
>>>>> Yes, off list is fine for the data files. I may or may not respond via the list depending upon what I find.
>>>>> 
>>>>> I did recall a case where leveldb seems unresponsive for hours. This case was a one time cost of upgrading some 1.2 or 1.3 systems to 1.4. Would that happen to describe your scenario?
>>>>> 
>>>>> Matthew Von-Maszewski
>>>>> 
>>>>> 
>>>>> On Jun 23, 2014, at 0:28, Tom Lanyon <tom+riak at oneshoeco.com> wrote:
>>>>> 
>>>>>> Hi Matthew, 
>>>>>> 
>>>>>> Thanks for the response and apologies for my off-list reply.
>>>>>> 
>>>>>> I can send a combined_log example directly to you if that helps? It's 13MB gzip'ed.
>>>>>> 
>>>>>> Regards,
>>>>>> Tom
>>>>>> 
>>>>>> 
>>>>>> On Monday, 23 June 2014 at 12:30, Matthew Von-Maszewski wrote:
>>>>>> 
>>>>>>> Hot threads is included with 1.4.9. The leveldb source file leveldb/util/hot_threads.cc is the key file.
>>>>>>> 
>>>>>>> The code helps throughput, but is not magical. "unresponsive for hours" is not a known problem in the 1.4.x code base. Would you mind posting an aggregate LOG file from a period when this happens?
>>>>>>> 
>>>>>>> sort /var/lib/riak/*/LOG >combined_log
>>>>>>> 
>>>>>>> Substitute your actual data path for /var/lib/riak.
>>>>>>> 
>>>>>>> Matthew Von-Maszewski
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 22, 2014, at 22:07, Tom Lanyon <tom+riak at oneshoeco.com> wrote:
>>>>>>> 
>>>>>>>> Could someone please confirm whether 1.4.9 includes "Hot Threads" in leveldb? 
>>>>>>>> 
>>>>>>>> The release notes have a link to it, but I couldn't find my way through the rebar & git maze to be absolutely sure it is in 1.4.9 but not 1.4.8.
>>>>>>>> 
>>>>>>>> We're seeing nodes unresponsive for hours during large compactions and wondered if this leveldb improvement would help.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Tom
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users at lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
