Reasons for using gen_server to gather statistics with folsom
Sergey_Zhemzhitsky at troika.ru
Mon Aug 13 05:22:30 EDT 2012
Thanks for the extremely helpful answer!
The design decisions are much clearer now.
From: Russell Brown [mailto:russelldb at basho.com]
Sent: Monday, August 13, 2012 12:27 PM
To: Zhemzhitsky Sergey
Cc: riak-users at lists.basho.com
Subject: Re: Reasons for using gen_server to gather statistics with folsom
First, sorry for missing your first post. I just didn't see it.
I'll try and answer your questions.
> 1. Why were separate gen_servers (riak_api_stat, riak_core_stat, riak_kv_stat) used to gather statistics instead of direct calls to folsom_metrics through some more high-level API?
Time. There will be a high-level API provided by riak-core, or folsom, soon. The idea is that you declaratively register stats, riak-core starts one or more processes for you, and you just use the API to update stats. I started work on this structure but didn't finish it in time for 1.2. It is what I am working on next. I'll keep you posted. If you follow the existing model, porting to the new API will hopefully be relatively simple. Sorry for not getting it solidified sooner.

The reason for gen_servers, of course, is to cast the calls to folsom rather than blocking on ets when doing critical Riak ops like writing and reading data.

There are a number of table ownership/crashing issues in folsom, as well as a couple of race conditions. I'll be working with Joe Williams of Boundary to resolve these and refactor folsom as part of my ongoing stats work for Riak. Watch that repo to keep informed.
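The "cast rather than block" idea above can be sketched outside Erlang. The following is only an illustration in Python, not Riak's or folsom's code: `StatServer`, `cast_update`, `flush`, and `get` are hypothetical names, and a thread with a queue stands in for a gen_server and its mailbox. The caller on the hot path only enqueues a message (much like `gen_server:cast/2`) and never waits for the stat to be written.

```python
import queue
import threading
from collections import defaultdict


class StatServer:
    """Hypothetical stand-in for a stats gen_server.

    One worker thread owns the counters and applies updates that
    callers drop into its mailbox.
    """

    def __init__(self):
        self._mailbox = queue.Queue()
        self._counters = defaultdict(int)
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def cast_update(self, name, amount=1):
        # Fire-and-forget: enqueue the update and return immediately,
        # so the caller never blocks on the stats store.
        self._mailbox.put((name, amount))

    def _loop(self):
        while True:
            name, amount = self._mailbox.get()
            try:
                self._counters[name] += amount
            except Exception:
                # A failing stat update is dropped here; it never
                # propagates back to the code doing the real work.
                pass
            finally:
                self._mailbox.task_done()

    def flush(self):
        # Block until every queued update has been applied (for tests).
        self._mailbox.join()

    def get(self, name):
        return self._counters.get(name, 0)


if __name__ == "__main__":
    stats = StatServer()
    for _ in range(1000):
        stats.cast_update("node_gets")  # hot path: no waiting
    stats.flush()
    print(stats.get("node_gets"))
```

The trade-off in this sketch is the usual one for asynchronous updates: callers get no confirmation that an update was applied, which is normally acceptable for statistics.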
> 2. What is the purpose of riak_core_stat_cache and what is it intended to do?
Calculating the histograms for stats is expensive, especially when there are a lot of readings. In some cases it can take a few seconds to calculate stats for some metrics on a busy node. The cache is there for two reasons:
1. To have only one process calculating stats at a time, so if multiple calls to get stats happen at once, one process actually calculates and the rest are parked and notified when the answer comes.
2. To actually cache the results so they're not calculated more often than needed.
There are stats gathered on how long it takes to calculate stats, and the idea was to use the mean time to calculate stats for an application as the cache TTL. That is work still to be done.
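The two roles of the cache described above (one calculation at a time, plus a TTL on the result) can be illustrated with a small sketch. This is not riak_core_stat_cache itself; it is a Python approximation under stated assumptions: `StatCache`, `compute`, and `ttl_seconds` are hypothetical names, and the TTL is a fixed number rather than the measured mean calculation time mentioned above.

```python
import threading
import time


class StatCache:
    """Hypothetical cache: one caller computes, the rest wait for it."""

    def __init__(self, compute, ttl_seconds):
        self._compute = compute          # the expensive stats calculation
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = float("-inf")  # nothing cached yet
        self._in_flight = None            # Event set while a calculation runs

    def get(self):
        with self._lock:
            if time.monotonic() < self._expires_at:
                return self._value        # fresh cached result
            if self._in_flight is None:
                # No calculation running: this caller becomes the leader.
                self._in_flight = threading.Event()
                leader = True
            else:
                leader = False
            done = self._in_flight

        if not leader:
            # Parked until the leader finishes, then read its result.
            # If the leader's calculation failed, this returns the
            # previous (possibly stale or empty) value.
            done.wait()
            with self._lock:
                return self._value

        try:
            value = self._compute()
            with self._lock:
                self._value = value
                self._expires_at = time.monotonic() + self._ttl
            return value
        finally:
            with self._lock:
                self._in_flight = None
            done.set()                    # wake the parked callers
```

With this shape, several simultaneous requests for stats trigger a single call to `compute`, and later requests within the TTL are served from the stored result.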
But in many ways the cache is there to support backwards compatibility for Riak's /stats endpoint and the riak-admin commands. In future I'd rather expose the folsom stats directly over REST and CLI so you can request only the stat you want and not waste time calculating a load of stats you're not interested in. This is the next, next thing I'll be working on.
> As far as I understand, riak_core_stat_cache caches stats using ets, so I'm wondering why statistics that are stored in ets are cached using ets?
So why cache stats in ets that are already in ets? The cache is for groups of stats that have already had the _expensive_ calculations run on them, whereas folsom stores the raw readings in ets.
> Is it correct that calls to folsom_metrics are done via gen_server to decrease the possibility of losing ets tables that are bound to a concrete process?
Really, calls are done via gen_server so that calls to folsom are cast. Originally the code called folsom directly in-process, but benchmarking showed this to be slower and more damaging in the case of an error/crash in folsom. I mention the ets ownership/crashing issues above. There is an example of one here. I'm going to work on refactoring folsom to have a more coherent strategy of table ownership.
I hope this helps; if I've missed anything, please ask. The short-term aim was to stabilise stats in Riak and fix known issues, and I think I accomplished that. Next is to better structure the code so that riak-core provides a stats service.
On 13 Aug 2012, at 09:54, Zhemzhitsky Sergey wrote:
Any updates on these questions?
I've read the following blog entry http://basho.com/blog/technical/2012/07/02/folsom-backed-stats-riak-1-2/ and still haven't found the answers.
As far as I understand, riak_core_stat_cache caches stats using ets, so I'm wondering why statistics that are stored in ets are cached using ets?
Is it correct that calls to folsom_metrics are done via gen_server to decrease the possibility of losing ets tables that are bound to a concrete process?
From: riak-users-bounces at lists.basho.com [mailto:riak-users-bounces at lists.basho.com] On Behalf Of Zhemzhitsky Sergey
Sent: Friday, August 10, 2012 6:33 PM
To: riak-users at lists.basho.com
Subject: Reasons for using gen_server to gather statistics with folsom
Hi riak gurus,
Riak 1.2, which uses the folsom library to gather statistics, was released recently.
I'd like to use the same library (folsom) in my application, so could you answer the following questions:
1. Why were separate gen_servers (riak_api_stat, riak_core_stat, riak_kv_stat) used to gather statistics instead of direct calls to folsom_metrics through some more high-level API?
2. What is the purpose of riak_core_stat_cache and what is it intended to do?