[lager] RFC on refactorings for a Graylog2 backend

Andrew Thompson andrew at hijacked.us
Mon Nov 28 13:34:59 EST 2011

On Mon, Nov 28, 2011 at 12:47:17PM -0500, Jason Wagner wrote:
> This is a very developer oriented question about Lager that I'm posting
> here because I didn't find a dedicated list.

Yes, it doesn't have a list. I'm not sure if it is worth rectifying that
yet. Here should be fine for now.

> I'm working on a Graylog2 backend for Lager.  I'm sitting on a very basic
> plugin that does little more than post log messages to a graylog2 system
> that I'll push to github after I do some integration testing tonight.
> However, there's a next step with the Graylog2 integration that would
> really leverage the power of both tools-- pushing Lager's trace attributes
> into Graylog2 user fields to really leverage the analytics that the newest
> versions of Graylog2 provide.  This would allow both pre-log trace
> filtering and after the fact trace analysis through Graylog2.
> This would be a major change to the Lager internals and I wanted to solicit
> opinions on the changes.
> *Proposed changes
> **Pass through all trace attributes*
> The backends need to receive all trace attributes, including default
> attributes such as line, file, function, pid, etc.  This would allow proper
> population of the graylog2 fields.
> I would change this by removing the lager_transform:transform_statement's
> call of lager_util:check_traces so that all Traces got passed through.  I
> don't believe this would impact any backends except via sending them more
> payload in the event, but please correct me if I'm wrong.

I've actually been considering doing this as well, for similar reasons.
I don't see a problem with doing it.

> *Refactor message formatting*
> The formatting needs to be extracted and externalized from lager:log and
> lager:log_dest.  This is probably desirable, anyway, since it would be a
> very small step from this to have completely orthogonal format/sink
> separation, allowing for a user compromise between speed and flexibility in
> their log file formats.
> I would create a behavior lager_message_formatter with one function that
> takes the trace attributes, the message, and returns an iolist for output.
> The current one would simply be:
> format(Config,Trace,Format,Args) ->
>       Pid=proplists:get_value(pid,Trace),
>      % etc for Module, Function, Line, Level
>       [["[", atom_to_list(Level), "] "],
>            io_lib:format("~p@~p:~p:~p ", [Pid, Module, Function, Line]),
>            safe_format_chop(Format, Args, 4096)].
> The backends would make this call rather than the lager:log or
> lager:log_dest.  This would allow for a configuration parameter on the
> backends to set which formatter they use, with a reasonable default, and
> different formats for each backend configuration.

I think that might be overkill, but I suppose it's acceptable.

> *Concerns*
> One drawback is that it significantly increases the size of the events,
> especially if there is concern about run-away tuples being sent.  I'm not
> sure how this would impact the performance in the long haul.
> The formatting change also pushes some of the workload from the user
> process to the logging process.  This can be an advantage if the workloads
> aren't taxing the backend, but could penalize even simple logging
> statements if the backend gets swamped and can't format/process fast
> enough.  The potentially large messages have impact this as well.

Technically, this won't happen because lager uses sync_notify, which
means the log call blocks until all the backends have handled it.
However, this *does* mean that long log formatting calls in the
gen_event will probably cause logging calls to block *more*.

What about doing the format string formatting in the calling process,
*once*, and then just sending along the rest of the data (timestamp,
callsite information, trace attributes, etc) with the output from doing
the string formatting? That should be the meat of the formatting anyway,
and I'd like to make sure we only do it once (we actually do it twice at
the moment, if a trace matches).

> There are a couple ways to mitigate this-- additional checks on the log
> statement, separate format processes from write processes and pool the
> formatters, and other ideas that I wouldn't propose until I actually see
> the bottleneck.
> *Request For Comments*
> Are these changes desirable in the direction of lager?
> I've seen performance based changes in the recent history.  Is performance
> paramount over flexibility?  How do you currently measure performance?  Are
> any scripts,scenarios,etc available?
> Any historical lessons that I might be unaware in the vicinity of things
> I'd be changing?

I think overall this is the path Lager is going to have to tread.

I'd rather balance Lager in the direction of flexibility and power over
performance, but egregious bottlenecks shouldn't be tolerated if there's
something that can be done to fix them. Lager is really about *safely*
logging messages, making them easy to read and being operationally
powerful. It already lags the competitors in performance, for the most
part, because of things it does to further these goals.

I have some simple performance testing code here:


But it's fairly unscientific right now, and needs some rewriting. I've
also got a half-finished benchmark blogpost comparing lager to the
alternatives, which also needs some rewriting.

So the TLDR is; I think you're on the right path, go ahead and make the
changes and I'll certainly review them for inclusion. If you don't I'll
probably end up doing something similar anyway, but it'll take longer
(I have other projects at basho to complete ATM).


More information about the riak-users mailing list