[lager] RFC on refactorings for a Graylog2 backend

Jason Wagner jason at nialscorva.net
Mon Nov 28 12:47:17 EST 2011

This is a very developer-oriented question about Lager that I'm posting
here because I couldn't find a dedicated list.

I'm working on a Graylog2 backend for Lager. I have a very basic plugin
that does little more than post log messages to a Graylog2 system; I'll
push it to GitHub after some integration testing tonight.

However, there's a next step for the Graylog2 integration that would
really leverage the power of both tools: pushing Lager's trace attributes
into Graylog2 user fields to take full advantage of the analytics that the
newest versions of Graylog2 provide. This would allow both trace filtering
before logging and after-the-fact trace analysis in Graylog2.

This would be a major change to the Lager internals and I wanted to solicit
opinions on the changes.

*Proposed changes*
*Pass through all trace attributes*
The backends need to receive all trace attributes, including default
attributes such as line, file, function, pid, etc. This would allow the
Graylog2 fields to be populated properly.

I would do this by removing the call to lager_util:check_traces from
lager_transform:transform_statement, so that all traces get passed
through. I don't believe this would affect any backends beyond sending
them a larger payload in each event, but please correct me if I'm wrong.
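As an illustration of what a backend could do once every trace attribute reaches it, here is a minimal sketch that turns a trace proplist into GELF additional fields (the "user fields" Graylog2 indexes). The module and function names are hypothetical; the only outside facts it relies on are GELF's rules that additional field names start with an underscore and that `_id` is reserved.

```erlang
-module(gelf_fields_sketch).
-export([to_additional_fields/1]).

%% Hypothetical helper: map lager trace attributes ({Key, Value} pairs)
%% to GELF additional fields. GELF requires additional field names to
%% start with an underscore and reserves "_id", so an id key is dropped.
to_additional_fields(Trace) ->
    [{<<"_", (atom_to_binary(Key, utf8))/binary>>, field_value(Value)}
     || {Key, Value} <- Trace, Key =/= id].

%% GELF field values are strings or numbers; render anything else
%% (pids, tuples, lists) as a printed Erlang term.
field_value(V) when is_number(V); is_binary(V) -> V;
field_value(V) when is_atom(V) -> atom_to_binary(V, utf8);
field_value(V) -> unicode:characters_to_binary(io_lib:format("~p", [V])).
```

A backend would then merge these pairs into the JSON message it sends to Graylog2; the JSON encoding itself is left out of the sketch.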

*Refactor message formatting*
Message formatting needs to be extracted from lager:log and lager:log_dest
and made external to them. This is probably desirable anyway, since from
there it would be a very small step to a completely orthogonal separation
of format and sink, letting users choose their own compromise between
speed and flexibility in their log file formats.

I would create a behaviour, lager_message_formatter, with one function
that takes the configuration, the trace attributes, and the message
(format string and arguments) and returns an iolist for output. The
current formatting would simply be:

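format(_Config, Trace, Format, Args) ->
    %% Look up the standard attributes in the trace proplist
    %% (assumes the level is passed along as a trace attribute too)
    [Level, Pid, Module, Function, Line] =
        [proplists:get_value(K, Trace) || K <- [level, pid, module, function, line]],
    [["[", atom_to_list(Level), "] "],
     io_lib:format("~p@~p:~p:~p ", [Pid, Module, Function, Line]),
     lager:safe_format_chop(Format, Args, 4096)].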
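The behaviour itself could be declared with a single callback. This is only a sketch: lager_message_formatter is the name proposed here, not an existing lager module, and the argument types are assumptions chosen to match the format/4 example.

```erlang
-module(lager_message_formatter).

%% Proposed (hypothetical) behaviour: a formatter turns one log event
%% into an iolist that a backend can write out as-is.
-callback format(Config :: term(),
                 Trace :: [{atom(), term()}],
                 Format :: io:format(),
                 Args :: [term()]) -> iolist().
```

A formatter like the one above would then declare `-behaviour(lager_message_formatter).` and export `format/4`, and the compiler would warn if the callback were missing.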

The backends, rather than lager:log or lager:log_dest, would make this
call. That would allow a configuration parameter on each backend to select
its formatter, with a reasonable default, so that each backend
configuration could use a different format.
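A per-backend formatter setting could look roughly like the following sketch. The `{formatter, Module}` and `{formatter_config, Term}` keys and the default module name are assumptions for illustration, not existing lager options.

```erlang
-module(backend_formatter_sketch).
-export([init_formatter/1, format_event/4]).

%% Hypothetical default; any module implementing format/4 would do.
-define(DEFAULT_FORMATTER, default_formatter_sketch).

%% Read the formatter choice from this backend's own configuration,
%% falling back to the default when none is given. The option names
%% are assumptions, not existing lager options.
init_formatter(BackendConfig) ->
    {proplists:get_value(formatter, BackendConfig, ?DEFAULT_FORMATTER),
     proplists:get_value(formatter_config, BackendConfig, [])}.

%% Format one event with whatever formatter this backend was given,
%% instead of receiving text already formatted by lager:log.
format_event({Module, FormatterConfig}, Trace, Format, Args) ->
    Module:format(FormatterConfig, Trace, Format, Args).
```

Because the choice lives in each backend's configuration, a console backend and a Graylog2 backend could use different formatters side by side.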

One drawback is that this significantly increases the size of the events,
especially if runaway tuples end up being sent. I'm not sure how this
would affect performance in the long run.
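One way to keep runaway tuples from bloating the events would be to cap how much of each trace value is rendered before it is sent. The sketch below uses the standard `~P` control sequence of io_lib:format/2, which prints a term like `~p` but replaces anything nested deeper than a given depth with `...`; the module name and the choice of depth are assumptions.

```erlang
-module(trace_bound_sketch).
-export([bound_value/2, bound_trace/2]).

%% Hypothetical mitigation, not existing lager code: render each trace
%% value with a depth limit so a huge tuple or list cannot inflate the
%% event that is sent to the backends.
bound_value(Value, _Depth) when is_atom(Value); is_number(Value) ->
    Value;                                   % already small, keep as-is
bound_value(Value, Depth) ->
    %% ~P prints like ~p, eliding anything below Depth with "..."
    unicode:characters_to_binary(io_lib:format("~P", [Value, Depth])).

%% Apply the bound to every {Key, Value} pair of a trace proplist.
bound_trace(Trace, Depth) ->
    [{Key, bound_value(Value, Depth)} || {Key, Value} <- Trace].
```

Whatever gets elided is lost to the backend, so the depth would probably need to be tunable.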

The formatting change also shifts some of the workload from the user
process to the logging process. This can be an advantage when the
workload isn't taxing the backend, but it could penalize even simple
logging statements if the backend gets swamped and can't format and
process events fast enough. The potentially large messages add to this
as well.

There are a couple of ways to mitigate this: additional checks on the log
statement, separating the format processes from the write processes and
pooling the formatters, and other ideas that I wouldn't propose until I
actually see the bottleneck.

*Request For Comments*
Are these changes desirable given the direction lager is heading?

I've seen performance-based changes in the recent history. Is performance
paramount over flexibility? How do you currently measure performance? Are
any scripts, scenarios, etc. available?

Are there any historical lessons I might be unaware of in the areas I'd
be changing?

Jason Wagner

More information about the riak-users mailing list