Feedback for GSoC project - Riak Destination for Syslog-ng

Gergely Nagy algernon at madhouse-project.org
Wed May 6 08:59:23 EDT 2015


Hi!

I'm the mentor for the Riak destination for syslog-ng project. Please
allow me to answer the questions below:

>>>>> "Fred" == Fred Dushin <fdushin at basho.com> writes:

    Fred> As far as I understand, you're talking about a mapping from keys to
    Fred> sets, but I'm unclear on a few things.

The idea is to map a set of log messages to a Riak Set, where both the
key used for the set and its contents are configurable by the
user. There are no plans for a default at this time.

There are many ways to configure a syslog-ng=>Riak setup with a
destination like the one planned. One is to turn each log message (after
parsing) into a Riak Map, and push those maps into a Riak Set. Another
way is to format the parsed log messages (with all the extracted fields,
if any) as JSON, and push those into a set.

So, for example, given the following syslog line:

May  6 14:42:18 eowyn avahi-daemon[27812]: Invalid response packet from host fe80::5d0f:d53a:7b6:3680.

We'd end up with a JSON document like this:

{"timestamp": "2015-05-06T14:42:18+02:00",
 "host": "eowyn",
 "program": "avahi-daemon",
 "pid": 27812,
 "message": "Invalid response packet from host fe80::5d0f:d53a:7b6:3680.",
 "avahi-daemon": {
   "type": "warning",
   "message": "Invalid response packet",
   "host": "fe80::5d0f:d53a:7b6:3680"
 }
}

We could either add that to a Riak set as-is, or turn it into a Riak map
first.
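
As a rough illustration of the JSON approach, here is a Python sketch
(not syslog-ng code). The function name is made up for the example, and
a plain Python set stands in for a Riak Set, which is an assumption
made purely for illustration:

```python
import json


def format_message(fields):
    """Serialize a parsed log message (a dict of extracted fields) to JSON.

    Sorted keys and compact separators make the output deterministic, so
    the same parsed message always yields the same set element.
    """
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))


# Stand-in for a Riak Set (assumption: a local Python set, no Riak involved).
log_set = set()

parsed = {
    "timestamp": "2015-05-06T14:42:18+02:00",
    "host": "eowyn",
    "program": "avahi-daemon",
    "pid": 27812,
    "message": "Invalid response packet from host fe80::5d0f:d53a:7b6:3680.",
    "avahi-daemon": {
        "type": "warning",
        "message": "Invalid response packet",
        "host": "fe80::5d0f:d53a:7b6:3680",
    },
}

log_set.add(format_message(parsed))
# Adding the identical message again does not create a second element.
log_set.add(format_message(parsed))
```

Note that with set semantics, two byte-identical messages collapse into
one element, which is one reason to keep the timestamp in the payload.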

    Fred> What are the keys you are thinking about? Time stamps? If
    Fred> timestamps, these are presumably the timestamps of the syslog
    Fred> event?

Whatever the user configures. They may be time stamps (rounded, for
predictable keys), or a combination of program name + current date (day
granularity).
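
To show what "rounded, for predictable keys" could mean in practice,
here is a small Python sketch. It is only an illustration: the function
name and the choice of epoch seconds as the key format are assumptions,
not something the destination prescribes.

```python
from datetime import datetime


def rounded_key(ts, interval_seconds):
    """Round a timezone-aware timestamp down to the start of its interval.

    Returns the interval start as a string of epoch seconds, so every
    message inside the same interval maps to the same key.
    """
    epoch = int(ts.timestamp())
    return str(epoch - epoch % interval_seconds)


first = datetime.fromisoformat("2015-05-06T14:42:18+02:00")
second = datetime.fromisoformat("2015-05-06T14:42:59+02:00")

# Both messages fall into the same one-minute interval.
same_bucket = rounded_key(first, 60) == rounded_key(second, 60)
```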

    Fred> Just a word of warning, if so. You might find a lot of
    Fred> variation in timestamp formats and granularity. Perhaps you
    Fred> can get something reliable out of syslog-ng,

We get something sensible out of syslog-ng. But in the end, it is up to
the user to configure the template used for keys. There may - and
probably will - be examples, but no default.

    Fred> but that won't help you in the case where syslog-ng is
    Fred> functioning as a syslog relay, and you want to preserve the
    Fred> timestamp of the originator, which you should, if you want to
    Fred> preserve integrity of the logs (e.g, for compliance).

In the case of syslog-ng, we actually have access to a few kinds of
timestamps: the timestamp from the log message (if any), the timestamp
of receipt, and the current time. The granularity of timestamps is
configurable to some extent.

    Fred> Or are you talking about a key being a (coarse-grained)
    Fred> timestamp, say, an integral value in UTC seconds, for example?
    Fred> And the value(s) being all logs in that interval? Is that your
    Fred> motivation for sets?

That's one way, yes. One could also use something like
$PROGRAM/$YEAR-$MONTH-$DAY as the key, if the program doesn't produce
more than a megabyte of logs a day. So with the example above, the key
for that log would be avahi-daemon/2015-05-06, and the message would be
an element of the set stored under that key.
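
A Python sketch of that key scheme follows. It only mimics what a
template like $PROGRAM/$YEAR-$MONTH-$DAY would expand to; the function
name is hypothetical, and a dict of Python sets is an assumed stand-in
for keys and sets in Riak:

```python
from collections import defaultdict
from datetime import datetime


def daily_key(program, timestamp):
    """Build a PROGRAM/YEAR-MONTH-DAY key from an ISO 8601 timestamp.

    The date is taken in the timestamp's own UTC offset.
    """
    ts = datetime.fromisoformat(timestamp)
    return f"{program}/{ts:%Y-%m-%d}"


# Stand-in for Riak: each key maps to a set of messages.
buckets = defaultdict(set)

message = "Invalid response packet from host fe80::5d0f:d53a:7b6:3680."
key = daily_key("avahi-daemon", "2015-05-06T14:42:18+02:00")
buckets[key].add(message)
```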

    Fred> How much of the syslog payload are you planning to parse?

The destination itself is not going to do any parsing. Other parts of
syslog-ng do that, and it is up to the user to set up a pipeline that
feeds the destination. The source may be syslog, HTTP logs, the Journal,
or any of the other sources syslog-ng supports. How much parsing is
done, and what gets extracted, is no concern to the destination plugin.

    Fred> Another interesting problem is that the STRUCTURED-DATA element of
    Fred> 5424 uses OIDs to discriminate different data types that are encoded
    Fred> in the header. And while there is a kind of loosely coupled authority
    Fred> for OIDs, there is no infrastructure for determining a parsing
    Fred> strategy for these fields. They could really be anything, in the worst
    Fred> case.

As far as I remember, syslog-ng treats all STRUCTURED-DATA elements as
strings. There are tools within syslog-ng for converting them to other
data types, but that must be done explicitly.
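
To make "explicitly" concrete, here is a Python sketch of the idea, not
of syslog-ng's actual mechanism. The function name and the converter
table are assumptions for illustration; the field names follow the
STRUCTURED-DATA example in RFC 5424.

```python
def convert_fields(fields, converters):
    """Apply explicit per-field converters to string-valued fields.

    Fields without a converter are left as strings, which mirrors the
    "everything is a string unless told otherwise" behaviour.
    """
    return {
        name: converters[name](value) if name in converters else value
        for name, value in fields.items()
    }


# Every STRUCTURED-DATA value arrives as a string.
sd_params = {"iut": "3", "eventSource": "Application", "eventID": "1011"}

# The user states explicitly which fields are integers.
typed = convert_fields(sd_params, {"iut": int, "eventID": int})
```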

    Fred> But regardless of the deeply structured data, you could get some very
    Fred> interesting traction by just taking standard headers and indexing them
    Fred> through Yokozuna. Certainly, indexing the body of a syslog message is
    Fred> a great idea, as these messages are generally unstructured and fodder
    Fred> for lucene. This is something that Logstash/ElasticSearch can do
    Fred> pretty effectively today, and it would be cool to see the same in Riak
    Fred> + some syslog provider.

Yep! When I proposed the idea, using Yokozuna was something I had in
mind. Combine the parsing abilities of syslog-ng, Riak for archival
purposes, and Yokozuna for searching. That sounds like a match made in
heaven.

    Fred> Finally, it would be really nice if you could structure your plugin in
    Fred> such a way that they could eventually be ported to rsyslog [2]. The
    Fred> rsyslogd daemon is deployed by default on certain Linux flavors and
    Fred> enjoys fairly widespread distribution. You might be able to get it
    Fred> supported in that community, as well.

Part of the project is writing a small library to send data to Riak
from C, just enough for syslog-ng's needs. That library could be used by
rsyslog too (like the MongoDB library originally written for
syslog-ng's purposes, which rsyslog also uses). But sharing more code
than that is not practical, as the two daemons work in widely different
ways.

-- 
|8]