Feedback for GSoC project - Riak Destination for Syslog-ng

Gergely Nagy algernon at madhouse-project.org
Tue May 5 09:26:07 EDT 2015


>>>>> "Christopher" == Christopher Meiklejohn <cmeiklejohn at basho.com> writes:

    >> The intention is to store log messages in each element of the set:
    >> either as a string (syslog or json, or whatever else the user sees fit),
    >> or as a map of key-value pairs (where values themselves can be maps
    >> too).
    >> 
    >> On average, the log messages are a few kilobytes in size. There may be
    >> exceptions, but >1 MB ones are fairly rare. How much data the set would
    >> hold... now that's a question that can't really be answered. It is
    >> really up to the syslog-ng user to configure that.

    Christopher> I’m referring to the size of the entire set, not the objects that will be members of 
    Christopher> the set. Therefore, the performance penalty seen when using large objects would 
    Christopher> be observed as soon as the size of the entire set (or map) has reached ~1 MB. 
    Christopher> Given that restriction, I’d imagine you would only be able to store a few messages
    Christopher> in each set.  That granularity seems like you are no longer getting the benefits 
    Christopher> of the set.

The granularity is configurable by the user: if they have small messages
(say, a few hundred bytes long), then we can store a reasonable number
of them in a single Set. For example, assuming an average length of 384
bytes per message (the longest line in today's logs on my laptop), a Set
would be able to store about 2k messages before it reaches the ~1 MB
mark. That's not too bad.
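
The arithmetic behind that estimate, and how it turns into a time
window, can be sketched as below. This is a minimal sketch: the function
names are hypothetical, the 1 MB target is the rule of thumb from the
quoted text, and the per-item overhead of 128 bytes is a made-up
allowance for CRDT metadata, not a measured Riak figure.

```python
def messages_per_set(target_bytes: int, avg_msg_bytes: int,
                     per_item_overhead: int = 0) -> int:
    """Estimate how many messages fit in one Set before it reaches target_bytes.

    per_item_overhead is a hypothetical allowance for per-element CRDT
    metadata; its real size depends on Riak's encoding.
    """
    return target_bytes // (avg_msg_bytes + per_item_overhead)


def max_window_seconds(capacity: int, msgs_per_second: float) -> int:
    """Longest time window whose expected message count still fits in one Set."""
    return int(capacity // msgs_per_second)


ONE_MB = 1024 * 1024
print(messages_per_set(ONE_MB, 384))       # 2730, with no overhead
print(messages_per_set(ONE_MB, 384, 128))  # 2048, with the assumed overhead
print(max_window_seconds(2730, 100))       # 27 seconds at 100 messages/s
```

Counting the overhead is what pulls the figure from ~2.7k down toward
the "about 2k" above; the window length is the knob a user would turn.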

    Christopher> Additionally, the primary benefit of the data types in Riak is that they converge 
    Christopher> deterministically when dealing with concurrent
    Christopher> operations.

Not only that: using sets makes the keys predictable. With sets, if I
want to retrieve logs, I can fetch the 2015-05-01T15:12:10-T15:12:15
key, for example, and get all the logs from those 5 seconds. If I used
one message per Riak object, it would be much harder to read the data
back.
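
A rough sketch of how such predictable keys could be derived, assuming
naive local timestamps, fixed 5-second windows aligned to midnight, and
the key format from the example above. The helper names
(window_start, window_key, keys_for_range) are hypothetical; the point is
that a reader can compute every key covering a time range without
listing the bucket.

```python
from datetime import datetime, timedelta


def window_start(ts: datetime, width_s: int = 5) -> datetime:
    # Floor the timestamp to the start of its window.
    # Assumes width_s divides 86400 so windows align with midnight.
    since_midnight = ts.hour * 3600 + ts.minute * 60 + ts.second
    offset = since_midnight % width_s
    return ts.replace(microsecond=0) - timedelta(seconds=offset)


def window_key(ts: datetime, width_s: int = 5) -> str:
    # Key format copied from the example: <start date+time>-T<end time>
    start = window_start(ts, width_s)
    end = start + timedelta(seconds=width_s)
    return f"{start:%Y-%m-%dT%H:%M:%S}-T{end:%H:%M:%S}"


def keys_for_range(first: datetime, last: datetime, width_s: int = 5) -> list[str]:
    # Every window key that overlaps [first, last]; no key listing needed
    keys = []
    cur = window_start(first, width_s)
    while cur <= last:
        keys.append(window_key(cur, width_s))
        cur += timedelta(seconds=width_s)
    return keys


print(window_key(datetime(2015, 5, 1, 15, 12, 13)))
# 2015-05-01T15:12:10-T15:12:15
```

The writer uses window_key to pick the Set for each message; the reader
uses keys_for_range to know exactly which Sets to fetch.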

There may be multiple threads adding to the same set, so deterministic
convergence is still useful.
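
Why concurrent adders are safe can be illustrated with a toy grow-only
set (G-Set), where merging replicas is plain set union. This is a
simplification chosen for illustration: Riak's set type is an
observed-remove set that also supports removals and carries extra
metadata, but for the append-only log case the intuition is the same.

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    # Union is commutative, associative and idempotent,
    # so the merge order between replicas does not matter.
    return a | b


# Hypothetical replicas written by three threads, with one overlapping message
replicas = [
    frozenset({"msg-1"}),
    frozenset({"msg-2", "msg-3"}),
    frozenset({"msg-3", "msg-4"}),
]

# Merge in every possible order and collect the distinct outcomes
outcomes = {reduce(merge, order) for order in permutations(replicas)}
print(len(outcomes))  # 1: every merge order converges to the same set
```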

    Christopher> I’m curious if the set is the right choice here; could
    Christopher> you just use a custom set format inside of a normal
    Christopher> Riak object

That's an option worth considering, yes. Thanks!

    Christopher> (or store one message per Riak object, given the write will be an 
    Christopher> immutable log entry?)

That's the first goal of the project, because that's the easiest and
most straightforward to implement.

The downside of one message per Riak object is that the data is hard to
retrieve, because making the keys predictable is not going to be easy.
Boxing a few hundred (or a few thousand) messages together into a Set
makes the keys predictable, at the cost of transferring more data to
the client when looking for a subset of the logs within the set.
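
That cost shows up as client-side filtering: the whole Set is
transferred, then trimmed to the range actually wanted. A minimal
sketch, assuming each element can be decoded into a (timestamp, message)
pair; the element layout and the filter_window helper are hypothetical.

```python
from datetime import datetime


def filter_window(entries, start, end):
    # Keep only entries with start <= timestamp < end.
    # Everything else was transferred but is discarded here.
    return [(ts, msg) for ts, msg in entries if start <= ts < end]


# Hypothetical contents of one fetched 5-second Set: one message per second
fetched = [(datetime(2015, 5, 1, 15, 12, s), f"line at :{s}") for s in range(10, 15)]

wanted = filter_window(fetched,
                       datetime(2015, 5, 1, 15, 12, 12),
                       datetime(2015, 5, 1, 15, 12, 14))
transferred = sum(len(m.encode()) for _, m in fetched)
kept = sum(len(m.encode()) for _, m in wanted)
print(len(wanted), kept, transferred)  # 2 of 5 messages kept
```

Smaller windows reduce this waste but increase the number of keys a
reader has to fetch; that is the same granularity trade-off as above.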

-- 
|8]




More information about the riak-users mailing list