Feedback for GSoC project - Riak Destination for syslog-ng

Christopher Meiklejohn cmeiklejohn at basho.com
Tue May 5 08:11:57 EDT 2015


> On May 5, 2015, at 1:01 PM, Gergely Nagy <algernon at madhouse-project.org> wrote:
> 
>>>>>> "Christopher" == Christopher Meiklejohn <cmeiklejohn at basho.com> writes:
> 
>    Christopher> I’m a bit concerned with your use of the set embedded in the
>    Christopher> map.
> 
> The original idea was to use a Set directly. The Set-in-Map thing was
> just a thought experiment (Map-in-Set would make more sense).
> 
>    Christopher> Large objects have traditionally been a big problem in Riak due
>    Christopher> to the use of distributed Erlang and head of line blocking. I’m
>    Christopher> curious if you could elaborate on what type of data you will be
>    Christopher> storing in the set: how big you expect each item to be, how big you
>    Christopher> expect the map to be, and the overall layout of data inside of the
>    Christopher> data structure.
> 
> The intention is to store log messages in each element of the set:
> either as a string (syslog or json, or whatever else the user sees fit),
> or as a map of key-value pairs (where values themselves can be maps
> too).
> 
> On average, the log messages are a few kilobytes in size. There may be
> exceptions, but >1 MB ones are fairly rare. How much data the set would
> hold... now that's a question that can't really be answered. It is
> really up to the syslog-ng user to configure that.

I’m referring to the size of the entire set, not the size of the individual objects
that will be its members. The performance penalty seen with large objects would
be observed as soon as the entire set (or map) reaches ~1 MB.
Given that restriction, I’d imagine you would only be able to store a few messages
in each set. At that granularity, it seems you would no longer be getting the
benefits of the set.

Additionally, the primary benefit of the data types in Riak is that they converge
deterministically when dealing with concurrent operations. I’m curious whether the set
is the right choice here: could you just use a custom set format inside of a normal
Riak object, or store one message per Riak object, given that each write will be an
immutable log entry?
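
To make the one-message-per-object option concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the key layout
(host, UTC timestamp, random suffix) and the names make_log_key and
store_message are hypothetical, and a plain dict stands in for a Riak bucket,
so no real client API is shown.

```python
import uuid
from datetime import datetime, timezone


def make_log_key(host: str, when: datetime) -> str:
    """Build a unique, time-sortable key for one log message (hypothetical scheme)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    # The random suffix keeps keys distinct even when two messages
    # from the same host share a timestamp.
    return f"{host}/{stamp}/{uuid.uuid4().hex}"


def store_message(bucket: dict, host: str, message: str, when: datetime) -> str:
    """Write one message under its own key; existing keys are never rewritten."""
    key = make_log_key(host, when)
    bucket[key] = message  # stand-in for a put to a real Riak bucket
    return key


bucket = {}  # in-memory stand-in, not a Riak client
now = datetime(2015, 5, 5, 12, 11, 57, tzinfo=timezone.utc)
k1 = store_message(bucket, "web01", "sshd: accepted publickey for alice", now)
k2 = store_message(bucket, "web01", "sshd: session opened for alice", now)
```

Because every message gets its own key and is written exactly once, no two
writers ever update the same object, and each object stays at the size of a
single message rather than growing with the log.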

Thanks,
- Chris

Christopher Meiklejohn
Senior Software Engineer
Basho Technologies, Inc.
cmeiklejohn at basho.com

More information about the riak-users mailing list