Durable writes and parallel reads

Jon Meredith jmeredith at basho.com
Thu Nov 3 19:21:29 EDT 2011

Hi Erik,

Apologies it's taken so long to get back to you.

Durable writes:
  You're interpreting it correctly.  DW means that the storage backend has
accepted the write.  Because the backends are pluggable and configurable,
what durable actually means depends on the backend and its settings.  For
bitcask you can control the sync strategy and there are similar controls
for innodb.  For the memory backend there is no durable write.  With
hindsight it would have been better to have used something like accepted
write (AW) and write (W) rather than W/DW, but we're fairly stuck with it
now.  Combining writes/acceptance is a very interesting idea going forward,
but doesn't fit well with the synchronous nature of the backend API we have
currently.
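
To make the batching idea from the quoted message concrete, here is a minimal
sketch in Python (chosen only for illustration; it is not Riak or bitcask
code).  Writes are buffered, and the 'dw' acknowledgements for a whole batch
are sent only after one flush has completed.  The names GroupCommitLog,
flush_fn and on_durable are hypothetical; w_flush and ms_flush mirror the
W_flush and MS_flush thresholds Erik describes.

```python
import time


class GroupCommitLog:
    """Buffers writes and acknowledges them only after a flush completes."""

    def __init__(self, flush_fn, w_flush=100, ms_flush=50, clock=time.monotonic):
        self.flush_fn = flush_fn  # hypothetical: performs the real sync, e.g. fsync
        self.w_flush = w_flush    # flush after at most this many pending writes
        self.ms_flush = ms_flush  # ...or once the oldest write has waited this long
        self.clock = clock        # injectable so the timing can be tested
        self.pending = []         # (request_id, on_durable) pairs awaiting a flush
        self.oldest = None        # arrival time of the oldest pending write

    def write(self, request_id, on_durable):
        """Queue a write; on_durable(request_id) fires after the next flush."""
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append((request_id, on_durable))
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either threshold is reached. Returns True if a flush ran."""
        if not self.pending:
            return False
        waited_ms = (self.clock() - self.oldest) * 1000.0
        if len(self.pending) >= self.w_flush or waited_ms >= self.ms_flush:
            self.flush()
            return True
        return False

    def flush(self):
        """Sync once, then acknowledge every write covered by that sync."""
        batch, self.pending = self.pending, []
        self.oldest = None
        self.flush_fn()
        for request_id, on_durable in batch:
            on_durable(request_id)
```

In this sketch the time threshold is only checked when maybe_flush() is
called, so a real implementation would also need a timer that calls it
periodically.  The caller has to wait for the callback, which is where this
collides with a synchronous backend API.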

Parallel reads:
  With 1.0.0 we've introduced a thread pool to increase the concurrency
vnodes can use for listing keys.  I'd like to improve on read concurrency
as well.  The current architecture ensures that a key is only updated by a
single thread, which makes writing backend drivers simpler.  We either need
to add support to the k/v vnode to ensure that property still holds when
keys are updated in parallel, or change the backend drivers to be tolerant
of concurrent updates.
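
As a rough illustration of the first option (keeping the one-thread-per-key
property while still running operations in parallel), here is a sketch in
Python.  It is not how the k/v vnode is implemented; KeyAffinityPool and its
methods are hypothetical names.  Each key is hashed to one single-threaded
lane, so operations on the same key run serially while different keys can
proceed concurrently.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class KeyAffinityPool:
    """Routes every operation on a given key to the same single-threaded lane."""

    def __init__(self, workers=4):
        # One single-threaded executor per lane: work within a lane is serial.
        self.lanes = [ThreadPoolExecutor(max_workers=1) for _ in range(workers)]

    def lane_for(self, key: bytes) -> int:
        # A stable hash, so a key always maps to the same lane.
        return zlib.crc32(key) % len(self.lanes)

    def submit(self, key: bytes, fn, *args):
        """Run fn(key, *args) on the lane that owns this key."""
        return self.lanes[self.lane_for(key)].submit(fn, key, *args)

    def shutdown(self):
        for lane in self.lanes:
            lane.shutdown(wait=True)
```

The trade-off in this sketch is that a hot key, or several keys that hash to
the same lane, still serialize behind each other.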

The performance numbers are interesting.  How many vnodes were you


On Mon, Oct 31, 2011 at 10:38 AM, Erik Søe Sørensen <ess at trifork.com> wrote:

> The following is a couple of questions (and suggestions) regarding the
> technical sides of Riak performance and reliability.
> The questions have been prompted by reading Riak source code and by
> discussions within our company.
> I suppose the common thread here is "latency hiding"...
> Durable Writes.
> ---------------
> The default for bitcask's 'sync_strategy' setting is not to flush to disk
> explicitly at all.
> This means that a 'Durable Write' isn't actually durable; the difference
> between 'W' and 'DW' replies is whether the write has made it past Riak, to
> the OS - but not through the OS and down to disk.
> Is this correct?
> What I'd have expected, as a reasonably-performing alternative, is that
> Riak would flush periodically - say, after at most W_flush writes or
> MS_flush milliseconds, and send 'dw' replies for all of the relevant
> requests (those written since last flush) at once after the flush has been
> completed.
> This would combine 'real' DW semantics with reasonable performance (and is
> how I have handled a similar problem; my conceptions about what is right
> and proper may of course be influenced by my own coding history...).
> (For kicks, MS_flush might even be dynamically determined by how long the
> flush operations tend to take; the typical value of a flush duration,
> multiplied by a small constant, would probably be a fitting value.)
> Parallel Reads.
> ---------------
> Within a vnode, bitcask read operations happen in serial.
> Is there any reason for reads not happening in parallel?
> For map/reduce operations, in particular, I imagine this might make a
> difference, by giving the OS the opportunity to schedule disk accesses so
> as to reduce seek time.
> (Unless of course Riak itself reorders the keys before reading, but I
> don't believe this is the case - especially since the order would depend on
> the backend: for bitcask, by time; for innostore, by key order, for
> instance.)
> Of course, if each host has multiple vnodes, there will be some
> parallelism even with serialized reads within each bitcask.
> Regards,
> Erik Søe Sørensen

Jon Meredith
Platform Engineering Manager
Basho Technologies, Inc.
jmeredith at basho.com