luwak backend and misc.

Bryan Fink bryan at basho.com
Mon Aug 8 15:51:59 EDT 2011


On Thu, Jul 28, 2011 at 12:00 PM, Kunal Nawale <knawale at verivue.com> wrote:
> Hi,
>  I am evaluating luwak to be used as a redundant file storage server. I am
> trying to find out which backend will better suit for my purpose. Each of my
> server has sixteen 1TB drives, 4 total servers, 48GB ram each, 1x10Gb
> network interface
> The file sizes that will be stored range from 1GB-20GB, with an average size
> of 3 GB.
>
> Here are some observations/questions I had regarding this.
>
> 1) With bitcask backend, I tried uploading a 6 GB file. The upload and
> download worked fine for this file. But when I tried to upload a 17 GB file
> it took a very long time (more than 20 mins). Tried to download it but did
> not succeed; the download always comes back with a size of 1,000,000 bytes.

There are a few things that can cause these troubles.  Have you
checked the logs to see if there were any errors during any of these
operations?

On the upload side, it's possible that 6GB stands on one side of a
boundary, and 17GB on the other.  I'd suggest searching the size space
in a binary fashion: does 11.5GB work?  If there is a boundary, this
is a good way to find it.  It might be worth trying this both in the
case where the Riak cluster is cleaned out after each attempt, and
where it is left running with all data for all attempts.  Does
changing the "block_size" Luwak parameter (controlled by the
X-Luwak-Block-Size HTTP header when you create the file) change where
the boundary is?
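The search procedure above can be sketched in a few lines of Python. Here `upload_ok` is a hypothetical stand-in for actually performing an upload and checking the result, and the 12 GiB boundary in the usage line is invented purely for illustration:

```python
def find_boundary(good, bad, upload_ok, tolerance=100 * 2**20):
    """Binary-search the file-size space between a known-good size and a
    known-bad size, narrowing the failure boundary to within `tolerance`
    bytes (100 MiB by default)."""
    while bad - good > tolerance:
        mid = (good + bad) // 2
        if upload_ok(mid):
            good = mid   # uploads of this size still succeed
        else:
            bad = mid    # the failure boundary is at or below mid
    return good, bad

# Usage sketch with a pretend boundary at 12 GiB; in practice upload_ok
# would PUT a file of the given size and verify the download.
GIB = 2**30
good, bad = find_boundary(6 * GIB, 17 * GIB, lambda size: size < 12 * GIB)
```

Each iteration halves the interval, so narrowing an 11 GB range to 100 MB takes only about seven upload attempts.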

Still on the upload side, where was the upload client running?  That
client may have been applying extra memory pressure to one of your
nodes if it was on the same machine as the cluster.

On the download side, if you haven't modified the block_size of your
luwak files, 1,000,000 indicates that there's exactly one block in the
file.  If this was an existing file of 1MB, then this just means that
your 17GB upload failed before flushing the tree for the new data.
We've also noticed that some clients (such as Firefox) have trouble
parsing Luwak's chunked response, due to an error in its gzip encoding;
try explicitly setting the Accept-Encoding header to identity only.
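For reference, both knobs mentioned above are plain HTTP headers. A sketch with curl, assuming Riak's default HTTP port (8098) and a file named "bigfile" (both placeholders; adjust for your cluster):

```shell
# Create a Luwak file with a non-default block size:
curl -X PUT -H "X-Luwak-Block-Size: 4000000" \
     --data-binary @bigfile http://127.0.0.1:8098/luwak/bigfile

# Download it, forcing the identity encoding to sidestep the gzip issue:
curl -H "Accept-Encoding: identity" \
     http://127.0.0.1:8098/luwak/bigfile -o bigfile.out
```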

> 2) I also tried fs_backend, but it turned out to be quite slow, the
> upload of a 6 GB file took considerably longer. The download never succeeded
> it always returned me a chunk of that file not the whole file.

The fs_backend was written as proof-of-concept/testing code.  It is
not optimized for speed.  Best to stick with bitcask for your use
case, I think.

> 3) Are there any performance measurements available about the read/write
> bandwidths

None directly, but you should be able to estimate the write speed:
Luwak creates a Riak object for every N bytes of your file (N is known
as the "block size").  Luwak will not be able to write these objects
faster than any other Riak client.
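That estimate is simple arithmetic: the number of block objects is the file size divided by the block size, rounded up. A small sketch, using Luwak's default block size of 1,000,000 bytes:

```python
import math

BLOCK_SIZE = 1_000_000  # Luwak's default block size, in bytes

def riak_objects_for(file_size, block_size=BLOCK_SIZE):
    """Number of Riak block objects Luwak creates for a file of
    `file_size` bytes (tree/metadata objects not included)."""
    return math.ceil(file_size / block_size)

# The poster's average 3 GB file at the default block size means
# 3000 block writes; a 17 GB file means 17,000.
blocks = riak_objects_for(3 * 10**9)
```

So if a single client sustains R object writes per second against the cluster, an upload takes roughly `blocks / R` seconds, plus the tree flush at the end.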

> 5) Can an object be read simultaneously while it is being written. With a
> lag between the read write pointer being in the range of 60 MBytes.

I think questions 4 & 5 are related, and it's easier to answer 5 first.

The simple answer is *new* Luwak files cannot be read while being
written.  This has to do with the fact that the HTTP interface does
not expose a way to flush the file's tree to Riak before finishing the
upload.  The data for the file is being persisted, but the root
pointer is not modified until the end.

This also means that *existing* Luwak files *can* be read while being
modified, but modifications made after the root pointer is found will
be invisible.

> 4) Are there any latency numbers available, I am specifically looking at the
> time difference between the first byte read and the last byte write for an
> object.

I'm interpreting this question as, "After I finish writing, how long
will it be before I can begin reading what I just wrote?" because of
the tree-flushing behavior I described above.

The answer depends on the backlog to the Luwak writer process (on the
Riak/server side), and the depth of the resulting file tree.  Once the
writer has written the final block to an object, it must then flush
the tree pointing to that object.  Flushing the tree requires writing,
at least, the root node and the "tld" object (where the metadata about
the file is stored).  The block_size and tree_order parameters
(1,000,000 bytes and 250, by default) determine how many other nodes
must be written between the root and the block.  Each node is an
additional Riak object write.
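The tree depth implied by those defaults can be estimated directly: with tree_order pointers per node, a tree of depth d covers up to tree_order^d blocks. A rough sketch (an estimate from the parameters described above, not a claim about Luwak's exact on-disk layout):

```python
import math

def tree_depth(file_size, block_size=1_000_000, tree_order=250):
    """Estimated depth of the Luwak file tree: how many levels of
    pointer nodes sit between the root and the data blocks."""
    blocks = max(1, math.ceil(file_size / block_size))
    depth = 1
    while tree_order ** depth < blocks:
        depth += 1  # one more level of interior nodes is needed
    return depth

# A 17 GB file at the defaults is 17,000 blocks, which fits in a
# depth-2 tree (250^2 = 62,500 leaf slots), so the final flush writes
# the root, one intermediate node, and the "tld" metadata object.
depth = tree_depth(17 * 10**9)
```

Files up to 250 MB fit in a depth-1 tree at the defaults, so the end-of-upload flush stays cheap until files grow well past that.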

I hope that helps,
Bryan



