luwak backend and misc.

Kunal Nawale knawale at verivue.com
Mon Aug 8 16:14:14 EDT 2011


Hi Bryan,
  I will try increasing the file size from 6 GB up and find out where it 
breaks. I will also capture the logs when it fails.

I have not played with the 'block_size' parameter; I will try that too.
The upload client is running on a separate machine.
I am using curl to upload and download.
Thanks for your help.
-kunal



On 08/08/2011 03:51 PM, Bryan Fink wrote:
> On Thu, Jul 28, 2011 at 12:00 PM, Kunal Nawale <knawale at verivue.com> wrote:
>> Hi,
>>   I am evaluating luwak for use as a redundant file storage server. I am
>> trying to find out which backend will better suit my purpose. I have 4
>> servers in total, each with sixteen 1TB drives, 48GB of RAM, and one 10Gb
>> network interface.
>> The file sizes that will be stored range from 1GB to 20GB, with an average
>> size of 3 GB.
>>
>> Here are some observations/questions I had regarding this.
>>
>> 1) With the bitcask backend, I tried uploading a 6 GB file. The upload and
>> download worked fine for this file. But when I tried to upload a 17 GB file,
>> it took a very long time (more than 20 mins). I tried to download it but did
>> not succeed; the download always comes back with a size of 1,000,000 bytes.
> There are a few things that can cause these troubles.  Have you
> checked the logs to see if there were any errors during any of these
> operations?
>
> On the upload side, it's possible that 6GB stands on one side of a
> boundary, and 17GB on the other.  I'd suggest searching the size space
> in a binary fashion: does 11.5GB work?  If there is a boundary, this
> is a good way to find it.  It might be worth trying this both in the
> case where the Riak cluster is cleaned out after each attempt, and
> where it is left running with all data for all attempts.  Does
> changing the "block_size" luwak parameter (controlled by the
> X-Luwak-Block-Size HTTP header when you create the file) change where
> the boundary is?
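A rough sketch of the binary search suggested above, in Python. The `works` callback is a hypothetical stand-in for "upload a file of this size and verify the download"; the fake version below just pretends uploads fail above 8 GB so the search logic can be run without a cluster.

```python
def find_boundary(works, lo_gb, hi_gb, tol_gb=0.5):
    """Bisect between a size known to work (lo_gb) and one known to fail (hi_gb).

    Returns (largest size seen working, smallest size seen failing).
    """
    while hi_gb - lo_gb > tol_gb:
        mid = (lo_gb + hi_gb) / 2.0
        if works(mid):
            lo_gb = mid   # boundary is above mid
        else:
            hi_gb = mid   # boundary is at or below mid
    return lo_gb, hi_gb


def fake_upload_works(size_gb):
    # Hypothetical stand-in for a real upload + download check.
    return size_gb <= 8.0


lo, hi = find_boundary(fake_upload_works, 6.0, 17.0)
print("boundary is between %.2f GB and %.2f GB" % (lo, hi))
```

Starting from 6 GB (works) and 17 GB (fails), the first size tried is 11.5 GB, as in the suggestion above. The same loop can be repeated with and without cleaning out the cluster between attempts, and with different block sizes.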
>
> Still on the upload side, where was the upload client running?  That
> client may have been applying extra memory pressure to one of your
> nodes if it was on the same machine as the cluster.
>
> On the download side, if you haven't modified the block_size of your
> luwak files, 1,000,000 indicates that there's exactly one block in the
> file.  If this was an existing file of 1MB, then this just means that
> your 17GB upload failed before flushing the tree for the new data.
> We've also noticed that some clients (like Firefox) have trouble
> parsing Luwak's chunked response, due to an error in gzip encoding -
> try explicitly setting Accept-Encoding to only identity.
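For a scripted client, a minimal sketch of a download request that asks for an uncompressed response. The host, port 8098, the `/luwak/` URL prefix, and the key `bigfile` are assumptions for illustration; adjust them to match the actual cluster configuration.

```python
import urllib.request


def build_download_request(base_url, key):
    """Build a GET request for a Luwak file that accepts only identity encoding."""
    req = urllib.request.Request("%s/luwak/%s" % (base_url, key))
    # Ask the server not to gzip the chunked response.
    req.add_header("Accept-Encoding", "identity")
    return req


# Hypothetical endpoint and key, not taken from the thread.
req = build_download_request("http://127.0.0.1:8098", "bigfile")
print(req.full_url)
print(req.get_header("Accept-encoding"))
```

Passing `req` to `urllib.request.urlopen` would perform the download. With curl, the equivalent is `-H "Accept-Encoding: identity"`.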
>
>> 2) I also tried fs_backend, but it turned out to be quite slow; the
>> upload of a 6 GB file took considerably longer. The download never
>> succeeded; it always returned a chunk of that file, not the whole file.
> The fs_backend was written as proof/testing code.  It is not optimized
> for any variety of speed.  Best to stick with Luwak for your use case,
> I think.
>
>> 3) Are there any performance measurements available for the read/write
>> bandwidths?
> None directly, but you should be able to estimate the write speed:
> Luwak creates a Riak object for every N bytes of your file (N is known
> as the "block size").  Luwak will not be able to write these objects
> faster than any other Riak client.
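A back-of-the-envelope version of that estimate, in Python. The figure of 100 object writes per second is purely hypothetical; it would need to be replaced with a rate measured on the real cluster at the chosen block size. The sketch also ignores the extra tree-node writes, so it gives an upper bound.

```python
def estimated_write_mb_per_s(block_size_bytes, puts_per_second):
    """Upper bound on Luwak write bandwidth: one Riak object per block."""
    return block_size_bytes * puts_per_second / 1_000_000


def estimated_upload_seconds(file_bytes, block_size_bytes, puts_per_second):
    """Time to write all blocks of a file at the given object write rate."""
    blocks = -(-file_bytes // block_size_bytes)  # ceiling division
    return blocks / puts_per_second


BLOCK_SIZE = 1_000_000   # default block size from the thread
PUTS_PER_SECOND = 100    # hypothetical measured rate, not a real benchmark

print(estimated_write_mb_per_s(BLOCK_SIZE, PUTS_PER_SECOND), "MB/s")
print(estimated_upload_seconds(17 * 10**9, BLOCK_SIZE, PUTS_PER_SECOND), "s for 17 GB")
```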
>
>> 5) Can an object be read while it is being written, with the read pointer
>> lagging the write pointer by around 60 MBytes?
> I think questions 4 & 5 are related, and I think it's easier to answer 5 first.
>
> The simple answer is that *new* Luwak files cannot be read while being
> written.  This has to do with the fact that the HTTP interface does
> not expose a way to flush the file's tree to Riak before finishing the
> upload.  The data for the file is being persisted, but the root
> pointer is not modified until the end.
>
> This also means that *existing* Luwak files *can* be read while being
> modified, but modifications made after the root pointer is found will
> be invisible.
>
>> 4) Are there any latency numbers available? I am specifically looking at the
>> time difference between the first byte read and the last byte written for an
>> object.
> I'm interpreting this question as, "After I finish writing, how long
> will it be before I can begin reading what I just wrote?" because of
> the tree-flushing behavior I described above.
>
> The answer depends on the backlog to the Luwak writer process (on the
> Riak/server side), and the depth of the resulting file tree.  Once the
> writer has written the final block to an object, it must then flush
> the tree pointing to that object.  Flushing the tree requires writing,
> at least, the root node and the "tld" object (where the metadata about
> the file is stored).  The block_size and tree_order parameters
> (1,000,000 bytes and 250, by default) determine how many other nodes
> must be written between the root and the block.  Each node is an
> additional Riak object write.
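To get a feel for how few node writes that is, here is a rough Python estimate. It assumes every tree node holds up to tree_order children and that the tree is filled evenly; Luwak's actual layout may differ, so treat the numbers as an approximation only.

```python
def block_count(file_bytes, block_size=1_000_000):
    """Number of data blocks needed for a file (ceiling division)."""
    return -(-file_bytes // block_size)


def tree_levels(blocks, tree_order=250):
    """Levels of tree nodes above the blocks, root included (assumed layout)."""
    levels = 1
    nodes = -(-blocks // tree_order)      # nodes directly above the blocks
    while nodes > 1:
        nodes = -(-nodes // tree_order)   # parents of the previous level
        levels += 1
    return levels


def estimated_flush_writes(file_bytes, block_size=1_000_000, tree_order=250):
    """Node writes on the path from root to last block, plus the "tld" object."""
    return tree_levels(block_count(file_bytes, block_size), tree_order) + 1


for gb in (3, 17):
    size = gb * 10**9
    print(gb, "GB:", block_count(size), "blocks,",
          tree_levels(block_count(size)), "levels,",
          estimated_flush_writes(size), "writes to flush")
```

Under these assumptions, both a 3 GB and a 17 GB file with default settings need only a handful of object writes after the final block before the file becomes readable.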
>
> I hope that helps,
> Bryan



