Multiple disks

Alexander Sicular siculars at gmail.com
Tue Mar 22 10:01:50 EDT 2011


This kind of thing is an operational nightmare. At the very least, I
imagine, you will need symlinks on every node for every
vnode/directory combination. How does this get managed in failure
scenarios or when adding/removing nodes?

Thinking about it a bit: if you were to do this, I think the most
reliable way to get the info out to the cluster would be to use
riak_core to gossip vnode metadata that includes the path. Even then,
that path would have to exist on all your nodes or be created
dynamically when a vnode gets assigned.
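
To make the "created dynamically" part concrete, here is a minimal
sketch in Python (not Erlang, and not anything riak_core actually
provides). The metadata dict and the ensure_vnode_dir name are
hypothetical; the only assumption is that each vnode's metadata
carries the path it should use.

```python
import os

def ensure_vnode_dir(vnode_meta):
    """Create the vnode's data directory if it is missing, then return it.

    vnode_meta is a hypothetical record such as
    {"partition": 0, "path": "/mnt/disk1/bitcask/0"}.
    """
    path = vnode_meta["path"]
    # exist_ok=True makes this safe to call on every vnode (re)start.
    os.makedirs(path, exist_ok=True)
    return path
```

This only covers the happy path. It does nothing about a disk that is
unmounted or failed, which is exactly where the operational pain
starts.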

Again, imho, this is an "incredibly bad idea" (tm) that will introduce
all sorts of new pain into your life in ways you don't want to think
about. The disk space you would save by mapping vnodes to disks is
really not worth the headache. Save your ops dudes the trouble and
just use RAID 5 and be done with it. You may just keep your ops from
killing themselves - and hey, the life you save may be your own.

(not to say your patch isn't cool though, Joseph)

-Alexander


@siculars on twitter
http://siculars.posterous.com

On Mar 22, 2011, at 0:47, Joseph Blomstedt  
<Joseph.Blomstedt at gmail.com> wrote:

> Oh, I just realized that was only a partial solution to the problem. I
> forgot to commit related logic that handles selecting the same
> directory on vnode restart. That's what I get for sending out code
> late at night. You'll want to maintain a partition->directory index
> somewhere to really make it work (or search all directories for an
> existing bitcask corresponding to the partition).
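
The "search all directories" option could look roughly like this
Python sketch. It assumes each bitcask lives in a subdirectory named
after its partition index under one of the root paths; the function
name and that layout are assumptions for illustration, not taken from
the commit.

```python
import os

def find_existing_root(partition, roots):
    """Return the root that already holds this partition's bitcask, or None.

    Assumes the bitcask directory is <root>/<partition index>.
    """
    for root in roots:
        if os.path.isdir(os.path.join(root, str(partition))):
            return root
    # Nothing found: the caller falls back to its selection strategy.
    return None
```

On restart a vnode would call this first and only pick a new directory
when it returns None.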
>
> For what it's worth, my experiments a few months back in this area
> just used a deterministic function to map partitions to a directory.
> That's another approach.
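
A deterministic mapping might be sketched like this in Python. It is
not Joe's actual function, which isn't shown in this thread. I hash
the partition index rather than taking it modulo the directory count
because, as far as I know, partition indexes are evenly spaced
multiples of a large power of two, so a plain modulo could land every
vnode in the same directory.

```python
import hashlib

def dir_for_partition(partition, roots):
    """Map a partition index to one of the roots, the same way every time."""
    digest = hashlib.sha1(str(partition).encode("ascii")).digest()
    # Use the first 8 bytes of the hash as an integer bucket selector.
    bucket = int.from_bytes(digest[:8], "big") % len(roots)
    return roots[bucket]
```

No index needs to be stored, but the mapping changes for most
partitions if the list of roots changes, so adding a disk would mean
moving data around.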
>
> -Joe
>
> On Tue, Mar 22, 2011 at 1:25 AM, Joseph Blomstedt
> <Joseph.Blomstedt at gmail.com> wrote:
>> Each vnode already opens a separate bitcask, so nothing inherently
>> prevents the desired behavior. It's just not coded that way. While
>> an individual bitcask must be a single directory, there is no reason
>> all vnodes need to open bitcasks within a shared root directory.
>>
>> Luckily, it's easy to change this behavior. In fact, I played around
>> with the idea a while back. This question prompted me to find and
>> release the code:
>> https://github.com/jtuple/riak_kv/commit/a8ab33224651e6850aed385e4c05c1993916a3e5
>>
>> That commit should apply against riak-0.14.1. It extends the bitcask
>> data_root config option to allow for multiple root paths as well as a
>> selection strategy (random or spread). Random just randomly chooses
>> one of the directories. Spread picks the directory containing the
>> fewest already-opened bitcasks, although this is only a soft
>> guarantee since no effort is made to handle multiple vnodes choosing
>> a directory concurrently.
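
For anyone who doesn't want to read the Erlang, the two strategies
boil down to something like this Python sketch. The function name and
the open_counts dict are made up for illustration; the real code is in
the commit linked above.

```python
import random

def choose_root(roots, open_counts, strategy="spread"):
    """Pick a root directory for a new bitcask.

    open_counts maps root -> number of bitcasks already opened there.
    """
    if strategy == "random":
        return random.choice(roots)
    if strategy == "spread":
        # Fewest open bitcasks wins; ties go to the earliest root listed.
        return min(roots, key=lambda r: open_counts.get(r, 0))
    raise ValueError("unknown strategy: %r" % (strategy,))
```

As in the description above, nothing here locks or reserves the
chosen directory, so two vnodes starting at the same moment can both
pick the same "emptiest" root.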
>>
>> Using paths that correspond to different mounted drives should do
>> the trick.
>>
>> -Joe
>>
>>
>> On Mon, Mar 21, 2011 at 5:29 PM, Greg Nelson <grourk at dropcam.com> wrote:
>>> Hello,
>>> We are currently evaluating Riak for an application that will store
>>> large amounts of data in a write-heavy pattern. We'd like to pack
>>> many disks into each machine. Currently, it appears that Bitcask
>>> uses exactly one directory to store data. What is the best way to
>>> have it use multiple disks? Is this something Innostore would
>>> handle better?
>>> We'd like to avoid RAID since we'll be paying for redundancy at a
>>> higher level with Riak (N=3, etc.).
>>> We'd also like to avoid a JBOD type setup where a single disk
>>> failure brings the whole node down, as we'll obviously be
>>> increasing those odds with each disk.
>>> What I'm wondering is, can each node distribute its vnodes across
>>> many disks? And if one of those disks fails, will Riak handle that
>>> appropriately (i.e., the other vnodes continue to operate normally
>>> and hand-off data when the new disk comes online)?
>>> Thanks!
>>> Greg
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>