Riak on SAN

Pedram Nimreezi mc at majorcomputing.com
Wed Oct 2 23:06:22 EDT 2013


I think this is just an area that hasn't been addressed, unlike things like
a "hot" copy in the RDBMS world.
I think at least a plan for how a hotcopy tool for a database system with
riak core semantics could be made would be a good step in the right
direction, i.e. maybe a combination of:
specifying a target system to transfer to, then converging the source ring
(i.e. 5 nodes to 1);
having a riak_backup_vnode with a batch put FSM that transfers 1-10 MB
worth of keys to another Riak system's vnode, compressed and with CRC
checks, and writes directly to LevelDB.
Or a wiki of some sort outlining possible strategies for achieving this.
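
To make the batching part of that idea concrete, here is a rough sketch in
Python. It is not Riak code and uses no Riak API: make_batches, encode_batch
and decode_batch are hypothetical names, and the frame layout (length, CRC32,
zlib-compressed body) is just one assumption about how a compressed batch
with a CRC check could look on the wire. The real thing would presumably
live in Erlang inside a vnode.

```python
import struct
import zlib
from typing import Iterable, Iterator, List, Tuple

KV = Tuple[bytes, bytes]  # (key, value), both raw bytes


def make_batches(items: Iterable[KV], max_bytes: int = 1 << 20) -> Iterator[List[KV]]:
    """Group key/value pairs into batches of at most max_bytes of raw payload.

    A single pair larger than max_bytes still gets a batch of its own.
    """
    batch: List[KV] = []
    size = 0
    for key, value in items:
        item_size = len(key) + len(value)
        if batch and size + item_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += item_size
    if batch:
        yield batch


def encode_batch(batch: List[KV]) -> bytes:
    """Serialize and compress a batch, prefixed with body length and CRC32."""
    raw = b"".join(struct.pack(">II", len(k), len(v)) + k + v for k, v in batch)
    body = zlib.compress(raw)
    return struct.pack(">II", len(body), zlib.crc32(body)) + body


def decode_batch(frame: bytes) -> List[KV]:
    """Check length and CRC32, then decompress and return the pairs."""
    length, crc = struct.unpack_from(">II", frame, 0)
    body = frame[8:8 + length]
    if len(body) != length or zlib.crc32(body) != crc:
        raise ValueError("batch failed length/CRC check")
    raw = zlib.decompress(body)
    pairs: List[KV] = []
    pos = 0
    while pos < len(raw):
        klen, vlen = struct.unpack_from(">II", raw, pos)
        pos += 8
        pairs.append((raw[pos:pos + klen], raw[pos + klen:pos + klen + vlen]))
        pos += klen + vlen
    return pairs
```

The sender would iterate a vnode's keys through make_batches and ship each
encode_batch frame; the receiver would call decode_batch and only write to
its backend once the check passes, asking for a resend otherwise.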

The current backup approach is not at all the way it could or should be.



On Wed, Oct 2, 2013 at 10:24 PM, John E. Vincent <
lusis.org+riak-users at gmail.com> wrote:

> Man I go away for a few hours for family time and off things go ;)
>
> So this led to some interesting convos on twitter and here. Others have
> addressed some things. I figure it helps to explain the sad lonely world I
> live in - It's called "Enterprise".
>
> A few folks are somewhat aware that our product uses Riak under the
> covers. We have a hosted SaaS version and we also allow customers to
> install it entirely isolated on their own networks. The only people who do
> this are traditional enterprises.
>
> The very first question that comes up during an installation after "you
> need HOW many servers?" is "How do I back this up".  Since we use LevelDB,
> we have the worst of backup options - coordinated node shutdown and
> tarball'ing. The thing is we can't say to them "you never need to back this
> up. Just add more nodes!"
>
> That doesn't check the box they have. That doesn't meet the legal and
> industry guidelines they have to follow. So now that they've swallowed the
> "you need 5 servers just for the DB" we now hit them with "your backup
> strategy involves this complicated orchestrated shutdown process and some
> tarballs". When faced with that, we ran into a new issue. They started
> doing vm snapshots and stupid shit like vmotioning the instances (oh yes,
> they virtualize it =/).
>
> If you aren't familiar with vmotion, it's basically vmware's bullshit that
> says they can somehow defy the laws of physics. If you read the details on
> exactly what vmotion does (hint - it doesn't actually take the node offline
> - vmware just "buffers" the pending network requests among other things),
> you can see how this can TOTALLY fuck up Riak clusters.
>
> Anyway so this is the world we have to live in and we have to provide
> something that resembles a backup they can DR from. Our normal course of
> action is to tell them to contact Basho for RiakDS and go multi-site. SAN
> based snapshots largely meet that need for them.
>
> For what it's worth, this is not just a problem with Riak and there are
> legitimate use cases for wanting to have a "copy" of production data for
> testing new code against. The biggest problem is once you get data IN to
> riak (and other stores), it's REALLY difficult to prune it outside of
> expensive "walk all the things", an external index of some kind or
> resorting to application-level business logic tooling.
>
> I'm not making a judgement call. Trade-offs are a thing but it's
> definitely an issue. At this point I'm considering resorting to a
> post-commit hook machination of some kind.
>
>
> On Wed, Oct 2, 2013 at 6:02 PM, Jeremiah Peschka <
> jeremiah.peschka at gmail.com> wrote:
>
>> Responses inline.
>>
>> TL;DR - I actually agree with John, SANs make management of storage
>> stupidly easy, but you pay more money for it. Make the right decision for
>> your org, but make sure you can monitor and back up that decision. The SAN
>> isn't a magic box. And a Drobo b1200i [2] is definitely not a SAN.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>>
>> On Wed, Oct 2, 2013 at 2:12 PM, John E. Vincent <
>> lusis.org+riak-users at gmail.com> wrote:
>>
>>> I'm going to take a competing view here.
>>>
>>> SAN is a bit of an overloaded term at this point. Nothing precludes a SAN
>>> from being performant or having SSDs. Yes the cost is overkill for fiber
>>> but iSCSI is much more realistic. Alternately you can even do ATAoE.
>>>
>>
>> Agreed. You can buy a glorified direct attached storage device with a few
>> ethernet ports in it, but vendors will call it a SAN.
>>
>>
>>>
>>> From a hardware perspective, if I have 5 pizza boxes as riak nodes, I
>>> can only fit so many disks in them. Meanwhile I can add another shelf to my
>>> SAN and expand as needed.
>>>
>>
>> We have the ability to cram 16x 960GB SSDs into the front of a Dell R720
>> for about $550 per drive... no SAN vendor can beat you on price for that.
>> SAN storage is an order of magnitude more expensive, but...
>>
>>
>>> Additionally backup of a SAN is MUCH easier than backup of a riak node
>>> itself. It's a snapshot and you're done. Mind you nothing precludes you
>>> from doing LVM snapshots in the OS but you still need to get the data OFF
>>> that system for it to be truly backed up.
>>>
>>
>> The products worthy of being called a SAN offer you fantastic features
>> like application aware volume snapshots, multi-site async and synchronous
>> block level synchronization, and all kinds of amazing features that mean
>> you never need to think about your storage beyond "HEY THERE, MAGIC BOX, I
>> NEED 500GB OF SPACE!"
>>
>>
>>>
>>> I love riak and other distributed stores but backing them up is NOT a
>>> solved problem. Walking all keys, coordinating the takedown of all your
>>> nodes in a given order, or whatever your strategy is, is a serious pain
>>> point.
>>>
>>> Using a SAN or local disk also doesn't excuse you from watching I/O
>>> performance. With a SAN I get multiple redundant paths to a block device
>>> and I don't get that necessarily with local storage.
>>>
>>> Just my two bits.
>>>
>>
>> For many applications, if you need storage performance outside of the
>> main chassis, you could also look at an approach like the one Microsoft takes with
>> the Fast Track Data Warehouse Reference Architecture [1]. For those who
>> don't want to read, you line up the ability of your CPUs to process data
>> with the ability of your disks to produce data. For SQL Server, you assume
>> ~300MB/s of processing per core. Core count * 300MB/s = total combined disk
>> speed. It's easy to use something like a Dell MD1220 or an HP MSA to get
>> this kind of performance, too, without breaking the bank and upgrading to
>> something like a 3PAR or EMC.
>>
>>
>> [1]:
>> http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/reference-architecture.aspx
>> [2]:
>> http://www.droboworks.com/B1200i.asp?gclid=CPbhhL2T-bkCFeI-Mgod0hEAaA
>>
>>
>>>
>>>
>>>
>>> On Wed, Oct 2, 2013 at 2:18 AM, Jeremiah Peschka <
>>> jeremiah.peschka at gmail.com> wrote:
>>>
>>>> Could you do it? Sure.
>>>>
>>>> Should you do it? No.
>>>>
>>>> An advantage of Riak is that you can avoid the cost of SAN storage by
>>>> getting duplication at the machine level rather than rely on your storage
>>>> vendor to provide it.
>>>>
>>>> Running Riak on a SAN also exposes you to the SAN becoming your
>>>> bottleneck; you only have so many fiber/iSCSI ports and a fixed number of
>>>> disks. The risk of storage contention is high, too, so you can run into
>>>> latency issues that are difficult to diagnose without looking into both
>>>> Riak as well as the storage system.
>>>>
>>>> Keeping cost in mind, too, SAN storage is about 10x the cost of
>>>> consumer grade SSDs. Not to mention feature licensing and support... The
>>>> cost comparison isn't favorable.
>>>>
>>>> Please note: Even though your vendor calls it a SAN, that doesn't mean
>>>> it's a SAN.
>>>>
>>>> On Oct 1, 2013 11:08 PM, "Guy Morton" <Guy.Morton at bksv.com> wrote:
>>>>
>>>>> Does this make sense?
>>>>>
>>>>> --
>>>>> Guy Morton
>>>>> Web Development Manager
>>>>> Brüel & Kjær EMS
>>>>>
>>>>> This e-mail is confidential and may be read, copied and used only by
>>>>> the intended recipient. If you have received it in error, please contact
>>>>> the sender immediately by return e-mail. Please then delete the e-mail and
>>>>> do not disclose its contents to any other person.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>


-- 
/* Sincerely
--------------------------------------------------------------
Pedram Nimreezi - Chief Technology Officer  */

// The hardest part of design … is keeping features out. - Donald Norman