Scaling Riak CS to hundreds or thousands of servers

Andrew Stone astone at basho.com
Sun Jul 28 01:49:34 EDT 2013


Hi Andre,

The blocks are going to be spread across some subset of those 100 servers.
Since Riak CS stores the chunks inside Riak, they are hashed based on
primary key. Currently there is no way to co-locate chunks in Riak CS. You
can read more about how Riak manages storage here:
http://docs.basho.com/riak/latest/theory/concepts/Clusters/let
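
To make the hashing point concrete, here is a rough Python sketch of how
keys spread over a ring. It is a simplified model, not Riak's actual code:
the ring size of 64, the "blocks" bucket name, the "name:sequence" block key
format, and both function names are assumptions made up for illustration,
and real Riak hashes an Erlang-encoded {bucket, key} pair, so the partition
numbers will not match a real cluster.

```python
import hashlib

RING_SIZE = 64        # number of partitions in the ring (assumed for this sketch)
N_VAL = 3             # replicas per object
RING_TOP = 2 ** 160   # size of the SHA-1 output space


def preference_list(bucket: bytes, key: bytes,
                    ring_size: int = RING_SIZE, n_val: int = N_VAL) -> list:
    """Return the partition indexes that would hold replicas of bucket/key.

    Simplified model: hash onto the 160-bit ring, find the partition the
    hash falls in, then take that partition and the next n_val - 1.
    """
    digest = hashlib.sha1(bucket + b"/" + key).digest()
    position = int.from_bytes(digest, "big")
    first = position * ring_size // RING_TOP
    return [(first + i) % ring_size for i in range(n_val)]


def block_partitions(object_name: str, size_mb: int, block_mb: int = 1) -> dict:
    """Map each block sequence number of one file to its partitions."""
    blocks = -(-size_mb // block_mb)  # ceiling division
    placements = {}
    for seq in range(blocks):
        # Hypothetical block key format; Riak CS generates its own keys.
        key = f"{object_name}:{seq}".encode()
        placements[seq] = preference_list(b"blocks", key)
    return placements


if __name__ == "__main__":
    placements = block_partitions("video.bin", 100)
    primaries = {plist[0] for plist in placements.values()}
    print(f"{len(placements)} blocks, {len(primaries)} distinct primary partitions")
```

Because each block key hashes independently, the 100 blocks of a 100 MB
file scatter over many partitions, and which servers own those partitions
depends on the cluster's ring assignment rather than on anything the client
can choose.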

While it is possible to change the n_val (number of times an object is
replicated) on buckets inside Riak, it is not easy or advisable to do so
with Riak CS. Additionally, even if Riak CS allowed you to set the n_val
for individual CS buckets, Riak itself does not support changing the n_val.
Riak CS generates block keys and prefixes bucket names, and is generally in
charge of how it chunks and stores your data inside Riak.
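
For reference, on a plain Riak bucket (not one managed by Riak CS) n_val is
a bucket property that can be set through Riak's HTTP interface. The sketch
below only builds the request and does not send it. The host, port, bucket
name, and the helper name build_set_n_val_request are placeholders chosen
for illustration, and none of this should be pointed at the buckets Riak CS
creates for itself.

```python
import json
import urllib.request


def build_set_n_val_request(base_url: str, bucket: str, n_val: int) -> urllib.request.Request:
    """Build (but do not send) a bucket-properties update that sets n_val."""
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/buckets/{bucket}/props",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder address and bucket name; adjust for a real test cluster.
    req = build_set_n_val_request("http://127.0.0.1:8098", "example_bucket", 5)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Note that the property applies to a whole bucket, not to individual
objects, which is one reason it does not map onto "more copies for a few
hot files".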

-Andrew



On Sat, Jul 27, 2013 at 12:17 PM, Andre Lohmann <lohmann.andre at gmail.com> wrote:

> Hi again, and thanks for the answers.
>
> Now I have some more questions.
>
> If the chunk size is always 1 MB and I have a 100 MB file in a cluster of
> 100 servers, are the chunks split over all 100 servers, or am I able to
> keep the chunks of one file in one place, to prevent too much
> network traffic?
>
> Also, there are always files that are fetched with high frequency and
> others that are less relevant. Is it possible to configure files that
> should be copied to more than three servers, and can this setting for some
> special files be reset to a lower redundancy after these files become less
> relevant too?
>
> Kind regards
>
> Andre
>
> Sent from my iPad
>
> On 25.07.2013 at 16:56, John Daily <jdaily at basho.com> wrote:
>
> > Good questions, Andre, thanks for reaching out.
> >
> >> How does Riak CS scale when setting up a few hundred servers in
> >> a cluster?
> >
> > The upper limit of a cluster size depends on traffic load, hardware, and
> > network capacity, but anything approaching 100 servers is likely to run
> > into trouble due to inherent limitations in both Riak and the Erlang VM
> > itself.  An Erlang cluster is a full mesh, so the cluster overhead grows
> > significantly with the number of servers.
> >
> >
> >>
> >> Does it make sense to build a cluster of that size, or is it
> >> recommended to have smaller pools of clusters and shard the files over
> >> more than one cluster?
> >
> > You should definitely look at deploying multiple clusters.
> >
> >
> >> What about Stanchion as the potential single point of failure, how to
> >> prevent it from corrupting the system? If I got it right, Stanchion
> >> handles the IDs of buckets and some user stuff. So if no further buckets
> >> or users need to be created, there is no need for Stanchion at that time
> >> and up-/downloading can go on as usual?
> >
> > You're correct: user accounts and user buckets are the reason we need a
> > consensus system, and thus the reason Stanchion exists. For existing
> > users/buckets, as you indicate, files can be transferred without any
> > involvement from Stanchion.
> >
> > It is possible to cluster Stanchion using traditional cluster tools.
> >
> > (Why Amazon chose to use a global namespace for buckets is a mystery
> > beyond my ken.)
> >
> >>
> >> As I want to use Riak CS for large files (50 MB-15 GB), is it possible
> >> to raise the chunk size up to these file sizes, to prevent the system
> >> from heavy network traffic?
> >
> > Definitely not. Erlang's distribution protocol will behave badly with
> > objects that large.
> >
> > -John Daily
> > Technical Evangelist
> > Basho Technologies
> >