Advice on making a Riak middleware easy to configure

Dmitri Zagidulin dzagidulin at basho.com
Tue Mar 17 09:14:29 EDT 2015


Hi Marc,

This sounds like a very cool project! I'd be very interested in hearing
more about this, and answering any data modeling or setup questions.

In order to answer the setup questions specifically, we'd need to know more
about what the project intends to do. Will users typically be
installing their own Riak clusters and then setting up apiman to help
manage APIs? Or is this more of a multi-tenant kind of situation, where
apiman would be spinning up nodes or clusters for users? To put it another
way, how does apiman handle ElasticSearch?

Couple of thoughts, from your questions.

> To be more concrete, should I, for example, expect the user to have
> already set up and joined together their Riak cluster a priori, with
> everything behind a load-balancer (just give me a single URI to connect
> to)? [Or attempt to join them into a cluster].

Ah, ok, if I understand your question correctly -- if you're not spinning
up VMs or setting up the nodes yourself via ssh (using something like our
Ansible playbook), then you can expect an already set up cluster. (FWIW,
the various configuration management tools such as Ansible that install
Riak clusters do provide idempotency). I can't really picture a situation
where users would set up nodes but not join them and leave that up to
apiman.

> As far as I can tell, there is no node discovery/sharing
> implementation

If you know the IP of one node, you can definitely discover the other nodes
via an HTTP call to /stats and reading the 'ring_members' field (see
http://docs.basho.com/riak/latest/ops/running/nodes/inspecting/ ). But,
unless apiman provides some sort of monitoring or keepalive-checking
capability, I don't think there's any reason to do that.
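
For what it's worth, here is a minimal sketch of that discovery call in
Python. It assumes the seed node's HTTP listener is on the default port
8098 and that node names have the usual 'name@host' form; the function
names are made up for illustration and are not part of any Riak client.

```python
import json
from urllib.request import Request, urlopen


def ring_members_from_stats(stats):
    """Return the node names listed under 'ring_members' in a parsed /stats payload."""
    members = stats.get("ring_members", [])
    return sorted(str(m) for m in members)


def host_of(node_name):
    """Riak node names look like 'riak@10.0.0.1'; keep the part after '@'."""
    return node_name.split("@", 1)[-1]


def discover_nodes(seed_host, http_port=8098, timeout=5.0):
    """Ask one known node for /stats and return every ring member it reports.

    Assumption: seed_host is reachable over plain HTTP on http_port.
    """
    url = "http://%s:%d/stats" % (seed_host, http_port)
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        stats = json.loads(response.read().decode("utf-8"))
    return ring_members_from_stats(stats)
```

Calling discover_nodes("10.0.0.1") against a live cluster would return the
node names, and host_of() turns each one into a host you could hand to a
health check.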

A load balancer is crucial (we recommend either a hardware-based one, or
something like HAProxy or Nginx). I know some users connect to a Riak
cluster using the round-robin load balancing built into a Riak client, but
that should be a last-resort measure (if, for example, you're not allowed
to spin up another machine for HAProxy). A dedicated load balancer (with a
least-connection load balancing algorithm) is significantly faster, and it
also provides logging and a rich ecosystem of tools and dashboards.

> Given the introduction of Riak
> Data Types on buckets, whom should I expect to set up the data types?

There isn't currently an API to create bucket types remotely. So unless
apiman has daemons that will be running on the individual Riak nodes and
can make commandline calls, you will have to leave bucket type creation to
the users.

That said, I could easily see you requiring a certain set of bucket types
of your users.

For example, Strongly Consistent buckets are useful for atomic operations
like user password management, security group management and so on. So, you
could require that users create a bucket type named 'sc' and enable
Strong Consistency on it. (Any buckets under that bucket type would then
also be strongly consistent, and usable by apiman or by the users' client
code).

Similarly, given that metering is a goal, you would also need bucket types
for the various server-side Data Types. That is, require users to create a
Maps bucket type named 'maps', a Counters type named 'counters', and a Sets
type named 'sets', for example.
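
To make that concrete, here is a rough Python sketch of how apiman could
describe its required bucket types, print the riak-admin commands a user
would run on a node, and compare a type's properties against what it
expects. The type names are just the examples above, and the helper names
are hypothetical. Note that Strong Consistency also has to be enabled in
the node configuration before an 'sc' type will behave as intended.

```python
import json

# Example convention only: these names are the ones suggested above,
# not anything Riak itself requires.
REQUIRED_BUCKET_TYPES = {
    "sc": {"consistent": True},
    "maps": {"datatype": "map"},
    "counters": {"datatype": "counter"},
    "sets": {"datatype": "set"},
}


def setup_commands(required=REQUIRED_BUCKET_TYPES):
    """Build the riak-admin argument lists a user would run on a Riak node."""
    commands = []
    for name, props in required.items():
        payload = json.dumps({"props": props}, sort_keys=True)
        commands.append(["riak-admin", "bucket-type", "create", name, payload])
        commands.append(["riak-admin", "bucket-type", "activate", name])
    return commands


def missing_props(actual_props, expected_props):
    """Return the expected properties that are absent or different in actual_props."""
    return {
        key: value
        for key, value in expected_props.items()
        if actual_props.get(key) != value
    }


if __name__ == "__main__":
    # Print the commands so they can be pasted into setup documentation.
    for argv in setup_commands():
        print(" ".join(argv[:4]), *["'%s'" % a for a in argv[4:]])
```

At startup apiman could fetch each type's properties (for example over
HTTP from /types/<type>/props), pass them to missing_props(), and refuse
to start with a clear message if anything comes back non-empty.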

Other things to keep on your radar, as far as bucket types:

* You can attach a Solr Search index to a bucket type. However, given that
you can only associate a single search index with a bucket type, this isn't
as generic/reusable as Data Types. I could see setting up a Search index
for something like API logging, though.

* You probably want provisions for Riak Authentication & Authorization (
http://docs.basho.com/riak/2.0.4/ops/running/authz/ ). (Specifically, for
supporting user-created users & passwords, since at the moment we don't
have a remote API to manage these).

> I'm very interested to know how to present a convenient set of options
> that will allow a typical development and deployment environment to be
> supported.

In terms of options, do you mean something like best-practice/recommended
Riak config files that you'd point your users to?

Let me know if you have further questions.

Dmitri




On Sat, Mar 7, 2015 at 10:35 AM, Marc Savy <msavy at redhat.com> wrote:

> Hi All,
>
> I'm involved in a FOSS API management project (apiman), and I've been
> thinking about providing a Riak implementation of its gateway components
> in the community (where we already have ElasticSearch and Infinispan).
> These components provide the distributed storage for tasks like
> rate-limiting counters, IP white-listing, black-listing, etc and are
> applied by a horizontally scalable, async gateway (to vastly
> oversimplify!).
>
> I'm in need of advice principally in regards to configuration and
> set-up. Namely, what assumptions can I safely make about a Riak user's
> set-up, and which settings I should expose in the component's
> configuration. Note that many gateways can exist, and hence any set-up
> ideally needs to already be done in advance, or be idempotent in case multiple
> nodes attempt to do it at once (or otherwise for it to be
> lockable/exclusionary).
>
> To be more concrete, should I, for example, expect the user to have
> already set up and joined together their Riak cluster a priori, with
> everything behind a load-balancer (just give me a single URI to connect
> to)? Or, should I expect a list of FQDNs/IPs and attempt to join them
> together into a cluster on the user's behalf - or will there be
> idempotence issues if I do that multiple times?
>
> As far as I can tell, there is no node discovery/sharing
> implementation[1], so I take it there's no way, for instance, to hit a
> single node (which has already been joined with other nodes), and
> thereby automatically gain knowledge of all cluster members?
>
> A couple of other configuration issues: Given the introduction of Riak
> Data Types on buckets, whom should I expect to set up the data types[2]?
> Should I create them automatically if they don't exist? Same for the
> bucket itself.
>
> I'm very interested to know how to present a convenient set of options that
> will allow a typical development and deployment environment to be
> supported.
>
> Regards,
> Marc
>
> [0] With the usual consistency limitations
> [1] https://github.com/basho/riak/issues/356
> [2] http://docs.basho.com/riak/latest/dev/using/data-types/#Setting-Up-Buckets-to-Use-Riak-Data-Types