Using Riak for Data with many Entities and Relationships
cmeiklejohn at basho.com
Wed Mar 4 09:48:46 EST 2015
> On Feb 26, 2015, at 5:41 PM, Matt Brooks <mtbrooks33 at gmail.com> wrote:
> I am designing a web application that, for the purpose of this conversation, deals with three main entities:
> • Users
> • Groups
> • Tasks
> Users are members of groups, and tasks belong to groups.
> Early in the development of the application, Neo4j was used to store the data. Users would have a MEMBER_OF relationship to a group, and tasks would have a BELONGS_TO relationship to a group. Neo4j was nice for access control because I could add permissions to the MEMBER_OF relationship. It was also nice for the simple BELONGS_TO relationship. Neo4j separates entities and relationships nicely.
> After reading about Riak and reminiscing about my use of MongoDB in the past, I began to think about using Riak to store my data instead of Neo4j. Storing the users, groups, and tasks seems trivial enough. But storing the relationships seems a bit tougher.
> I am planning on storing the entities in three buckets:
> • user
> • group
> • task
> ...where each of the buckets has the entity's ID as the key and a map of the relevant information as the value.
> What I am struggling with now is modeling the relationships I so easily modeled in Neo4j, in Riak. I have a few ideas:
> • Store both user IDs and task IDs in lists inside of the group information. The user ID list would also include permissions for the users.
> • Store group IDs in a list inside of the user information and task IDs in a list inside of the group information.
> • Use a user-group bucket and a group-task bucket. The user-group bucket will have user IDs as the keys and a list of maps as the value. The maps in question would hold a group ID and permission information for the group. The group-task bucket would be similar to the user-group bucket, but instead of a list of maps, it would simply have a list of task IDs.
> • Use Riak's links for both user membership and tasks belonging to groups. A given user would have member links to groups, and a given group would have task links to tasks. Permissions for a given user ID would be stored in the group somewhere.
> None of the four entirely satisfies me.
> Number one makes it really hard and inefficient to ask the DB for the groups that a user is a member of (I would have to go through every single group and check if the user ID is in the member list). The same issue occurs with tasks.
> Number two makes it really easy to go from user to group to tasks, but makes it difficult to go from group back to users. What if I wanted to ask "what users are members of group X?".
> Number three works in a way similar to relational databases, and does a good job of separating relationships from entities. This has the same issues mentioned in number two.
> Number four seems to be the one that might be considered idiomatic Riak usage, but we completely separate permissions from the member relationship a user has with a group due to links not supporting complex properties.
My apologies for the delay in my response.
First-class support for link walking is deprecated. In most clients today, link walking is implemented as a MapReduce job that uses the object’s metadata to traverse the graph. Because this relies on MapReduce, we don’t recommend using it in production.
I’d probably do something similar to number two; however, I’d try to use the Riak Map data type, which models a convergent server-side dictionary, to model each user. [1, 2] In this object you could store a set of the user’s group identifiers, and use Riak Search to answer the group-to-users query, since the built-in search facility can query into Map objects.
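A rough sketch of the data shape this suggests, assuming each user's Map holds a set of group IDs plus a nested map of per-group permissions (keeping permissions next to the membership, which was the concern with links). Everything here is a hypothetical in-memory stand-in, not the Riak client API: `UserMap`, `groups_of`, and `users_in` are illustrative names, and `users_in` emulates with a scan what a Riak Search query would do server-side. The field name `groups_set` is also an assumption; it echoes the `_set` suffix Riak Search uses when indexing Map set fields, but check the schema documentation before relying on that.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class UserMap:
    """Hypothetical in-memory stand-in for one user's Riak Map.

    groups_set plays the role of a Map set field holding group IDs;
    permissions plays the role of a nested map keyed by group ID.
    This is NOT the Riak client API, just the shape of the data.
    """
    user_id: str
    groups_set: Set[str] = field(default_factory=set)
    permissions: Dict[str, Set[str]] = field(default_factory=dict)

    def join(self, group_id: str, perms: Iterable[str] = ()) -> None:
        # Membership and its permissions are written to the same object,
        # so permissions stay attached to the membership they describe.
        self.groups_set.add(group_id)
        self.permissions.setdefault(group_id, set()).update(perms)


def groups_of(user: UserMap) -> List[str]:
    # user -> groups: one read of the user's map answers this directly.
    return sorted(user.groups_set)


def users_in(users: Iterable[UserMap], group_id: str) -> List[str]:
    # group -> users: in Riak this would be a Riak Search query over the
    # index covering the user maps; here a linear scan emulates it.
    return sorted(u.user_id for u in users if group_id in u.groups_set)
```

For example, if `alice` joins `"g1"` with `{"admin"}` and also joins `"g2"`, and `bob` joins `"g1"`, then `users_in([alice, bob], "g1")` returns `["alice", "bob"]` and `groups_of(alice)` returns `["g1", "g2"]`. A set is used rather than a list because Riak Maps offer set fields (along with registers, counters, flags, and nested maps), not lists.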
There is one gotcha with most of these design patterns: because Riak doesn’t guarantee causal consistency, you may read a user object that references a group you can’t read yet. This can happen for a variety of reasons, for example: the read contacted a replica that hasn’t received the update yet, or it contacted a fallback replica because a primary failed or couldn’t be reached. It’s important to make sure your application can resolve these situations at read time.
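One way to handle this at read time, sketched under the assumption of a hypothetical `fetch_group` callable that returns the group's data or `None` when the replica that answered doesn't have it. The function name `resolve_groups` and the `retries` parameter are illustrative, not part of any Riak client. Missing IDs are reported back instead of raising, so the caller can hide them or try again later.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def resolve_groups(
    group_ids: Iterable[str],
    fetch_group: Callable[[str], Optional[Dict[str, Any]]],
    retries: int = 1,
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Split a user's group references into resolved and still-missing.

    fetch_group is a hypothetical callable returning the group's data,
    or None when the read finds nothing (e.g. a replica or fallback
    that has not yet seen the group's write).
    """
    resolved: Dict[str, Dict[str, Any]] = {}
    missing: List[str] = []
    for gid in group_ids:
        group = None
        # Try the read up to retries + 1 times before giving up on it.
        for _ in range(retries + 1):
            group = fetch_group(gid)
            if group is not None:
                break
        if group is None:
            # Treat as "not visible yet" rather than as a hard error.
            missing.append(gid)
        else:
            resolved[gid] = group
    return resolved, missing
```

For example, `resolve_groups(["g1", "g2"], {"g1": {"name": "Ops"}}.get)` returns `({"g1": {"name": "Ops"}}, ["g2"])`. Keeping the retry count small and surfacing the missing IDs lets the UI degrade gracefully (say, by omitting the group for now) instead of failing the whole page on a reference that will usually become readable shortly.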
Senior Software Engineer
Basho Technologies, Inc.
cmeiklejohn at basho.com