Social network data / Graph properties of Riak

Aphyr aphyr at aphyr.com
Fri Nov 18 15:28:37 EST 2011


On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
> And I also didn't include the riak user list for this reply:
>
>
> On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <aphyr at aphyr.com> wrote:
>
>     Depending on whether you think it will be more efficient to store
>     the graph or its dual, consider each node a vertex and write the
>     adjacency list as a part of its data. You can store whatever
>     weights, etc. you need on the edges there.
>
>     Don't use links; they're just a thin layer on top of mapreduce, so
>     there's really not much advantage to using them. Links are subject
>     to the HTTP header lengths too, so storing more than a thousand on
>     any node is likely to break down.
>
>
> Thank you for this suggestion. Also thanks for the warning on not using
> links for what I want. So you are saying each vertex will have a list of
> the other vertices that it is connected to? And save each edge as a
> key/value pair? Or are you saying each vertex should embed the adjacent
> edges, meaning duplicated edges?

Depends on whether you want to store the graph or its dual. If you're 
dealing with a sparse DAG where the input keys are likely to be vertex 
names, the natural choice is to store each vertex as an object in Riak 
and each outbound edge from it as a property of that vertex.

/users/user1:
   owns: [item1, ...]
   follows: [user2, ...]
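
For example, with the Basho Python client the write and read path for one 
of those vertices might look roughly like the sketch below. Exact method 
names vary between client versions, and the bucket/key names are just the 
ones from the example above.

import riak

client = riak.RiakClient()              # defaults to localhost
users  = client.bucket("users")

# One vertex: ordinary properties plus its outbound adjacency lists.
user1 = users.new("user1", data={
    "name":    "Example User",
    "owns":    ["item1"],
    "follows": ["user2"],
})
user1.store()

# Following an edge is just another key lookup.
for key in users.get("user1").data["follows"]:
    print(users.get(key).data)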

Of course, this method doesn't work well when you need to follow edges 
in reverse, so you may need to store the reciprocal relationship on 
target nodes:

/items/item1:
   owner: user1,

/users/user2:
   followed-by: [user1, ...]

and so forth. That requires two writes and potentially lossy conflict 
resolution strategies, e.g. a follows b but b is not followed by a. We 
use processes which walk the graph continuously and enforce relationship 
integrity as they go. We have a social graph of several hundred million 
objects stored this way in Riak, and it works reasonably well.
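
A sketch of what the two-write pattern and a very naive integrity pass 
could look like, again with the Python client; siblings, retries and error 
handling are left out, and the field names are just the ones above:

import riak

client = riak.RiakClient()
users  = client.bucket("users")

def follow(a, b):
    # Two non-atomic writes: the forward edge on a, the reverse edge on b.
    src = users.get(a)
    src.data.setdefault("follows", [])
    if b not in src.data["follows"]:
        src.data["follows"].append(b)
        src.store()

    dst = users.get(b)
    dst.data.setdefault("followed-by", [])
    if a not in dst.data["followed-by"]:
        dst.data["followed-by"].append(a)
        dst.store()

def repair(a):
    # Re-assert the reverse edge for every outbound "follows" edge of a.
    for b in users.get(a).data.get("follows", []):
        dst = users.get(b)
        if a not in dst.data.get("followed-by", []):
            dst.data.setdefault("followed-by", []).append(a)
            dst.store()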

Naturally, you'll want to choose an encoding which is fast and 
space-efficient for a large dataset. Depending on your needs, JSON, 
protocol buffers, or fixed-length record entries might work well.
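
As a rough illustration of the trade-off, here's a quick size comparison 
for an adjacency list of 1000 numeric ids, JSON versus fixed-width packed 
integers; the numbers will obviously depend on your own id scheme:

import json, struct

neighbours = list(range(1000))

as_json   = json.dumps(neighbours).encode()                      # text
as_packed = struct.pack("<%dI" % len(neighbours), *neighbours)   # 4 bytes/id

# The packed form is smaller and fixed-width per id.
print(len(as_json), len(as_packed))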

Also consider the depth of traversals you'll need to perform. It may be 
best to use a graph database like neo4j for deep traversals. You could 
store your primary dataset in Riak and replicate only the important 
graph information to a graph DB via post-commit hooks. That would solve 
the reciprocal consistency problem (at the cost of replication lag), and 
could reduce the amount of data you need to put into the graph DB.
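
The post-commit hook itself has to be a small Erlang function (or you can 
just scan periodically); the mirroring side could be as simple as the 
sketch below, where graph_db stands in for whichever graph database client 
you end up using, and ensure_vertex/ensure_edge are hypothetical names:

import riak

client = riak.RiakClient()
users  = client.bucket("users")

def mirror_user(graph_db, key):
    # Re-assert this user's vertex and "follows" edges in the graph DB.
    data = users.get(key).data
    graph_db.ensure_vertex(key)                       # hypothetical API
    for other in data.get("follows", []):
        graph_db.ensure_edge(key, "follows", other)   # hypothetical API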

Given that LinkedIn has this problem, you might look into their tools as 
well.

--Kyle

> I'm guessing you mean the former, because that makes sense to me. So you
> would save a graph assuming users and items like the following key/value
> pairs:
>
> //Vertices
> user1: properties
> user2: properties
> item1: properties
>
> //Edges
> user1-owns-item1: properties
> user1-follows-user2: properties
> user2-follows-user1: properties
>
> To be able to find the available edges, each vertex would need to
> reference the keys of its edges. Is this what you mean?
>
> If so, one more question about a possible problem. Say I have an item
> with many many outgoing edges, so it needs to embed these references.
> This would make it really costly to fetch this item from Riak I assume,
> even if you are only interested in normal properties. Wouldn't that mean
> you will have to save the properties separately from the edge
> references to make it feasible?
>
> Did I grasp what you were proposing Kyle?
>
> Thanks,
> Jeroen
>
>     --Kyle
>
>
>     On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:
>
>         Hi all,
>
>         I'm currently evaluating whether Riak would fit as the main
>         storage of
>         my current project, a social network. The reason I am attracted
>         to Riak
>         and less to a Graph database as main storage is that I want the easy
>         horizontal scalability and multi-site replication that Riak
>         provides.
>         The only thing I doubt is whether the key-value/link model of
>         Riak is
>         flexible enough to be able to store a property graph
>         (http://arxiv.org/abs/1006.2361). I am not asking whether the
>         querying/graph traversing will be easy; I'm probably going to use a
>         graph database or a Pregel like platform (e.g.
>         http://www.goldenorbos.org/) for that problem. I guess my main
>         question
>         is whether it would be easy/feasible to import and export a property
>         graph in and from Riak? Has someone done this before?
>
>         I realize the above might be too specific, so here are two more
>         questions that I think are relevant:
>
>         - Is there a known upper limit of links that can be stored (I
>         don't want
>         to add them all at once so 1000 per request is fine,
>         http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html)
>         - Is there a way to add metadata to links (edges)? E.g. weights and
>         other attributes.
>
>         Any other ideas or advice are also highly appreciated.
>
>         Cheers,
>
>         Jeroen
>
>
>
>         _______________________________________________
>         riak-users mailing list
>         riak-users at lists.basho.com
>         http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



