Social network data / Graph properties of Riak
Jeroen van Dijk
jeroentjevandijk at gmail.com
Sat Nov 19 15:29:51 EST 2011
On Fri, Nov 18, 2011 at 9:28 PM, Aphyr <aphyr at aphyr.com> wrote:
> On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
>> And I also didn't include the riak user list for this reply:
>> On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <aphyr at aphyr.com
>> <mailto:aphyr at aphyr.com>> wrote:
>> Depending on whether you think it will be more efficient to store
>> the graph or its dual, consider each node a vertex and write the
>> adjacency list as a part of its data. You can store whatever
>> weights, etc. you need on the edges there.
>> Don't use links; they're just a thin layer on top of mapreduce, so
>> there's really not much advantage to using them. Links are subject
>> to the HTTP header lengths too, so storing more than a thousand on
>> any node is likely to break down.
>> Thank you for this suggestion. Also thanks for the warning on not using
>> links for what I want. So you are saying each vertex will have a list of
>> the other vertices that it is connected to? And save each edge as
>> key/value pair? Or are you saying each vertex should embed the adjacent
>> edges, meaning duplicated edges?
> Depends on whether you want to store the graph or its dual. If you're
> dealing with a sparse DAG where the input keys are likely to be vertex
> names, the natural choice is to store each vertex as an object in riak and
> each outbound edge from it as a property of that vertex.
> owns: [item1, ...]
> follows: [user2, ...]
> Of course, this method doesn't work well when you need to follow edges in
> reverse, so you may need to store the reciprocal relationship on target
> owner: user1,
> followed-by: [user1, ...]
> and so forth. That requires two writes and potentially lossy conflict
> resolution strategies, e.g. a follows b but b is not followed by a. We use
> processes which walk the graph continuously and enforce relationship
> integrity as they go. We have a social graph of several hundred million
> objects stored this way in Riak, and it works reasonably well.
> Naturally, you'll want to choose an encoding which is fast and
> space-efficient for a large dataset. Depending on your needs, JSON,
> protocol buffers, or fixed-length record entries might be work well.
> Also consider the depth of traversals you'll need to perform. It may be
> best to use a graph database like neo4j for deep traversals. You could
> store your primary dataset in Riak and replicate only the important graph
> information to a graph DB via post-commit hooks. That would solve the
> reciprocal consistency problem (at the cost of replication lag), and could
> reduce the amount of data you need to put into the graph DB.
> Given that Linkedin has this problem, you might look into their tools as
Thanks for your detailed reply. I'm very happy to hear that you are using
Riak for a similar use case. My dataset will be of a similar magnitude. It
will be a graph where edges have properties (e.g. user 1 rates item x with
2 stars), but I think saving these edges as separate objects will work.
I'll use post-commit hooks to get all this data in a graph database
(probably Neo4j) and use this for graph traversing.
A side question Kyle, how do you make sure the graph database is always
(eventually) consistent with Riak? E.g. how do you correct potential errors
during the post-commit hooks for instance? And say this database gets
corrupt how do you recover this graph database and get it in sync again
> I'm guessing you mean the former, because that makes sense to me. So you
>> would save a graph assuming users and items like the following key/value
>> user1: properties
>> user2: properties
>> item1: properties
>> user1-owns-item1: properties
>> user1-follows-user2: properties
>> user2-follows-user1: properties
>> To be able to find the available edges, each vertices would need to
>> reference the keys of the edges. Is this what you mean?
>> If so, one more question about a possible problem. Say I have an item
>> with many many outgoing edges, so it needs to embed these references.
>> This would make it really costly to fetch this item from Riak I assume,
>> even if you are only interested in normal properties. Wouldn't that mean
>> you will have to save the properties seperately from the edges
>> references to it feasible?
>> Did I grasp what you were proposing Kyle?
>> On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:
>> Hi all,
>> I'm currently evaluating whether Riak would fit as the main
>> storage of
>> my current project, a social network. The reason I am attracted
>> to Riak
>> and less to a Graph database as main storage is that I want the
>> horizontal scalability and multi-site replication that Riak
>> The only thing I doubt is whether the key-value/link model of
>> Riak is
>> flexible enough to be able to store a property graph
>> <http://arxiv.org/abs/1006.**2361 <http://arxiv.org/abs/1006.2361>>).
>> I am not asking whether the
>> querying/graph traversing will be easy; I'm probably going to use a
>> graph database or a Pregel like platform (e.g.
>> http://www.goldenorbos.org/) for that problem. I guess my main
>> is whether it would be easy/feasible to import and export a
>> graph in and from Riak? Has someone done this before?
>> I realize the above might be too specific, so here are two more
>> questions that I think are relevant:
>> - Is there a known upper limit of links that can be stored (I
>> don't want
>> to add them all at once so 1000 per request is fine,
>> - Is there a way to add meta data to links (edges)? E.g. weigths
>> other attributes.
>> Any other ideas or advise are also highly appreciated.
>> riak-users mailing list
>> riak-users at lists.basho.com <mailto:riak-users at lists.**basho.com<riak-users at lists.basho.com>
>> riak-users mailing list
>> riak-users at lists.basho.com
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the riak-users