Social network data / Graph properties of Riak

Jeroen van Dijk jeroentjevandijk at
Sat Nov 19 15:29:51 EST 2011

On Fri, Nov 18, 2011 at 9:28 PM, Aphyr <aphyr at> wrote:

> On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
>> And I also didn't include the riak user list for this reply:
>> On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <aphyr at
>> <mailto:aphyr at>> wrote:
>>    Depending on whether you think it will be more efficient to store
>>    the graph or its dual, consider each node a vertex and write the
>>    adjacency list as a part of its data. You can store whatever
>>    weights, etc. you need on the edges there.
>>    Don't use links; they're just a thin layer on top of mapreduce, so
>>    there's really not much advantage to using them. Links are subject
>>    to the HTTP header lengths too, so storing more than a thousand on
>>    any node is likely to break down.
>> Thank you for this suggestion. Also thanks for the warning on not using
>> links for what I want. So you are saying each vertex will have a list of
>> the other vertices that it is connected to? And save each edge as
>> key/value pair? Or are you saying each vertex should embed the adjacent
>> edges, meaning duplicated edges?
> Depends on whether you want to store the graph or its dual. If you're
> dealing with a sparse DAG where the input keys are likely to be vertex
> names, the natural choice is to store each vertex as an object in riak and
> each outbound edge from it as a property of that vertex.
> /users/user1:
>  owns: [item1, ...]
>  follows: [user2, ...]
> Of course, this method doesn't work well when you need to follow edges in
> reverse, so you may need to store the reciprocal relationship on target
> nodes:
> /items/item1:
>  owner: user1,
> /users/user2:
>  followed-by: [user1, ...]
> and so forth. That requires two writes and potentially lossy conflict
> resolution strategies, e.g. a follows b but b is not followed by a. We use
> processes which walk the graph continuously and enforce relationship
> integrity as they go. We have a social graph of several hundred million
> objects stored this way in Riak, and it works reasonably well.
> Naturally, you'll want to choose an encoding which is fast and
> space-efficient for a large dataset. Depending on your needs, JSON,
> protocol buffers, or fixed-length record entries might be work well.
> Also consider the depth of traversals you'll need to perform. It may be
> best to use a graph database like neo4j for deep traversals. You could
> store your primary dataset in Riak and replicate only the important graph
> information to a graph DB via post-commit hooks. That would solve the
> reciprocal consistency problem (at the cost of replication lag), and could
> reduce the amount of data you need to put into the graph DB.
> Given that Linkedin has this problem, you might look into their tools as
> well.
Thanks for your detailed reply. I'm very happy to hear that you are using
Riak for a similar use case. My dataset will be of a similar magnitude. It
will be a graph where edges have properties (e.g. user 1 rates item x with
2 stars), but I think saving these edges as separate objects will work.
I'll use post-commit hooks to get all this data in a graph database
(probably Neo4j) and use this for graph traversing.

A side question Kyle, how do you make sure the graph database is always
(eventually) consistent with Riak? E.g. how do you correct potential errors
during the post-commit hooks for instance? And say this database gets
corrupt how do you recover this graph database and get it in sync again
with Riak?


> --Kyle
>  I'm guessing you mean the former, because that makes sense to me. So you
>> would save a graph assuming users and items like the following key/value
>> pairs:
>> //Vertices
>> user1: properties
>> user2: properties
>> item1: properties
>> //Edges
>> user1-owns-item1: properties
>> user1-follows-user2: properties
>> user2-follows-user1: properties
>> To be able to find the available edges, each vertices would need to
>> reference the keys of the edges. Is this what you mean?
>> If so, one more question about a possible problem. Say I have an item
>> with many many outgoing edges, so it needs to embed these references.
>> This would make it really costly to fetch this item from Riak I assume,
>> even if you are only interested in normal properties. Wouldn't that mean
>> you will have to save the properties seperately from the edges
>> references to it feasible?
>> Did I grasp what you were proposing Kyle?
>> Thanks,
>> Jeroen
>>    --Kyle
>>    On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:
>>        Hi all,
>>        I'm currently evaluating whether Riak would fit as the main
>>        storage of
>>        my current project, a social network. The reason I am attracted
>>        to Riak
>>        and less to a Graph database as main storage is that I want the
>> easy
>>        horizontal scalability and multi-site replication that Riak
>>        provides.
>>        The only thing I doubt is whether the key-value/link model of
>>        Riak is
>>        flexible enough to be able to store a property graph
>>        (**2361<>
>>        <**2361 <>>).
>> I am not asking whether the
>>        querying/graph traversing will be easy; I'm probably going to use a
>>        graph database or a Pregel like platform (e.g.
>> for that problem. I guess my main
>>        question
>>        is whether it would be easy/feasible to import and export a
>> property
>>        graph in and from Riak? Has someone done this before?
>>        I realize the above might be too specific, so here are two more
>>        questions that I think are relevant:
>>        - Is there a known upper limit of links that can be stored (I
>>        don't want
>>        to add them all at once so 1000 per request is fine,
>>        <**pipermail/riak-users_lists.**
>> >)
>>        - Is there a way to add meta data to links (edges)? E.g. weigths
>> and
>>        other attributes.
>>        Any other ideas or advise are also highly appreciated.
>>        Cheers,
>>        Jeroen
>>        ______________________________**___________________
>>        riak-users mailing list
>>        riak-users at <mailto:riak-users at lists.**<riak-users at>
>> >
>>        <**mailman/listinfo/riak-users_**
>> >
>> ______________________________**_________________
>> riak-users mailing list
>> riak-users at
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the riak-users mailing list