Social network data / Graph properties of Riak

Jeroen van Dijk jeroentjevandijk at gmail.com
Sat Nov 19 15:29:51 EST 2011


On Fri, Nov 18, 2011 at 9:28 PM, Aphyr <aphyr at aphyr.com> wrote:

> On 11/18/2011 11:50 AM, Jeroen van Dijk wrote:
>
>> And I also didn't include the riak user list for this reply:
>>
>>
>> On Fri, Nov 18, 2011 at 7:04 PM, Aphyr <aphyr at aphyr.com
>> <mailto:aphyr at aphyr.com>> wrote:
>>
>>    Depending on whether you think it will be more efficient to store
>>    the graph or its dual, consider each node a vertex and write the
>>    adjacency list as a part of its data. You can store whatever
>>    weights, etc. you need on the edges there.
>>
>>    Don't use links; they're just a thin layer on top of mapreduce, so
>>    there's really not much advantage to using them. Links are subject
>>    to the HTTP header lengths too, so storing more than a thousand on
>>    any node is likely to break down.
>>
>>
>> Thank you for this suggestion. Also thanks for the warning on not using
>> links for what I want. So you are saying each vertex will have a list of
>> the other vertices that it is connected to? And save each edge as
>> key/value pair? Or are you saying each vertex should embed the adjacent
>> edges, meaning duplicated edges?
>>
>
> Depends on whether you want to store the graph or its dual. If you're
> dealing with a sparse DAG where the input keys are likely to be vertex
> names, the natural choice is to store each vertex as an object in riak and
> each outbound edge from it as a property of that vertex.
>
> /users/user1:
>  owns: [item1, ...]
>  follows: [user2, ...]
>
> Of course, this method doesn't work well when you need to follow edges in
> reverse, so you may need to store the reciprocal relationship on target
> nodes:
>
> /items/item1:
>  owner: user1,
>
> /users/user2:
>  followed-by: [user1, ...]
>
> and so forth. That requires two writes and potentially lossy conflict
> resolution strategies, e.g. a follows b but b is not followed by a. We use
> processes which walk the graph continuously and enforce relationship
> integrity as they go. We have a social graph of several hundred million
> objects stored this way in Riak, and it works reasonably well.
>
> Naturally, you'll want to choose an encoding which is fast and
> space-efficient for a large dataset. Depending on your needs, JSON,
> protocol buffers, or fixed-length record entries might be work well.
>
> Also consider the depth of traversals you'll need to perform. It may be
> best to use a graph database like neo4j for deep traversals. You could
> store your primary dataset in Riak and replicate only the important graph
> information to a graph DB via post-commit hooks. That would solve the
> reciprocal consistency problem (at the cost of replication lag), and could
> reduce the amount of data you need to put into the graph DB.
>
> Given that Linkedin has this problem, you might look into their tools as
> well.
>
>
Thanks for your detailed reply. I'm very happy to hear that you are using
Riak for a similar use case. My dataset will be of a similar magnitude. It
will be a graph where edges have properties (e.g. user 1 rates item x with
2 stars), but I think saving these edges as separate objects will work.
I'll use post-commit hooks to get all this data in a graph database
(probably Neo4j) and use this for graph traversing.

A side question Kyle, how do you make sure the graph database is always
(eventually) consistent with Riak? E.g. how do you correct potential errors
during the post-commit hooks for instance? And say this database gets
corrupt how do you recover this graph database and get it in sync again
with Riak?

Cheers,
Jeroen









> --Kyle
>
>  I'm guessing you mean the former, because that makes sense to me. So you
>> would save a graph assuming users and items like the following key/value
>> pairs:
>>
>> //Vertices
>> user1: properties
>> user2: properties
>> item1: properties
>>
>> //Edges
>> user1-owns-item1: properties
>> user1-follows-user2: properties
>> user2-follows-user1: properties
>>
>> To be able to find the available edges, each vertices would need to
>> reference the keys of the edges. Is this what you mean?
>>
>> If so, one more question about a possible problem. Say I have an item
>> with many many outgoing edges, so it needs to embed these references.
>> This would make it really costly to fetch this item from Riak I assume,
>> even if you are only interested in normal properties. Wouldn't that mean
>> you will have to save the properties seperately from the edges
>> references to it feasible?
>>
>> Did I grasp what you were proposing Kyle?
>>
>> Thanks,
>> Jeroen
>>
>>    --Kyle
>>
>>
>>    On 11/18/2011 07:38 AM, Jeroen van Dijk wrote:
>>
>>        Hi all,
>>
>>        I'm currently evaluating whether Riak would fit as the main
>>        storage of
>>        my current project, a social network. The reason I am attracted
>>        to Riak
>>        and less to a Graph database as main storage is that I want the
>> easy
>>        horizontal scalability and multi-site replication that Riak
>>        provides.
>>        The only thing I doubt is whether the key-value/link model of
>>        Riak is
>>        flexible enough to be able to store a property graph
>>        (http://arxiv.org/abs/1006.__**2361<http://arxiv.org/abs/1006.__2361>
>>        <http://arxiv.org/abs/1006.**2361 <http://arxiv.org/abs/1006.2361>>).
>> I am not asking whether the
>>
>>        querying/graph traversing will be easy; I'm probably going to use a
>>        graph database or a Pregel like platform (e.g.
>>        http://www.goldenorbos.org/) for that problem. I guess my main
>>        question
>>        is whether it would be easy/feasible to import and export a
>> property
>>        graph in and from Riak? Has someone done this before?
>>
>>        I realize the above might be too specific, so here are two more
>>        questions that I think are relevant:
>>
>>        - Is there a known upper limit of links that can be stored (I
>>        don't want
>>        to add them all at once so 1000 per request is fine,
>>        http://lists.basho.com/__**pipermail/riak-users_lists.__**
>> basho.com/2010-March/000786.__**html<http://lists.basho.com/__pipermail/riak-users_lists.__basho.com/2010-March/000786.__html>
>>        <http://lists.basho.com/**pipermail/riak-users_lists.**
>> basho.com/2010-March/000786.**html<http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-March/000786.html>
>> >)
>>
>>        - Is there a way to add meta data to links (edges)? E.g. weigths
>> and
>>        other attributes.
>>
>>        Any other ideas or advise are also highly appreciated.
>>
>>        Cheers,
>>
>>        Jeroen
>>
>>
>>
>>        ______________________________**___________________
>>        riak-users mailing list
>>        riak-users at lists.basho.com <mailto:riak-users at lists.**basho.com<riak-users at lists.basho.com>
>> >
>>        http://lists.basho.com/__**mailman/listinfo/riak-users___**
>> lists.basho.com<http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com>
>>        <http://lists.basho.com/**mailman/listinfo/riak-users_**
>> lists.basho.com<http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
>> >
>>
>>
>>
>>
>>
>>
>> ______________________________**_________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/**mailman/listinfo/riak-users_**lists.basho.com<http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20111119/1b438b63/attachment.html>


More information about the riak-users mailing list