Number of nodes and links

Alexander Sicular siculars at gmail.com
Sat May 15 18:59:16 EDT 2010


Hi Chris,

Some things I keep in mind re. links:

-It's not so much the total number of links as their total size in KB that should be considered, specifically with regard to the webmachine HTTP interface. I think there are problems around 8KB.
-Links are unidirectional: a > b != b > a.
-Links are singular, one-to-one relationships.
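
To get a feel for the size point above, here is a rough sketch that counts how many links fit before a Link header value passes a given size. The entry format `</riak/BUCKET/KEY>; riaktag="TAG"` and the bucket, key and tag names are assumptions for illustration, and the 8KB figure is only the approximate threshold mentioned above, not a documented limit.

```python
# Rough estimate of how large a Link header gets as links accumulate.
# Assumptions (not from the original post): the header entry format
# </riak/BUCKET/KEY>; riaktag="TAG", and the example names used below.

def link_header_value(links):
    """Join (bucket, key, tag) triples into a single Link header value."""
    return ", ".join(
        '</riak/{}/{}>; riaktag="{}"'.format(bucket, key, tag)
        for bucket, key, tag in links
    )

def header_size_bytes(links):
    """Size in bytes of the encoded header value."""
    return len(link_header_value(links).encode("utf-8"))

def max_links_under(limit_bytes, make_link):
    """Count how many links fit before the value exceeds limit_bytes."""
    links = []
    while True:
        candidate = links + [make_link(len(links))]
        if header_size_bytes(candidate) > limit_bytes:
            return len(links)
        links = candidate

# Hypothetical naming scheme: short bucket name, 8-character keys, one tag.
def example_link(i):
    return ("d_bucket", "d{:07d}".format(i), "child")

print(max_links_under(8 * 1024, example_link))
```

Longer bucket names, keys or tags lower the count, so the number of links that fits depends heavily on the naming scheme.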

Regarding changing links: Riak does not have any multi-operation transactional semantics. Each operation is atomic in and of itself and cannot be made dependent on another operation. The catch to this rule is pre- and post-commit hooks, which trigger at the bucket level rather than the key level, so you could do certain things there. So if you have an architecture where links in one object need to be reflected in another object, you either need to account for that in your application or augment your application with some sort of message queue.
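
A minimal sketch of what "account for that in your application" could look like when a D object moves from one C object to another. Everything here is hypothetical: `store` is an in-memory stand-in for a bucket, `pending` stands in for a message queue, and `put_links` is not a Riak client call. It only illustrates two independent writes with a retry path.

```python
from collections import deque

# Hypothetical in-memory stand-in for a bucket of C objects; each value is
# the set of D keys that the C object links to. Not a Riak client API.
store = {"c1": {"d1", "d2"}, "c2": {"d3"}}
pending = deque()  # stand-in for a message queue of unfinished removals

def put_links(key, links, fail=False):
    """One independent write; nothing ties it to any other write."""
    if fail:
        raise IOError("write to %s failed" % key)
    store[key] = set(links)

def move_link(d_key, old_c, new_c, fail_second=False):
    """Re-parent d_key with two separate writes.

    The new link is written first; if removing the old link fails,
    the removal is queued so it can be replayed later.
    """
    put_links(new_c, store[new_c] | {d_key})
    try:
        put_links(old_c, store[old_c] - {d_key}, fail=fail_second)
    except IOError:
        pending.append((old_c, d_key))

def drain_pending():
    """Replay queued removals until the queue is empty."""
    while pending:
        c_key, d_key = pending.popleft()
        put_links(c_key, store[c_key] - {d_key})
```

Writing the new link before removing the old one means a failure between the two writes leaves the D object linked twice rather than not at all, which is usually the easier state to repair.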

Regarding changing objects (keys) frequently, you should familiarize yourself with the W and DW parameters, both of which are bounded by the N value set at the bucket level.
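
As a sketch of that relationship, the helper below checks requested W and DW values against a bucket's N value. The function names are hypothetical, and defaulting both values to a majority of N is an assumption of this sketch rather than a statement about Riak's defaults.

```python
def default_quorum(n_val):
    """Majority of n_val replicas."""
    return n_val // 2 + 1

def check_write_params(n_val, w=None, dw=None):
    """Fill in defaults and reject W/DW values that N cannot satisfy.

    Assumption: an unspecified W or DW falls back to a majority of N.
    """
    w = default_quorum(n_val) if w is None else w
    dw = default_quorum(n_val) if dw is None else dw
    for name, value in (("w", w), ("dw", dw)):
        if value < 0 or value > n_val:
            raise ValueError(
                "%s=%d must be between 0 and n_val=%d" % (name, value, n_val)
            )
    return w, dw
```

For example, with N set to 3, asking for W=4 can never succeed because only three replicas exist to acknowledge the write.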

I would probably run multiple nodes on one physical machine. As with most databases, the bottleneck is usually I/O, so keep that in mind contention-wise. But be warned that, as per some recent posts, Riak keeps multiple physical copies of your data on multiple vnodes, and some number of vnodes is associated with each physical node. So your data on disk will reflect that redundancy.
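
The redundancy arithmetic is simple enough to sketch. The ring size of 64, the N value of 3, the four nodes and the 10 GB of data are illustrative values chosen for this example, not figures from this thread.

```python
def vnodes_per_node(ring_size, node_count):
    """Average number of vnodes each node hosts."""
    return ring_size / node_count

def disk_footprint_gb(data_gb, n_val):
    """Approximate on-disk size once every object is stored n_val times."""
    return data_gb * n_val

# Illustrative values only: 64 partitions, 4 nodes, 10 GB of data, N = 3.
print(vnodes_per_node(64, 4))
print(disk_footprint_gb(10, 3))
```

If all of those nodes run on one physical machine, the whole replicated footprint lands on that machine's disks.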

To the Basho people: it wasn't clear to me, but are there ways to ensure that duplicate copies are not housed on the same physical host?

-Alexander

On May 15, 2010, at 6:05 PM, Chris Hicks wrote:

> For the project I am working on there are going to be a lot of individual objects, eventually into the millions, with the vast majority of them being linked as (poorly) depicted below.
> 
> Overly simplified data model:
> 
>         B <===============> B
>     ____|____           ____|____
>    /         \         /         \
>    C         C         C         C
> / / \ \   / / \ \   / / \ \   / / \ \
> | | | |   | | | |   | | | |   | | | |
> D D D D   D D D D   D D D D   D D D D
> 
> Where some of the B level objects (each level separated into different buckets) are linked to each other, but in very limited numbers. In some of these areas there will be a ton of rapid updates and in others updates will be much rarer. Since a decent portion of my data modifications involve nothing more than shifting which C level object a D level object is associated with, is there anything I should keep in mind when planning for a lot of link-changing operations? For example, though for some reason I can't find it now, I remember reading that the number of links one could have per object was something like 170K. Is that correct? I understand performance would degrade quite a bit when one has that amount of data for a single object, and my project won't call for anywhere near that, but I just want to understand the nuances of the whole process.
> 
> Also, sort of related, I plan on running this whole thing on a single dedicated server machine (unless I get major usage and get the money to upgrade) that will have multiple CPUs. Should I just operate one physical node on that machine, or should I match the number of CPUs with the number of nodes, essentially dedicating each physical CPU to handling a hardware node (if that is possible)? What would the pros and cons be to a single or multi-node system on that sort of hardware?
> 
> Chris Hicks.
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
