best practices on data storage?
jlangevin at loomlearning.com
Thu Jul 28 09:39:01 EDT 2011
I don't think that nesting objects within the current object is the best
choice every time, but it does have its uses.
When mapping the schema for a large project recently, what I considered was:
1. How often will the nested/related objects be written to?
2. How large will the data get?
If the data is something that will grow considerably, then I definitely
store it on its own (but grouped as much as possible, such as grouping
forum posts into a thread and storing the thread as a single object).
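To illustrate the grouping idea, here is a minimal sketch in plain Python (no Riak client) of a forum thread stored as one JSON value with its posts nested inside. The field names (`thread_id`, `title`, `posts`, `author`, `body`) are hypothetical. Every new post means rewriting the whole value, so it helps to keep an eye on how large the serialized object grows.

```python
import json

def new_thread(thread_id, title):
    """Create a thread document; posts are nested inside it (hypothetical schema)."""
    return {"thread_id": thread_id, "title": title, "posts": []}

def add_post(thread, author, body):
    """Append a post to the nested list; the whole thread is then stored again."""
    thread["posts"].append({"author": author, "body": body})
    return thread

def serialized_size(thread):
    """Size in bytes of the JSON value that would be stored under a single key."""
    return len(json.dumps(thread).encode("utf-8"))

thread = new_thread("t-1", "Best practices on data storage?")
add_post(thread, "alice", "It depends on write frequency and size.")
add_post(thread, "bob", "What about very large user objects?")
print(len(thread["posts"]), serialized_size(thread))
```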
If data will be written to quite often (such as log entries for a user),
then instead of storing it as nested objects within the user, I again store
it separately, as I'd rather avoid write conflicts on the user object due to a
log being added while the user profile was being updated.
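The write conflict described above is the classic lost update of a read-modify-write cycle. A minimal in-memory sketch (no Riak client; the `store` dict stands in for one key) shows two clients reading the same user object and the second write discarding the first. In Riak, with the `allow_mult` bucket property enabled, both concurrent versions would be kept as siblings for the application to resolve; `merge_siblings` is a hypothetical resolver showing the extra work that nesting frequently written data can require.

```python
import copy

store = {"user:alice": {"name": "Alice", "logs": []}}

def read(key):
    # Each client works on its own copy, as it would after a fetch.
    return copy.deepcopy(store[key])

def write(key, value):
    # Plain overwrite: the last write wins and nothing is merged.
    store[key] = value

# Two clients read the same version of the user object...
profile_update = read("user:alice")
log_append = read("user:alice")

# ...and change different parts of it.
profile_update["name"] = "Alice L."
log_append["logs"].append("login 09:39")

write("user:alice", log_append)
write("user:alice", profile_update)  # silently drops the log entry
print(store["user:alice"])  # {'name': 'Alice L.', 'logs': []}

def merge_siblings(siblings):
    """Hypothetical resolver: take the name from the last sibling, union the logs."""
    merged = {"name": siblings[-1]["name"], "logs": []}
    for sibling in siblings:
        for entry in sibling["logs"]:
            if entry not in merged["logs"]:
                merged["logs"].append(entry)
    return merged

print(merge_siblings([log_append, profile_update]))
# {'name': 'Alice L.', 'logs': ['login 09:39']}
```

Storing log entries as separate objects sidesteps this: appending a log never touches the user object.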
If the two criteria above aren't an issue, then I nest within the object as
much as possible.
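The two criteria above can be sketched as a small decision function. The thresholds (`MAX_NESTED_BYTES`, `MAX_WRITES_PER_HOUR`) are hypothetical placeholders, not Riak limits or values from this thread; tune them to your own data.

```python
# Hypothetical thresholds: placeholders for illustration, not Riak limits.
MAX_NESTED_BYTES = 64 * 1024   # expected size of the nested data when fully grown
MAX_WRITES_PER_HOUR = 10       # how often the nested data is expected to change

def storage_strategy(expected_bytes: int, writes_per_hour: float) -> str:
    """Apply both criteria: large or frequently written data is stored separately."""
    if expected_bytes > MAX_NESTED_BYTES:
        return "separate"  # grows considerably: store on its own, grouped where possible
    if writes_per_hour > MAX_WRITES_PER_HOUR:
        return "separate"  # written often: avoid write conflicts on the parent object
    return "nested"        # neither criterion applies: nest inside the parent

print(storage_strategy(2_000, 1))       # small profile settings
print(storage_strategy(2_000, 500))     # user log entries
print(storage_strategy(10_000_000, 1))  # all forum posts
```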
And of course, use links to relate the data when appropriate (but avoid
excessive linking).
Oh, and another possible help/solution is to create manual indexes, such as
writing to a user log index that tracks all of the log entry keys. To avoid
wasting link space, the entries wouldn't necessarily need to be linked; the
index would just need their primary keys stored, so you would already have a
listing of all logs for a user. Then you could link the user log index object
back to the user.
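A minimal sketch of the manual index idea, using a dict keyed by `(bucket, key)` in place of a Riak cluster. The bucket names (`users`, `user_logs`, `user_log_index`) and the key scheme are hypothetical. Each log entry is its own object, and a per-user index object stores only the entry keys (no links) plus one reference back to the user, so listing a user's logs is one index fetch plus fetches by key. Note that the index object itself is still read-modify-write, so concurrent appends to it can conflict; the sketch ignores that.

```python
import time

# A dict keyed by (bucket, key) stands in for the Riak cluster.
db = {}

def put(bucket, key, value):
    db[(bucket, key)] = value

def get(bucket, key, default=None):
    return db.get((bucket, key), default)

def add_log_entry(user_key, message, ts=None):
    """Store the entry on its own, then record its key in the user's index object."""
    ts = time.time() if ts is None else ts
    entry_key = f"{user_key}_{ts:.6f}"  # hypothetical key scheme: user + timestamp
    put("user_logs", entry_key, {"user": user_key, "ts": ts, "msg": message})

    # The index holds plain keys (no links) plus a reference back to the user.
    index = get("user_log_index", user_key, {"user": user_key, "entry_keys": []})
    index["entry_keys"].append(entry_key)
    put("user_log_index", user_key, index)
    return entry_key

def logs_for(user_key):
    """List a user's logs by reading the index, then fetching each entry by key."""
    index = get("user_log_index", user_key, {"entry_keys": []})
    return [get("user_logs", k) for k in index["entry_keys"]]

put("users", "alice", {"name": "Alice"})
add_log_entry("alice", "login", ts=1.0)
add_log_entry("alice", "changed avatar", ts=2.0)
print([e["msg"] for e in logs_for("alice")])  # ['login', 'changed avatar']
```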
Don't forget that you can use creative names for your buckets/keys to help
segment your data (at least for querying, since it's already been
established that prefixing doesn't necessarily help performance).
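One way to apply this is a composite key that encodes the segment you expect to filter on. The scheme below (`<user>:<yyyy-mm-dd>:<seq>`) is a hypothetical example and assumes user names never contain `:`. The helpers only build and filter key strings on the client side; consistent with the point above, this helps querying, not performance, since the filter still inspects every key.

```python
from datetime import date

def status_key(user: str, day: date, seq: int) -> str:
    """Hypothetical key scheme: user, ISO date and a zero-padded sequence number."""
    return f"{user}:{day.isoformat()}:{seq:04d}"

def parse_key(key: str) -> dict:
    user, day, seq = key.split(":")
    return {"user": user, "day": date.fromisoformat(day), "seq": int(seq)}

def keys_for(keys, user, day=None):
    """Select keys by segment; this still looks at every key in the list."""
    selected = []
    for k in keys:
        parts = parse_key(k)
        if parts["user"] == user and (day is None or parts["day"] == day):
            selected.append(k)
    return selected

keys = [
    status_key("rohman", date(2011, 7, 27), 1),
    status_key("rohman", date(2011, 7, 28), 1),
    status_key("fyodor", date(2011, 7, 28), 1),
]
print(keys_for(keys, "rohman", date(2011, 7, 28)))  # ['rohman:2011-07-28:0001']
```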
On avoiding excessive linking:
NOTE: There is no artificial limit to the number of links an object can
have. But, as adding links to an object does increase that object’s size,
the same guidelines that apply to your data should also apply to your links:
strike a balance between size and usability.
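To get a feel for the size cost, the sketch below estimates how many bytes a set of links adds by rendering each one the way Riak's HTTP interface shows links in a `Link` header (`</riak/bucket/key>; riaktag="tag"`). This is only an approximation based on the header text; the internal encoding of links may differ. The bucket, keys and tag are hypothetical.

```python
def link_header_value(links):
    """Render (bucket, key, tag) triples as a comma-separated Link header value."""
    return ", ".join(f'</riak/{b}/{k}>; riaktag="{t}"' for b, k, t in links)

def link_overhead_bytes(links):
    """Approximate bytes the links add, measured on the rendered header text."""
    return len(link_header_value(links).encode("utf-8"))

one = [("users", "fyodor", "friend")]
print(link_header_value(one))  # </riak/users/fyodor>; riaktag="friend"
print(link_overhead_bytes(one))

# A user object linking to 5,000 friends:
many = [("users", f"user{i:05d}", "friend") for i in range(5000)]
print(link_overhead_bytes(many))
```

At roughly 40 bytes per link, a few thousand links add a couple hundred kilobytes to every read and write of that object.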
Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com -
www.loomlearning.com - Skype: intel352
On Thu, Jul 28, 2011 at 2:47 AM, Antonio Rohman Fernandez <rohman at mahalostudio.com> wrote:
> I also thought about that... but then the "User" object could become really
> big... imagine I post 10 statuses every day and thousands of friends
> comment on them all the time... also... wouldn't it be troublesome to update
> the "User" object when friends comment on statuses? You would have to
> retrieve the data to insert the new nested comment, and if several friends
> comment at the same time I can see data getting lost along the way... or am I
> missing something?
> On Wed, 27 Jul 2011 23:40:30 -0700, Sylvain Niles wrote:
> Hi Rohman, the conversation yesterday got us thinking, and Basho
> confirmed that buckets are a form of key prefix. So no matter how small the
> bucket, a MapReduce job will traverse the whole key space. We sat down
> and thought about how to structure our data differently, as we have a
> similar use case to yours, and decided on nested docs using Ripple. In our
> case we had special buckets for each user like you describe below. Now that
> bucket is a nested JSON struct inside the user object instead of a separate
> bucket. In your use case you could have all statuses as a nested struct on
> your user object, and displaying them would be a matter of link-walking all
> of a user's friends and parsing the status content, sorting by time.
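A minimal sketch of that read path: each user object holds its statuses as a nested list, and a `friends` list stands in for the links a link-walk would follow. The `users` dict replaces actual Riak fetches and the field names are hypothetical. Merging with `heapq.merge(..., reverse=True)` assumes each user's nested statuses are already stored newest first.

```python
import heapq
from itertools import islice

# Stand-in for user objects fetched from Riak; statuses are nested, newest first.
users = {
    "rohman": {"friends": ["fyodor", "sylvain"],
               "statuses": [{"ts": 300, "text": "shipping"}]},
    "fyodor": {"friends": ["rohman"],
               "statuses": [{"ts": 250, "text": "reading"},
                            {"ts": 100, "text": "hello"}]},
    "sylvain": {"friends": ["rohman"],
                "statuses": [{"ts": 200, "text": "nested docs!"}]},
}

def feed(user, limit=10):
    """Walk to each friend, then merge their nested statuses by time, newest first."""
    friend_statuses = (
        [dict(s, author=f) for s in users[f]["statuses"]]
        for f in users[user]["friends"]
    )
    merged = heapq.merge(*friend_statuses, key=lambda s: s["ts"], reverse=True)
    return list(islice(merged, limit))

for s in feed("rohman"):
    print(s["ts"], s["author"], s["text"])
```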
> On Wed, Jul 27, 2011 at 11:26 PM, Antonio Rohman Fernandez <rohman at mahalostudio.com> wrote:
>> Yesterday, somebody suggested that distributing the data across smaller
>> buckets would not make Riak's MapReduce operations any faster... while
>> nobody at Basho has confirmed that yet, I'm now wondering what the best way
>> to store data is... let's imagine this simple exercise:
>> 1. We have entities users, friends, statuses and comments in a web app
>> 2. Users can make friends with other users
>> 3. Users can post statuses
>> 4. Friends (users) can comment on a user's statuses
>> At first I thought of having a bucket called "users" with all users, and
>> then for friend linkage I was thinking of having personal buckets like
>> "rohman_friends", "fyodor_friends", etc. with the keys to the users,
>> instead of one big "friends" bucket, for easy querying... but it seems I'm
>> wrong... so...
>> How would you distribute the data across buckets? And how would you run
>> MapReduce jobs? Would you use a supporting SQL database to store the
>> relationships between keys? Is that possible in a Riak-only environment?
>> Antonio Rohman <http://mahalostudio.com>
>> CEO, Founder & Lead Engineer
>> rohman at mahalostudio.com
>> Projects:
>> MaruBatsu.es <http://marubatsu.es>
>> PupCloud.com <http://pupcloud.com>
>> Wedding Album <http://wedding.mahalostudio.com>
>> riak-users mailing list
>> riak-users at lists.basho.com