object sizes

Alex De la rosa alex.rosa.box at gmail.com
Mon Apr 20 17:47:41 EDT 2015


Hi Brett,

Yeah, that was my assumption too: there is RAM overhead for creating the
object structures, etc. That's also why the simple objects (raw binary)
give a pretty accurate measure compared to cURL, but maps/sets/etc.
don't.
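
To illustrate that gap, here is a minimal sketch in plain Python, with no
Riak client involved. It assumes compact JSON is a reasonable stand-in for
what goes across the wire, which is only an approximation of what Riak
stores for a map; serialized_size is a hypothetical helper, not part of the
client.

```python
import json
import sys


def serialized_size(value):
    """Bytes in the compact UTF-8 JSON encoding of value.

    Assumption: JSON approximates the wire format; Riak's stored size
    for a map also includes CRDT metadata that is not modelled here.
    """
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


small = {"name": "a", "tags": ["x"]}
large = {"name": "a" * 10000, "tags": ["x"] * 1000}

# sys.getsizeof is shallow: it measures the dict container itself,
# not the strings and lists the dict refers to.
print(sys.getsizeof(small), sys.getsizeof(large))

# The serialized size does track how much data the object holds.
print(serialized_size(small), serialized_size(large))
```

So sys.getsizeof() can be both larger and smaller than the serialized
size, depending on the shape of the object, which would explain why it
only lines up for raw binary data.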

Exactly, I would like a way to know how big the object stored inside Riak
is (using the Python client instead of doing extra cURL calls), so I can
make sure that no object bigger than 1MB in storage space gets saved (and
then implement some kind of key-split mechanism when approaching the
limit).
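
The key-split idea could look roughly like the sketch below. It is only an
illustration under assumptions: it measures the compact JSON encoding rather
than the real stored size (which is exactly the number I am missing), and
MAX_BYTES, encoded_size and split_items are hypothetical names, not anything
from the Riak client.

```python
import json

MAX_BYTES = 1024 * 1024  # the 1MB target discussed in this thread


def encoded_size(items):
    """Bytes in the compact UTF-8 JSON encoding of a list of items."""
    return len(json.dumps(items, separators=(",", ":")).encode("utf-8"))


def split_items(items, max_bytes=MAX_BYTES):
    """Greedily pack items into chunks whose encoding fits in max_bytes.

    Each chunk would then be stored under its own key. An item that is
    larger than max_bytes on its own still ends up alone in a chunk.
    """
    chunks, current = [], []
    for item in items:
        # Close the current chunk if adding this item would exceed the limit.
        if current and encoded_size(current + [item]) > max_bytes:
            chunks.append(current)
            current = []
        current.append(item)
    if current:
        chunks.append(current)
    return chunks


# Tiny limit so the splitting is visible.
chunks = split_items(["x" * 10] * 10, max_bytes=40)
print([len(chunk) for chunk in chunks])
```

Re-encoding the chunk for every item is quadratic, so a real version would
keep a running total instead; the point here is just the shape of the check.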

Thanks!
Alex

On Mon, Apr 20, 2015 at 11:42 PM, Brett Hazen <brett at basho.com> wrote:

> Alex -
>
> Looks like Matt created a GitHub issue to track this.
> https://github.com/basho/riak-python-client/issues/403 Thanks!
>
> It occurs to me that sys.getsizeof() returns the size of the Python Riak
> Object stored in memory which is most certainly not exactly the same as
> what curl reports.  Curl is measuring the JSON across the wire and the
> Python client is converting it into a native format.  There is extra
> information in memory such as indexes into dictionaries and CRDT metadata
> used in maps.
>
> Just to clarify, you want to know the size of the object stored in Riak as
> opposed to in memory, right?  The 1MB limit is on Riak storage?
>
> thanks,
> Brett
>
> On April 17, 2015 at 2:41:56 PM, Alex De la rosa (alex.rosa.box at gmail.com)
> wrote:
>
> Hi Matthew,
>
> I don't have a GitHub account, so it seems I'm not able to create the
> ticket for this feature. Could you do it?
>
> Thanks,
> Alex
>
> On Thu, Apr 16, 2015 at 10:08 PM, Alex De la rosa <alex.rosa.box at gmail.com> wrote:
>
>> Hi Matthew,
>>
>> Thanks for your answer : ) I always have interesting questions : P
>>
>> About point [2]: if you look at my examples, I'm already using
>> sys.getsizeof(), but the sizes are not very accurate. Also, I believe that
>> is the size they take in RAM when loaded by Python and not the exact size
>> of the stored object (especially for Maps, where it differs quite a lot).
>>
>> I will open the ticket then : ) I think it could be a very helpful future
>> feature.
>>
>> Thanks,
>> Alex
>>
>> On Thu, Apr 16, 2015 at 10:03 PM, Matthew Brender <mbrender at basho.com>
>> wrote:
>>
>>> Hi Alex,
>>>
>>> That is an interesting question! I haven't seen a request like that in
>>> our backlog, so feel free to open a new issue [1]. I'm curious: why
>>> not use something like sys.getsizeof [2]?
>>>
>>> [1] https://github.com/basho/riak-python-client/issues
>>> [2]
>>> http://stackoverflow.com/questions/449560/how-do-i-determine-the-size-of-an-object-in-python
>>>
>>> Matt Brender | Developer Advocacy Lead
>>> Basho Technologies
>>> t: @mjbrender
>>>
>>>
>>> On Mon, Apr 13, 2015 at 7:26 AM, Alex De la rosa
>>>  <alex.rosa.box at gmail.com> wrote:
>>> > Hi Bryan,
>>> >
>>> > Thanks for your answer; I don't know how to code in Erlang, so my whole
>>> > system relies on Python.
>>> >
>>> > Following Ciprian's curl suggestion, I tried to compare it with this
>>> > Python code during the weekend:
>>> >
>>> > Map object:
>>> > curl -I
>>> >> 1058 bytes
>>> > print sys.getsizeof(obj.value)
>>> >> 3352 bytes
>>> >
>>> > Standard object:
>>> > curl -I
>>> >> 9718 bytes
>>> > print sys.getsizeof(obj.encoded_data)
>>> >> 9755 bytes
>>> >
>>> > The standard object seems pretty accurate in both approaches, even
>>> > though the image binary data was only 5 KB (I assume some overhead here).
>>> >
>>> > For the map object, the size reported via Python is about 3x the size
>>> > reported by curl.
>>> >
>>> > I'm not sure this is a realistic way to measure their growth (especially
>>> > because the objects I need to monitor are Maps, not unaltered binary
>>> > data whose size I can know before storing it).
>>> >
>>> > Would it be possible for the Python get() function to return something
>>> > like "obj.content-length", giving the size the object currently takes?
>>> > That would be a pretty nice feature.
>>> >
>>> > Thanks!
>>> > Alex
>>> >
>>> > On Mon, Apr 13, 2015 at 12:47 PM, bryan hunt <bhunt at basho.com> wrote:
>>> >>
>>> >> Alex,
>>> >>
>>> >>
>>> >> Maps and Sets are stored just like a regular Riak object, but using a
>>> >> particular data structure and object serialization format. As you have
>>> >> observed, there is an overhead, and you want to monitor the growth of
>>> >> these data structures.
>>> >>
>>> >> It is possible to write a MapReduce map function (in Erlang) which
>>> >> retrieves a provided object by type/bucket/id and returns the size of
>>> >> its data. Would such a thing be of use?
>>> >>
>>> >> It would not be hard to write such a module, and I might even have some
>>> >> code for doing so if you are interested. There are also reasonably good
>>> >> examples in our documentation -
>>> >> http://docs.basho.com/riak/latest/dev/advanced/mapreduce
>>> >>
>>> >> I haven't looked at the Python PB API in a while, but I'm reasonably
>>> >> certain it supports the invocation of MapReduce jobs.
>>> >>
>>> >> Bryan
>>> >>
>>> >>
>>> >> On 10 Apr 2015, at 13:51, Alex De la rosa <alex.rosa.box at gmail.com> wrote:
>>> >>
>>> >> Also, I forgot: I'm most interested in bucket_types rather than simple
>>> >> Riak buckets, that is, in being able to know how much my mutable data
>>> >> inside a MAP/SET has grown.
>>> >>
>>> >> For a traditional standard bucket I can calculate the size of what I'm
>>> >> sending beforehand, so Riak won't get data bigger than 1MB. The problem
>>> >> arises with MAPS/SETS, which can grow.
>>> >>
>>> >> Thanks,
>>> >> Alex
>>> >>
>>> >> On Fri, Apr 10, 2015 at 2:47 PM, Alex De la rosa <alex.rosa.box at gmail.com> wrote:
>>> >>>
>>> >>> Well... using the HTTP REST API would make no sense when I'm already
>>> >>> using the PB API. It would be extremely costly to maintain, and it may
>>> >>> also include some extra bytes on the transport.
>>> >>>
>>> >>> I would be interested in being able to know the size via Python itself,
>>> >>> using the PB API as I'm doing.
>>> >>>
>>> >>> Thanks anyway,
>>> >>> Alex
>>> >>>
>>> >>> On Fri, Apr 10, 2015 at 1:58 PM, Ciprian Manea <ciprian at basho.com> wrote:
>>> >>>>
>>> >>>> Hi Alex,
>>> >>>>
>>> >>>> You can always query the size of a Riak object using `curl` and the
>>> >>>> REST API:
>>> >>>>
>>> >>>> i.e. curl -I <riak-node-ip>:8098/buckets/test/keys/demo
>>> >>>>
>>> >>>>
>>> >>>> Regards,
>>> >>>> Ciprian
>>> >>>>
>>> >>>> On Thu, Apr 9, 2015 at 12:11 PM, Alex De la rosa
>>> >>>> <alex.rosa.box at gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi there,
>>> >>>>>
>>> >>>>> I'm using the python client (by the way).
>>> >>>>>
>>> >>>>> obj = RIAK.bucket('my_bucket').get('my_key')
>>> >>>>>
>>> >>>>> Is there any way to know the actual size of an object stored in
>>> >>>>> Riak, to make sure something mutable (like a set) didn't add up to
>>> >>>>> more than 1MB in storage size?
>>> >>>>>
>>> >>>>> Thanks!
>>> >>>>> Alex
>>> >>>>>
>>> >>>>> _______________________________________________
>>> >>>>> riak-users mailing list
>>> >>>>> riak-users at lists.basho.com
>>> >>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>> >
>>
>>