brett at basho.com
Mon Apr 20 17:42:13 EDT 2015
Looks like Matt created a GitHub issue to track this. https://github.com/basho/riak-python-client/issues/403 Thanks!
It occurs to me that sys.getsizeof() returns the size of the Python Riak object held in memory, which is almost certainly not the same as what curl reports. curl is measuring the JSON sent across the wire, while the Python client converts it into a native format. There is extra information in memory, such as indexes into dictionaries and the CRDT metadata used in maps.
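To illustrate the difference described above, here is a minimal sketch that compares the in-memory size of a value with the size of its JSON serialization. The `value` dictionary is a made-up stand-in, not real output from the Riak client, and JSON is assumed as the wire format only because that is what curl sees. Note also that sys.getsizeof() reports the size of the top-level object only, not of the objects it refers to.

```python
import json
import sys

# Hypothetical stand-in for a decoded map value; not real Riak client output.
value = {
    "name": "alex",
    "tags": ["a", "b", "c"],
    "counters": {"visits": 42},
}

# Size of the JSON text as it would travel over the wire (UTF-8 bytes).
wire_bytes = len(json.dumps(value).encode("utf-8"))

# Shallow in-memory size of the top-level dict only; the nested list,
# dict and strings it references are not included in this figure.
shallow_bytes = sys.getsizeof(value)

print("serialized:", wire_bytes, "bytes")
print("sys.getsizeof:", shallow_bytes, "bytes")
```

The two numbers measure different things, so there is no reason to expect them to agree; the exact sys.getsizeof() figure also varies between Python versions and platforms.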
Just to clarify, you want to know the size of the object stored in Riak as opposed to in memory, right? The 1MB limit is on Riak storage?
On April 17, 2015 at 2:41:56 PM, Alex De la rosa (alex.rosa.box at gmail.com) wrote:
I don't have a GitHub account, so it seems I'm not able to create the ticket for this feature. Could you do it?
On Thu, Apr 16, 2015 at 10:08 PM, Alex De la rosa <alex.rosa.box at gmail.com> wrote:
Thanks for your answer : ) I always have interesting questions : P
about point ... if you see my examples, I'm already using sys.getsizeof()... but the sizes are not very accurate. Also, I believe that is the size they take in RAM when loaded by Python and not the exact size of the stored object (especially for Maps, where it differs quite a bit).
I will open the ticket then : ) I think it could be a very helpful future feature.
On Thu, Apr 16, 2015 at 10:03 PM, Matthew Brender <mbrender at basho.com> wrote:
That is an interesting question! I haven't seen a request like that in
our backlog, so feel free to open a new issue. I'm curious: why
not use something like sys.getsizeof?
Matt Brender | Developer Advocacy Lead
On Mon, Apr 13, 2015 at 7:26 AM, Alex De la rosa
<alex.rosa.box at gmail.com> wrote:
> Hi Bryan,
> Thanks for your answer; I don't know how to code in Erlang, so my whole system
> relies on Python.
> Following Ciprian's curl suggestion, I tried to compare it with this python
> code during the weekend:
> Map object:
> curl -I
>> 1058 bytes
> print sys.getsizeof(obj.value)
>> 3352 bytes
> Standard object:
> curl -I
>> 9718 bytes
> print sys.getsizeof(obj.encoded_data)
>> 9755 bytes
> The standard object seems pretty accurate in both approaches, even though the
> image binary data was only 5 KB (I assume some overhead here).
> For the map object, the size reported via Python is about 3x what curl
> reports.
> I'm not so sure this is a realistic way to measure their growth (especially
> because the objects I need to monitor are Maps, not unaltered
> binary data whose size I can know before storing it).
> Would it be possible for the Python get() function to return
> something like "obj.content-length", giving the size the object currently
> takes? That would be a pretty nice feature.
> On Mon, Apr 13, 2015 at 12:47 PM, bryan hunt <bhunt at basho.com> wrote:
>> Maps and Sets are stored just like a regular Riak object, but using a
>> particular data structure and object serialization format. As you have
>> observed, there is an overhead, and you want to monitor the growth of these
>> data structures.
>> It is possible to write a MapReduce map function (in Erlang) which
>> retrieves a provided object by type/bucket/id and returns the size of its
>> data. Would such a thing be of use?
>> It would not be hard to write such a module, and I might even have some
>> code for doing so if you are interested. There are also reasonably good
>> examples in our documentation -
>> I haven't looked at the Python PB API in a while, but I'm reasonably
>> certain it supports the invocation of MapReduce jobs.
>> On 10 Apr 2015, at 13:51, Alex De la rosa <alex.rosa.box at gmail.com> wrote:
>> Also, I forgot: I'm most interested in bucket_types rather than simple Riak
>> buckets, that is, in being able to see how my mutable data inside a MAP/SET
>> has grown.
>> For a traditional standard bucket I can calculate the size of what I'm
>> sending beforehand, so Riak won't get data bigger than 1MB. The problem
>> arises with MAPS/SETS, which can grow.
>> On Fri, Apr 10, 2015 at 2:47 PM, Alex De la rosa <alex.rosa.box at gmail.com>
>>> Well... using the HTTP REST API would make no sense when I'm already using
>>> the PB API... it would be extremely costly to maintain, and it may also
>>> include some extra bytes from the transport.
>>> I would be interested in being able to know the size via Python itself,
>>> using the PB API as I'm doing.
>>> Thanks anyway,
>>> On Fri, Apr 10, 2015 at 1:58 PM, Ciprian Manea <ciprian at basho.com> wrote:
>>>> Hi Alex,
>>>> You can always query the size of a Riak object using `curl` and the REST
>>>> API, i.e. curl -I <riak-node-ip>:8098/buckets/test/keys/demo
>>>> On Thu, Apr 9, 2015 at 12:11 PM, Alex De la rosa
>>>> <alex.rosa.box at gmail.com> wrote:
>>>>> Hi there,
>>>>> I'm using the python client (by the way).
>>>>> obj = RIAK.bucket('my_bucket').get('my_key')
>>>>> Is there any way to know the actual size of an object stored in Riak,
>>>>> to make sure something mutable (like a set) hasn't added up to more than
>>>>> 1MB in storage size?
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com