Two quick questions about X-RIak-Meta-* headers....

Dmitri Zagidulin dzagidulin at
Tue Dec 8 12:33:58 EST 2015

You're right, I think the Python client doesn't support the HEAD /
metadata-only request.

I'm still curious, however - what will you do with the object metadata,
assuming you can code around the lack of support in the client?

On Tue, Dec 8, 2015 at 10:02 AM, Joe O <jo7173 at> wrote:

> Damien and Dmitri – Thanks for the guidance. This really helps me out.
> I see from the link Dmitri provided that a PB fetch object call does
> indeed support a request for returning only metadata. Unfortunately, I do
> not think this functionality is exposed in the Riak Python API
> RiakBucket.get method, according to the docs. I looked in the Python Riak
> client source code on GitHub, and also did not see a HEAD implementation
> in ./transports/http/. Am I looking in the right place? If it were
> implemented, it would be a function of the 'get' method on the bucket
> object, right?
> I did test the HTTP HEAD method (albeit with curl's flawed -X HEAD
> implementation), and that did work.
> I can live with internal cluster bandwidth being used for a metadata
> request. I don’t want to send large objects back down to the client that
> aren’t being used.
> I understand the strategy of storing metadata for a given key in a
> different bucket using the same key. I’m trying to avoid turning my
> key-value store into a key-key-value-value store. There is an elegance to
> storing both the data and the metadata at the same time and in the same
> place via the same operation, so that is the preferred direction.
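For reference, "same operation" over HTTP just means attaching X-Riak-Meta-* headers to the PUT that writes the value. A small helper for building them might look like this; the helper name and sample fields are illustrative, not part of any client library:

```python
def usermeta_headers(meta):
    """Prefix each metadata field with X-Riak-Meta-, as Riak's HTTP API
    expects, and stringify the values (headers are strings)."""
    return {"X-Riak-Meta-{}".format(name): str(value)
            for name, value in meta.items()}
```

These headers would be sent on the same PUT as the object body, so data and metadata land together.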
> From: Damien Krotkine
> Date: Tuesday, December 8, 2015 at 12:35 AM
> To: Dmitri Zagidulin
> Cc: "technology at", riak-users
> Subject: Re: Two quick questions about X-RIak-Meta-* headers....
> Hi Joe,
> First of all, what Dmitri says makes a lot of sense. From what I
> understand, you are trying to avoid wasting network bandwidth by
> transferring data where you only need the metadata of your keys. As Dmitri
> pointed out, if your replication factor is 3 (default), then Riak will
> internally query all the replicas for the full *data* and metadata, then only return
> the metadata to the client. From a network perspective, you'll save network
> bandwidth *outside* of your cluster, but you'll use quite a lot of network
> bandwidth *inside* of your cluster. Maybe that's not an issue. But if it
> is, read on:
> 1/ first solution: as Dmitri suggested, the best approach is to decouple
> your metadata and data, if possible. If your key is "my_stuff", then have
> your data stored in the bucket "data" and the metadata stored in the bucket
> "metadata". So that you'd fetch "metadata/my_stuff", then fetch
> "data/my_stuff". This should make your life way easier, but the issue is
> that you lose the strong relationship between data and metadata. To try to
> mitigate this:
> - when writing your data, start by writing "data/my_stuff" with
> conservative parameters (w=3, for instance), then wait for the successful
> write before storing the metadata, so that when the metadata is there,
> there is a very high chance that the data is there as well.
> - when updating your data: try to be smart, like marking the metadata as
> invalid or unavailable while you change the data underneath, then update
> the metadata.
> - be fault tolerant in your application: if you fetch some metadata but
> the data is missing, retry, or wait, or gracefully fall back.
> - be fault tolerant again: when fetching some data, have it contain a
> header or an id that must match the metadata. If it doesn't match, you
> need to wait/retry/fall back.
> - if you can't or don't want to handle that on the client side, it's
> possible to enrich the Riak API and have Riak do the bookkeeping itself.
> If you're interested, let me know.
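The write ordering and the fault-tolerant read above can be sketched together like this, with a plain dict standing in for the two buckets; the version tag and the retry policy are illustrative, not part of the Riak API:

```python
import time

fake_riak = {}  # stand-in for the "data" and "metadata" buckets

def store(bucket, key, value):
    """Stand-in for a client put with conservative settings (e.g. w=3);
    a real put would block until the write quorum is met, or raise."""
    fake_riak[(bucket, key)] = value

def write_with_metadata(key, data, version):
    # 1. Write the data first and wait for it to succeed...
    store("data", key, {"version": version, "payload": data})
    # 2. ...only then write the metadata, so metadata implies data exists.
    store("metadata", key, {"version": version})

def fetch_consistent(key, retries=3, delay=0.0):
    """Fetch metadata, then data; accept only when the version tags match,
    otherwise wait and retry, and finally fall back (return None)."""
    for _ in range(retries):
        meta = fake_riak.get(("metadata", key))
        data = fake_riak.get(("data", key))
        if meta and data and data["version"] == meta["version"]:
            return data["payload"]
        time.sleep(delay)  # the data write may still be in flight
    return None  # caller degrades gracefully: stale copy, error page, ...

write_with_metadata("my_stuff", b"<large blob>", version=1)
```

A mismatched or missing version tag is the signal that a write is mid-flight, at which point the caller retries or falls back as described above.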
> 2/ second solution: you can't or don't want to separate metadata and data.
> In this case, you can try to reduce the network usage:
> - first, reduce the internal network usage inside the cluster. When
> querying the metadata, if you're using the PB API, you can pass "n_val" as
> parameter of your request. If you pass n_val=1, then Riak will not query
> the 3 replicas to fetch the value, instead it'll fetch only one replica,
> saving a lot of internal bandwidth.
> - second, you can have your client query one of the primary nodes (where
> one of the replicas is) for a given key. Coupled with passing n_val=1, Riak
> will not transfer anything on the internal network. You can check out
> to find the primary nodes.
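For reference, the two knobs combine like this; the field names follow the PB fetch request, and whether a given client version exposes them is exactly the question raised earlier in the thread:

```python
def metadata_only_fetch_options(bucket, key):
    """Options for a metadata-only fetch that touches a single replica.
    Field names mirror the PB fetch request: head=True drops the value
    from the response, and n_val=1 asks Riak to consult one replica."""
    return {"bucket": bucket, "key": key, "head": True, "n_val": 1}
```

Note the trade-off: with n_val=1 you read from a single replica, so you give up read-repair and quorum guarantees on that request.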
> Dmitri Zagidulin wrote:
> Hi Joe,
> 1. Yes, it's possible -- with the HTTP HEAD request, or the client library
> equivalent. (I'm pretty sure all the client libraries expose the 'return
> only the headers' part of the object fetch -- see the Optional Parameters
> head=true section of the PB API docs.)
> However, this is not going to be very helpful in your case. Behind the
> scenes, a HEAD request still requests all replicas of an object -- *full*
> replicas, including the value. It's just that the node coordinating the
> request drops the actual object value before returning the metadata/headers
> to the client.
> So, if you use this 'just give me the metadata' request, you're only
> saving on the cost of shipping the object value down the wire from the
> cluster to the client. But you're still incurring the cost of all 3 copies
> of the object (so, roughly 3.6-4.8 MB in your case) being transferred over the network
> between the nodes, as a result of that HEAD request.
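Back-of-envelope, using the 1.2-1.6 MB object sizes from the original question and the default replication factor of 3:

```python
N_VAL = 3  # default replication factor
for value_mb in (1.2, 1.6):  # object sizes from the original question
    # Every full replica still crosses the internal network on a HEAD;
    # only the value is dropped by the coordinating node at the end.
    internal_mb = N_VAL * value_mb
    print("{:.1f} MB object -> ~{:.1f} MB internal transfer per HEAD".format(
        value_mb, internal_mb))
```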
> 2. I don't know that there is a specific size limit on the object header
> values. However, this question is definitely a red flag -- it's very likely
> that you're trying to use the custom headers in a way that they weren't
> intended for.
> Can you describe your use case in more detail? (That is, what are you
> trying to store as metadata, and why would retrieving just the headers be
> useful to you?)
> Don't forget that if you need to store a LOT of metadata on an object,
> and you can't do it within the object value itself (for example, when
> storing binary images), you can simply store a separate metadata object, in
> a different bucket, using the same key as the object.
> For example, if I'm storing my primary objects in the 'blobs' bucket, I
> can also store a JSON object with corresponding metadata in a 'blobs-meta'
> object, like so:
> /buckets/blobs/keys/blob123   -->  binary object value
> /buckets/blobs-meta/keys/blob123   --> json metadata object
> The downside of this setup is that you're now doing 2 writes for each
> object (one to the blobs bucket, and one to the meta bucket).
> But the benefits are considerable:
> - You can store arbitrarily large metadata objects (you're not abusing
> object headers by stuffing large values into them)
> - Since the metadata object is likely to be much smaller than the object
> it's referring to, you can use the metadata object to check for an object's
> existence (or to get the actual headers that you care about) without the
> cost of requesting the full giant blob.
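That existence check might be sketched as follows, with a dict standing in for the two buckets and fetch() as a placeholder for a real client get:

```python
fake_riak = {
    ("blobs", "blob123"): b"<binary object value>",
    ("blobs-meta", "blob123"): {"size_bytes": 1_400_000, "codec": "h264"},
}

def fetch(bucket, key):
    """Stand-in for a client get against the named bucket."""
    return fake_riak.get((bucket, key))

def blob_exists(key):
    # Probe the small metadata object rather than pulling the full blob.
    return fetch("blobs-meta", key) is not None
```

The same cheap fetch also hands back the metadata fields themselves, so the headers you care about come along for free.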
> Dmitri
> On Mon, Nov 30, 2015 at 12:18 PM, Joe Olson <technology at>
> wrote:
>> Two quick questions about X-RIak-Meta-* headers:
>> 1. Is it possible to pull the headers for a key without pulling the key
>> itself? The reason I am interested in this is because the values for our
>> keys are in the 1.2-1.6 MB range, so the headers are a lot smaller in
>> comparison. I know I can index the headers using Solr or 2i, but I am
>> trying to avoid the overhead of doing that.
>> 2. What are the size limits on the header values that are strings?
>> As always, thanks in advance!
>> _______________________________________________
>> riak-users mailing list
>> riak-users at
