Using UUID as keys is problematic for Riak Search

David James davidcjames at gmail.com
Sun Aug 10 21:02:49 EDT 2014


Yes, that clarifies it -- much appreciated.


On Sun, Aug 10, 2014 at 8:43 PM, Eric Redmond <eredmond at basho.com> wrote:

> I'm at my laptop now so I can talk a bit more about it.
>
> Don't conflate the value type with the encodings. UUID is a field type,
> just like how dates or integers are field types. They explain to the Solr
> indexer how to reason about the value it gets. The string value
> "20140810" is encoded differently than the integer value 20140810 or the
> date "20140810". This is important for the queries you can build, as a date
> range query is different than an integer or string range query.
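
To illustrate the point about range queries depending on the field type, here is a rough analogy in plain Python (not Solr itself). It assumes string ranges compare lexicographically and integer ranges compare numerically; the sample values and the 10-to-50 range are made up for the example.

```python
# Illustrative analogy only: Python comparisons, not a Solr query.
values = [9, 10, 100, 20140810]

# Integer semantics: numeric ordering.
in_range_ints = [v for v in values if 10 <= v <= 50]

# String semantics: lexicographic (character-by-character) ordering.
strings = [str(v) for v in values]
in_range_strs = [s for s in strings if "10" <= s <= "50"]

print(in_range_ints)    # [10]
print(in_range_strs)    # ['10', '100', '20140810']
print(sorted(strings))  # ['10', '100', '20140810', '9']
```

The same digits match a different set of documents depending on how the field is typed, which is why the type and the encoding are separate concerns.
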
>
> That said, in Solr, usually UUID is generated on the backend, such as
> with UUIDUpdateProcessorFactory. Even so, you can no more send a binary
> UUID than you can a binary date value.
>
> There are two encodings you have to think about when dealing with Solr.
> Anything that's binary needs to be converted to a String that Solr can
> understand. Base64 is how you convert a binary value to a string value. So
> in the case of your key (in Erlang):
>
> 1> base64:encode(<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>).
> <<"Xo8hIy20TqSX7UhROA0c+g==">>
>
> base64 encoding libs exist in any language.
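
Since base64 libraries exist in any language, the same step can be sketched in Python using only the standard library. The variable names are illustrative; the bytes are the key from the error log in this thread.

```python
import base64
import uuid

# The 16 raw key bytes from the error log in this thread.
raw_key = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                 151, 237, 72, 81, 56, 13, 28, 250])

# b64encode returns ASCII bytes; decode them to get a str usable as a key.
key = base64.b64encode(raw_key).decode("ascii")
print(key)  # Xo8hIy20TqSX7UhROA0c+g==

# Round trip: decode the key string back into the original UUID.
restored = uuid.UUID(bytes=base64.b64decode(key))
print(restored)  # 5e8f2123-2db4-4ea4-97ed-4851380d1cfa
```

The output matches the Erlang shell result above, so keys encoded from either side should agree.
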
>
> Once you have this key string in base64, internally, Yokozuna will assume
> that string is valid UTF8.
>
> I was probably a bit hasty when I said "yokozuna only supports UTF8." What
> I should have said is that "yokozuna assumes types/buckets/keys are UTF8
> and encodes values appropriately."
>
> So in summation:
>
> UUID:    Solr field type
> Base64:  Encode binary values to a string
> UTF8:    The assumed string encoding
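
A small Python sketch of why the raw key trips the UTF8 assumption while its base64 form does not. `is_valid_utf8` is a hypothetical helper written for this example; it is not part of Yokozuna or Solr.

```python
import base64

# The 16 raw key bytes from the error log in this thread.
raw_key = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                 151, 237, 72, 81, 56, 13, 28, 250])

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Byte 143 (0x8F) is a continuation byte with no lead byte before it,
# so the raw key is not valid UTF-8.
print(is_valid_utf8(raw_key))                    # False

# Base64 output uses only ASCII characters, which are always valid UTF-8.
print(is_valid_utf8(base64.b64encode(raw_key)))  # True
```
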
>
> Does that help?
> Eric
>
>
> On Aug 10, 2014, at 5:03 PM, David James <davidcjames at gmail.com> wrote:
>
> Thanks for the quick responses.
>
> Eric: I don't understand. Why does Solr have the UUIDField (
> http://lucene.apache.org/solr/4_7_0/solr-core/org/apache/solr/schema/UUIDField.html)
> if it is not indexable? What is the nature of the limitation?
>
> Jason: Thanks, I will consider Base 64 encoding.
>
>
> On Sun, Aug 10, 2014 at 7:19 PM, Jason Campbell <xiaclo at xiaclo.net> wrote:
>
>> I like UUIDs for everything as well, although I expected compatibility
>> issues with something. Base 64 encoding the binary value is a nice
>> compromise for me, and takes 22 characters (if you drop the padding)
>> instead of the usual 36 for the hyphenated hex format.
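
The size comparison Jason describes can be checked with a short Python sketch. `to_short_key` and `from_short_key` are hypothetical helper names for this example; the UUID is the key from this thread written in hex form.

```python
import base64
import uuid

# The key from this thread, in hyphenated hex form.
u = uuid.UUID("5e8f2123-2db4-4ea4-97ed-4851380d1cfa")

def to_short_key(value: uuid.UUID) -> str:
    """Hypothetical helper: base64-encode the 16 bytes and drop '=' padding."""
    return base64.b64encode(value.bytes).decode("ascii").rstrip("=")

def from_short_key(key: str) -> uuid.UUID:
    """Hypothetical helper: restore the padding, then decode back to a UUID."""
    padded = key + "=" * (-len(key) % 4)
    return uuid.UUID(bytes=base64.b64decode(padded))

short = to_short_key(u)
print(len(u.bytes), len(str(u)), len(short))  # 16 36 22
print(from_short_key(short) == u)             # True
```

Dropping the padding is lossless here because the decoded length is always 16 bytes, so the padding can be recomputed from the key length.
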
>>
>> It would still require re-encoding all the keys, but it's a partial
>> solution.
>>
>> From: Eric Redmond
>> Sent: Monday, 11 August 2014 9:15 AM
>> To: David James
>> Cc: riak-users
>> Subject: Re: Using UUID as keys is problematic for Riak Search
>>
>> You're correct that yokozuna only supports utf8, because the Solr
>> interface only supports utf8 (note that the failure happens when attempting
>> to build a non-utf8 JSON add document command). There's not much we can do
>> here at the moment, since we don't yet support (and may never support) a
>> custom interface to Solr that accepts arbitrary binary values. In the
>> meantime, to use yokozuna, you'll have to encode your keys to utf8.
>>
>> Eric Redmond, Engineer @ Basho
>>
>> On Sun, Aug 10, 2014 at 4:01 PM, David James <davidcjames at gmail.com>
>> wrote:
>>
>> I'm using UUIDs for keys in Riak -- converted to bytes, not UTF-8
>> strings. (I'd rather spend 16 bytes for each key, not 36.)
>>
>> As I understand it, Yokozuna maps the Riak key to _yz_id.
>>
>> Here is the suggested schema from the documentation:
>>
>> <!-- schema.xml -->
>> <field name="_yz_id" type="_yz_str" indexed="true" stored="true"
>> multiValued="false" required="true"/>
>> <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true"/>
>>
>> Would you expect this to work with Riak Search? I would hope so.
>>
>> (Or must keys be UTF-8 strings?)
>>
>> I get this error, which does not surprise me, given that the _yz_id is
>> defined as a string:
>>
>> ==> log/error.log <==
>>
>> 2014-08-10 18:24:16.221 [error] <0.610.0>@yz_kv:index:206 failed to index
>> object
>> {<<"test-0001">>,<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>}
>> with error {ucs,{bad_utf8_character_code}} because
>> [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},{mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]}]
>>
>> I don't think changing the schema.xml type for _yz_id to "solr.UUIDField"
>> is a good idea.
>>
>> What can I do?
>>
>> Thanks,
>> David