This is about riak search question. How to search utf8 format data?

Ryan Zezeski rzezeski at
Sun Oct 28 15:10:11 EDT 2012

On Wed, Oct 10, 2012 at 12:52 AM, 郎咸武 <langxianzhe at> wrote:
> *2)To put a Object to <<"user1">> bucket. The data is utf8 format.*
> (trends at jason-lxw)123> f(O), O=riakc_obj:new(<<"user1">>,
> <<"jason5">>,list_to_binary(mochijson:encode({struct, [{name,
> binary_to_list(unicode:characters_to_binary("爱"))},{sex,"male"}]})),
> "application/json").
> {riakc_obj,<<"user1">>,<<"jason5">>,undefined,[],
>            {dict,1,16,16,8,80,48,
>                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
>  {{[],[],[],[],[],[],[],[],[],[],[[<<...>>|...]],[],[],...}}},
>            <<"{\"name\":\"\\u00e7\\u0088\\u00b1\",\"sex\":\"male\"}">>}
> (trends at jason-lxw)124> riakc_pb_socket:put(Pid, O).
> ok

First, let's start with your data and make sure it's getting stored correctly.

3> UC = unicode:characters_to_binary("爱").
<<231,136,177>>

Okay, so Erlang properly encoded this as a 3-byte UTF-8 sequence.  What
does mochijson2 think? (I noticed you are using mochijson; I recommend using
mochijson2 instead.)

4> mochijson2:encode({struct, [{name, UC}]}).

Good, mochijson2 properly interpreted this as \u7231.  A quick lookup on the
web verifies this is correct: U+7231 is the code point for 爱.

But notice in your code you call binary_to_list on the binary before
passing it to mochi.  Let's see what happened.

15> binary_to_list(UC).
[231,136,177]

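Okay, so the integers are correct.  But Erlang treats lists differently
from binaries.  To Erlang this is just a list of three integers, and each
integer is read as its own code point rather than as one byte of a UTF-8
sequence.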

16> io:format("~ts~n",[binary_to_list(UC)]).

This is why mochi converted it to 3 characters: \\u00e7\\u0088\\u00b1
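
To see why three escapes come out, here is a minimal sketch that escapes
each list element as its own \uXXXX code point.  The module and function
names are hypothetical, and this is not mochijson's actual implementation,
only an illustration of the per-element behavior using the standard library.

```erlang
-module(escape_demo).
-export([escape/1]).

%% Hypothetical helper: render every integer in the list as a JSON-style
%% \uXXXX escape, treating each element as one code point.
%% Assumes code points no larger than 16#FFFF (four hex digits).
escape(CodePoints) when is_list(CodePoints) ->
    lists:flatten([io_lib:format("\\u~4.16.0b", [C]) || C <- CodePoints]).
```

Calling escape_demo:escape([231,136,177]) yields three escapes (00e7, 0088,
00b1), while escape_demo:escape([29233]) yields the single escape 7231.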

To make a proper unicode list the unicode:characters_to_list function must
be used.
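
The difference between the two conversions can be sketched as a small
module that uses only the standard unicode module.  The module and function
names are hypothetical; the binary <<231,136,177>> is the UTF-8 encoding of
"爱" discussed above.

```erlang
-module(utf8_list_demo).
-export([byte_list/1, codepoint_list/1, double_encoded/1]).

%% Raw bytes of the binary as a list: one integer per byte,
%% not one integer per character.
byte_list(Utf8Bin) when is_binary(Utf8Bin) ->
    binary_to_list(Utf8Bin).

%% Decode the UTF-8 binary into a list of Unicode code points.
codepoint_list(Utf8Bin) when is_binary(Utf8Bin) ->
    unicode:characters_to_list(Utf8Bin, utf8).

%% What happens when the byte list is mistaken for a character list:
%% every byte is encoded again as if it were a code point of its own.
double_encoded(Utf8Bin) when is_binary(Utf8Bin) ->
    unicode:characters_to_binary(binary_to_list(Utf8Bin)).
```

With <<231,136,177>> as input, byte_list/1 returns [231,136,177],
codepoint_list/1 returns [29233], and double_encoded/1 returns the six-byte
binary <<195,167,194,136,194,177>>.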

17> UCS = unicode:characters_to_list("爱").
[29233]

18> io:format("~ts~n", [UCS]).
爱
ok

Let's try encoding again, but this time pass the binary straight to
mochijson2, leaving out the binary_to_list (and list_to_binary) calls.

19> riakc_obj:new(<<"user1">>, <<"jason5">>, mochijson2:encode({struct,
[{name, unicode:characters_to_binary("爱")}]}), "application/json").


And there we go.  A properly encoded unicode character.
