Map/Reduce, UTF-8 and Swedish high ASCII characters

Mårten Gustafson marten.gustafson at gmail.com
Wed Feb 17 12:28:58 EST 2010


Howdy chaps!

I've been struggling to run map/reduce over one of our datasets and
keep hitting the error message at the bottom of this mail. The message
itself is pretty clear, I think: "{bad_return_value,invalid_utf8}". The
dataset is in Swedish, so a few "high ASCII" characters are present,
namely (decimal values in DOS code page 437):

134: å
143: Å
132: ä
142: Ä
148: ö
153: Ö
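The decimal values in the list look like DOS code page 437 byte values
rather than Unicode code points; that is my assumption about where they
came from. In UTF-8 each of these letters takes two bytes, so any of
these single bytes landing in the stored data would not be valid UTF-8
on its own. A small Python sketch to check the mapping (the helper name
cp437_to_utf8 is hypothetical):

```python
from typing import Tuple


def cp437_to_utf8(code: int) -> Tuple[str, bytes]:
    """Map one code page 437 byte value to (character, UTF-8 bytes)."""
    char = bytes([code]).decode("cp437")  # interpret the byte as cp437
    return char, char.encode("utf-8")     # re-encode the character as UTF-8


# Decimal values from the list above, assumed to be cp437 bytes.
for code in [134, 143, 132, 142, 148, 153]:
    char, utf8 = cp437_to_utf8(code)
    print(f"{code}: {char}  U+{ord(char):04X}  UTF-8 bytes {utf8.hex()}")
```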

The data comes from CouchDB, which displays it correctly when I browse
to it (Firefox detects UTF-8 and renders accordingly). I then ran it
through a node.js script that extracts the data from CouchDB and stores
it in Riak. Pointing Firefox at the Riak URL for a given entry also
renders it correctly (again, UTF-8 detection). However, when I run my
map/reduce job it bails out in the first phase, which applies the
built-in "Riak.mapValuesJson".
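As I understand it, Riak.mapValuesJson is a JavaScript helper that
parses each object's stored value as JSON, and the invalid_utf8 reason
suggests the value handed to the JavaScript side failed a UTF-8 check.
One rough way to test stored values locally is a strict UTF-8 decode
followed by a JSON parse. This is a Python sketch of that idea under
those assumptions, not Riak's implementation, and check_value is a
hypothetical name:

```python
import json


def check_value(raw: bytes):
    """Roughly mimic a map phase that parses the stored value as JSON.

    Decodes strictly as UTF-8 first so encoding problems are reported
    separately from JSON syntax problems.
    """
    try:
        text = raw.decode("utf-8")  # strict mode raises on invalid bytes
    except UnicodeDecodeError as err:
        raise ValueError(f"invalid UTF-8 at byte {err.start}") from err
    return json.loads(text)


# An escaped value (pure ASCII) and a raw UTF-8 value both pass.
print(check_value(b'{"street": "Albyv\\u00e4gen 6"}'))
print(check_value('{"street": "Albyvägen 6"}'.encode("utf-8")))

# A stray cp437 byte (0x84, which is ä in cp437) is rejected.
try:
    check_value(b'{"street": "Albyv\x84gen 6"}')
except ValueError as err:
    print(err)
```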

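As you can see in the JSON data below, the value of the "street"
property is "Albyv\u00e4gen 6" (the Erlang dump escapes the backslash,
so it prints as \\u00e4), where \u00e4 is U+00E4, LATIN SMALL LETTER A
WITH DIAERESIS (ä):
http://www.fileformat.info/info/unicode/char/00e4/index.htm

So there's a Unicode escape sequence there.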
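Because the backslash is doubled only in the Erlang printout, the stored
bytes hold a standard JSON \uXXXX escape, which keeps them pure ASCII
(and therefore valid UTF-8) while any JSON parser turns it back into ä.
A quick Python sketch of that round trip; it uses Python's own json
module for illustration, not the node.js script or anything in Riak:

```python
import json

# The street value as stored in Riak. Erlang prints \\u00e4 because it
# escapes the backslash; the stored bytes hold a single backslash.
stored = b'"Albyv\\u00e4gen 6"'

street = json.loads(stored)  # the JSON escape decodes to a real ä
print(street, stored.isascii())

# Python's encoder emits the same escaped form by default...
print(json.dumps(street))  # "Albyv\u00e4gen 6"
# ...or raw UTF-8 bytes once ensure_ascii is turned off.
print(json.dumps(street, ensure_ascii=False).encode("utf-8"))
```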

So my humble question is: might there be a problem with M/R and
"high ASCII" characters?



best, Mårten.


=ERROR REPORT==== 17-Feb-2010::15:37:17 ===
** Generic server <0.16457.0> terminating
** Last message in was {'$gen_cast',
                        {dispatch,<0.199.0>,
                         {<0.16457.0>,#Ref<0.0.0.49966>},
                         {<7082.27342.0>,
                          {map,{jsfun,<<"Riak.mapValuesJson">>},none,false},
                          {r_object,<<"letterboxes">>,<<"86288">>,
                           [{r_content,
                             {dict,5,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],
                                [[<<"Links">>]],
                                [],[],[],[],[],[],[],
                                [[<<"content-type">>,97,112,112,108,105,99,97,
                                  116,105,111,110,47,106,115,111,110],
                                 [<<"X-Riak-VTag">>,121,114,110,51,106,84,51,
                                  52,88,79,113,109,53,49,74,79,105,86,112,103,
                                  119]],
                                [],[],
                                [[<<"X-Riak-Last-Modified">>|
                                  {1266,416075,572221}]],
                                [],
                                [[<<"X-Riak-Meta">>]]}}},

<<"{\"id\":\"86288\",\"key\":\"86288\",\"value\":{\"rev\":\"1-f37125b006cad12ad53f211d868ede54\"},\"doc\":{\"_id\":\"86288\",\"_rev\":\"1-f37125b006cad12ad53f211d868ede54\",\"family\":\"letterboxes\",\"address\":{\"street\":\"Albyv\\u00e4gen 6\",\"streetInfo\":\"Raymons Spel\",\"zipcode\":14559,\"city\":\"Norsborg\"},\"east\":1616410.07,\"north\":6570347.22,\"boxes\":[{\"id\":109131,\"active\":{\"startDate\":\"20091019\",\"endDate\":\"\"},\"features\":{\"driveIn\":true,\"handicap\":true,\"lastMinute\":true,\"season\":true},\"emptied\":{\"weekday\":\"0\",\"weekend\":\"1800\"},\"localTime\":\"0\",\"regionalZipCode\":\"\",\"localZipCode\":\"\",\"exemptionText\":\"\",\"weekdays\":{\"monday\":false,\"tuesday\":false,\"wednesday\":true,\"thursday\":true,\"friday\":true}}]}}">>}],
                           [{<<4,133,159,90>>,{1,63433635275}}],
                           {dict,1,16,16,8,80,48,
                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                            {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                              [[clean|true]],
                              []}}},
                           undefined},
                          undefined,
                          {<<"letterboxes">>,<<"86288">>}}}}
** When Server state == {state,<0.100.0>,#Port<0.10797>,undefined,undefined}
** Reason for termination ==
** {bad_return_value,invalid_utf8}


