Map/Reduce, UTF-8 and Swedish high ASCII characters

Kevin Smith ksmith at basho.com
Wed Feb 17 15:36:18 EST 2010


I've been able to reproduce the error on my own. I'm investigating it now and will update the list when I have more info.

--Kevin
On Feb 17, 2010, at 12:36 PM, Kevin Smith wrote:

> Marten - 
> 
> Is there any way I could get a small set of test data to use for debugging purposes? I have to step out for a bit but I'd like to dig into this problem soon.
> 
> --Kevin
> On Feb 17, 2010, at 12:28 PM, Mårten Gustafson wrote:
> 
>> Howdy chaps!
>> 
>> I've been struggling back and forth with trying to run map/reduce over
>> one of our datasets and I've stumbled on the error message at the
>> bottom of this mail. The message itself is pretty clear I think
>> "{bad_return_value,invalid_utf8}". The dataset is in Swedish and hence
>> we have a couple of high ASCII characters present, namely:
>> 
>> 134: å
>> 143: Å
>> 132: ä
>> 142: Ä
>> 148: ö
>> 153: Ö
>> 
>> The data stems from CouchDB which, when surfed to, is nicely displayed
>> correctly (Firefox detects UTF-8 and renders accordingly). I then
>> ran it through a node.js script that extracts the data from CouchDB
>> and stores it in Riak.
>> Pointing Firefox to the Riak URL for a given entry renders it
>> correctly (again, UTF-8 detection). However, when running my map/reduce
>> job it bails out in the first phase, which applies the built-in
>> "Riak.mapValuesJson".
>> 
>> As you can see below in the JSON data the value of the "street"
>> property is: "Albyv\\u00e4gen 6" where "u00e4" is
>> http://www.fileformat.info/info/unicode/char/00e4/index.htm
>> 
>> So there's a Unicode escape sequence there.
>> 
>> So my humble question is, might there be a problem with the M/R and
>> "high ASCII characters"?
>> 
>> 
>> 
>> best, Mårten.
>> 
>> 
>> =ERROR REPORT==== 17-Feb-2010::15:37:17 ===
>> ** Generic server <0.16457.0> terminating
>> ** Last message in was {'$gen_cast',
>>                       {dispatch,<0.199.0>,
>>                        {<0.16457.0>,#Ref<0.0.0.49966>},
>>                        {<7082.27342.0>,
>>                         {map,{jsfun,<<"Riak.mapValuesJson">>},none,false},
>>                         {r_object,<<"letterboxes">>,<<"86288">>,
>>                          [{r_content,
>>                            {dict,5,16,16,8,80,48,
>>                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                              []},
>>                             {{[],[],
>>                               [[<<"Links">>]],
>>                               [],[],[],[],[],[],[],
>>                               [[<<"content-type">>,97,112,112,108,105,99,97,
>>                                 116,105,111,110,47,106,115,111,110],
>>                                [<<"X-Riak-VTag">>,121,114,110,51,106,84,51,
>>                                 52,88,79,113,109,53,49,74,79,105,86,112,103,
>>                                 119]],
>>                               [],[],
>>                               [[<<"X-Riak-Last-Modified">>|
>>                                 {1266,416075,572221}]],
>>                               [],
>>                               [[<<"X-Riak-Meta">>]]}}},
>> 
>> <<"{\"id\":\"86288\",\"key\":\"86288\",\"value\":{\"rev\":\"1-f37125b006cad12ad53f211d868ede54\"},\"doc\":{\"_id\":\"86288\",\"_rev\":\"1-f37125b006cad12ad53f211d868ede54\",\"family\":\"letterboxes\",\"address\":{\"street\":\"Albyv\\u00e4gen 6\",\"streetInfo\":\"Raymons Spel\",\"zipcode\":14559,\"city\":\"Norsborg\"},\"east\":1616410.07,\"north\":6570347.22,\"boxes\":[{\"id\":109131,\"active\":{\"startDate\":\"20091019\",\"endDate\":\"\"},\"features\":{\"driveIn\":true,\"handicap\":true,\"lastMinute\":true,\"season\":true},\"emptied\":{\"weekday\":\"0\",\"weekend\":\"1800\"},\"localTime\":\"0\",\"regionalZipCode\":\"\",\"localZipCode\":\"\",\"exemptionText\":\"\",\"weekdays\":{\"monday\":false,\"tuesday\":false,\"wednesday\":true,\"thursday\":true,\"friday\":true}}]}}">>}],
>>                          [{<<4,133,159,90>>,{1,63433635275}}],
>>                          {dict,1,16,16,8,80,48,
>>                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                           {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>>                             [[clean|true]],
>>                             []}}},
>>                          undefined},
>>                         undefined,
>>                         {<<"letterboxes">>,<<"86288">>}}}}
>> ** When Server state == {state,<0.100.0>,#Port<0.10797>,undefined,undefined}
>> ** Reason for termination ==
>> ** {bad_return_value,invalid_utf8}
>> 
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
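
For anyone following the thread, the encoding details in the report above can be checked with a short sketch. It is written in Python purely for illustration (the thread itself involves JavaScript and Erlang), and it rests on two assumptions that the original message does not state: that the decimal codes 134, 143, and so on are code page 437 positions, and that the stored document holds the literal ASCII escape \u00e4 rather than a raw byte. The helper names describe and is_valid_utf8 are hypothetical. The sketch does not claim to locate where Riak raises invalid_utf8; it only shows which byte sequences for this street name are valid UTF-8 and which are not.

```python
import json

# Assumption: the decimal codes in the report are code page 437 positions.
LISTED = {134: "å", 143: "Å", 132: "ä", 142: "Ä", 148: "ö", 153: "Ö"}


def describe(code):
    """Return (character, Unicode code point, UTF-8 bytes) for a CP437 byte."""
    ch = bytes([code]).decode("cp437")
    return ch, ord(ch), ch.encode("utf-8")


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for code, expected in LISTED.items():
    ch, cp, utf8 = describe(code)
    assert ch == expected
    print(f"{code}: U+{cp:04X}, UTF-8 bytes {utf8.hex()}")

# Assumption: the stored JSON holds the six ASCII characters \u00e4.
raw = b'{"street":"Albyv\\u00e4gen 6"}'
street = json.loads(raw.decode("ascii"))["street"]

print(is_valid_utf8(street.encode("utf-8")))    # a-umlaut as two bytes: c3 a4
print(is_valid_utf8(street.encode("latin-1")))  # a-umlaut as one byte: e4
```

Under these assumptions the escaped JSON is plain ASCII and parses to the expected street name. A single 0xE4 byte followed by an ASCII letter is not valid UTF-8, so any step that turned the decoded character back into a one-byte encoding would produce data that a strict UTF-8 check rejects.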




