Map/Reduce, UTF-8 and Swedish high ASCII characters

Kevin Smith ksmith at basho.com
Wed Feb 17 12:36:01 EST 2010


Marten - 

Is there any way I could get a small set of test data to use for debugging purposes? I have to step out for a bit, but I'd like to dig into this problem soon.

--Kevin
On Feb 17, 2010, at 12:28 PM, Mårten Gustafson wrote:

> Howdy chaps!
> 
> I've been struggling back and forth with trying to run map/reduce over
> one of our datasets and I've stumbled on the error message at the
> bottom of this mail. The message itself is pretty clear I think
> "{bad_return_value,invalid_utf8}". The dataset is in Swedish and hence
> we have a couple of high ASCII characters present, namely:
> 
> 134: å
> 143: Å
> 132: ä
> 142: Ä
> 148: ö
> 153: Ö
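
For reference, the decimal codes listed above line up with IBM code page 437 (the DOS "high ASCII" set) rather than with Unicode code points or UTF-8 byte values. A minimal Python sketch, assuming CP437 is the code page meant (the helper name `describe` is made up for illustration), shows what each code maps to:

```python
# Decimal codes from the list above, assumed to be CP437 ("high ASCII") values.
codes = [134, 143, 132, 142, 148, 153]

def describe(code):
    """Return (character, Unicode code point, UTF-8 bytes) for a CP437 byte value."""
    char = bytes([code]).decode("cp437")
    return char, f"U+{ord(char):04X}", char.encode("utf-8")

table = {code: describe(code) for code in codes}
for code, (char, codepoint, utf8) in table.items():
    # Print the code, its character, its code point and its UTF-8 bytes in hex.
    print(code, char, codepoint, utf8.hex())
```

Each of these characters takes two bytes in UTF-8, so a single byte such as 132 on its own is not a valid UTF-8 sequence.
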
> 
> The data comes from CouchDB, where it is displayed correctly when
> viewed in a browser (Firefox detects UTF-8 and renders accordingly).
> I then ran it through a node.js script that extracts the data from
> CouchDB and stores it in Riak.
> Pointing Firefox to the Riak URL for a given entry renders it
> correctly (again, UTF-8 detection). However, when running my map/reduce
> job, it bails out in the first phase, which applies the built-in
> "Riak.mapValuesJson".
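
One way to narrow this down might be to check whether the stored bytes are valid UTF-8 before running the job. A minimal Python sketch follows; the helper name `is_valid_utf8` is hypothetical, and the two sample byte strings are constructed here for illustration rather than taken from the dataset:

```python
def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same street name encoded two ways (illustrative samples only).
good = "Albyvägen 6".encode("utf-8")    # "ä" stored as the two bytes C3 A4
bad = "Albyvägen 6".encode("latin-1")   # "ä" stored as the single byte E4

print(is_valid_utf8(good), is_valid_utf8(bad))
```

A browser can still render the second form if it falls back to another encoding, so correct display alone does not prove the bytes are UTF-8.
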
> 
> As you can see below in the JSON data the value of the "street"
> property is: "Albyv\\u00e4gen 6" where "u00e4" is
> http://www.fileformat.info/info/unicode/char/00e4/index.htm
> 
> So there's a Unicode escape sequence there.
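
To make the escape sequence concrete: in the JSON text, `\u00e4` is six plain ASCII characters, and a JSON parser turns them into the single character "ä". A short Python sketch using the street value from the data; Python's standard `json` module stands in for whatever parser is actually involved, so this is an illustration and not a description of how Riak handles it:

```python
import json

# JSON text as stored: the escape is written out in plain ASCII.
raw = r'{"street":"Albyv\u00e4gen 6"}'
is_ascii = raw.isascii()

# Parsing resolves the escape into one Unicode character.
street = json.loads(raw)["street"]
utf8_bytes = street.encode("utf-8")

print(is_ascii)
print(street)
print(utf8_bytes)
```

The JSON text contains no bytes above 127; a non-ASCII character only appears once the escape has been decoded.
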
> 
> So my humble question is, might there be a problem with the M/R and
> "high ASCII characters"?
> 
> 
> 
> best, Mårten.
> 
> 
> =ERROR REPORT==== 17-Feb-2010::15:37:17 ===
> ** Generic server <0.16457.0> terminating
> ** Last message in was {'$gen_cast',
>                        {dispatch,<0.199.0>,
>                         {<0.16457.0>,#Ref<0.0.0.49966>},
>                         {<7082.27342.0>,
>                          {map,{jsfun,<<"Riak.mapValuesJson">>},none,false},
>                          {r_object,<<"letterboxes">>,<<"86288">>,
>                           [{r_content,
>                             {dict,5,16,16,8,80,48,
>                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                               []},
>                              {{[],[],
>                                [[<<"Links">>]],
>                                [],[],[],[],[],[],[],
>                                [[<<"content-type">>,97,112,112,108,105,99,97,
>                                  116,105,111,110,47,106,115,111,110],
>                                 [<<"X-Riak-VTag">>,121,114,110,51,106,84,51,
>                                  52,88,79,113,109,53,49,74,79,105,86,112,103,
>                                  119]],
>                                [],[],
>                                [[<<"X-Riak-Last-Modified">>|
>                                  {1266,416075,572221}]],
>                                [],
>                                [[<<"X-Riak-Meta">>]]}}},
> 
> <<"{\"id\":\"86288\",\"key\":\"86288\",\"value\":{\"rev\":\"1-f37125b006cad12ad53f211d868ede54\"},\"doc\":{\"_id\":\"86288\",\"_rev\":\"1-f37125b006cad12ad53f211d868ede54\",\"family\":\"letterboxes\",\"address\":{\"street\":\"Albyv\\u00e4gen
> 6\",\"streetInfo\":\"Raymons
> Spel\",\"zipcode\":14559,\"city\":\"Norsborg\"},\"east\":1616410.07,\"north\":6570347.22,\"boxes\":[{\"id\":109131,\"active\":{\"startDate\":\"20091019\",\"endDate\":\"\"},\"features\":{\"driveIn\":true,\"handicap\":true,\"lastMinute\":true,\"season\":true},\"emptied\":{\"weekday\":\"0\",\"weekend\":\"1800\"},\"localTime\":\"0\",\"regionalZipCode\":\"\",\"localZipCode\":\"\",\"exemptionText\":\"\",\"weekdays\":{\"monday\":false,\"tuesday\":false,\"wednesday\":true,\"thursday\":true,\"friday\":true}}]}}">>}],
>                           [{<<4,133,159,90>>,{1,63433635275}}],
>                           {dict,1,16,16,8,80,48,
>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>                            {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                              [[clean|true]],
>                              []}}},
>                           undefined},
>                          undefined,
>                          {<<"letterboxes">>,<<"86288">>}}}}
> ** When Server state == {state,<0.100.0>,#Port<0.10797>,undefined,undefined}
> ** Reason for termination ==
> ** {bad_return_value,invalid_utf8}
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




