appearance of text different in riak different than original xml data

Wes James comptekki at gmail.com
Sat Apr 7 11:21:38 EDT 2012


I found it. I thought if any web site might be able to handle unicode,
it would be erlang.org, so I went and grabbed some of the header text:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html xmlns='http://www.w3.org/1999/xhtml'>
<head>
<title>test</title>
  <meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
</head>

and it works correctly now.
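
For anyone who hits the same symptom, here is a minimal sketch of what I assume was happening: the UTF-8 bytes were fine, but without a declared charset they got interpreted as Latin-1. The sketch is in Python rather than my actual Erlang/cowboy code, and the helper names (misdecode, repair) are made up for illustration.

```python
# Hypothetical illustration, not the actual riak/cowboy code path.
# Assumption: the stored bytes are valid UTF-8, and the reader falls back
# to Latin-1 (ISO-8859-1) because no charset was declared.

def misdecode(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the wrong decode: recover the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


title = "Ekologie lučních porostů (A)"
garbled = misdecode(title)

# Each non-ASCII character becomes two characters, e.g. "č" turns into
# "Ä" followed by an invisible control character.
print(garbled)
print(repair(garbled))
```

The round trip only works because Latin-1 maps every byte to a character, so nothing is lost by the wrong decode. Declaring charset=utf-8 in the page header avoids the wrong decode in the first place.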

thanks

On Fri, Apr 6, 2012 at 3:18 PM, Kresten Krab Thorup <krab at trifork.com> wrote:
> It looks like you may have missed specifying the charset when importing your data; could that be the case?
>
> You need to specify the charset when importing 8-bit text.  It looks like your xml is utf-8 encoded, so it should be imported using something like this:
>
> curl -X PUT -H 'Content-Type: text/html;charset=UTF-8' --data-binary @datafile.xml http://host:port/riak/bucket/key
>
> The various language clients have different ways of specifying the charset for a value; so if you imported the xml using some other method you need to find out where to specify it.
>
> Perhaps to verify, you can check the result of a curl -v (verbose, print the headers) for one of your values.  If it does not come back with a charset=XXX in the Content-Type header, then this is your problem.
>
> Kresten
>
>
>
> On Apr 6, 2012, at 4:44 PM, Wes James wrote:
>
> I imported many records, one of which looks like this:
>
> <add>
> <doc>
> <field name='id'>0</field>
> <field name='title'>Ekologie lučních porostů (A)</field>
> <field name='author_editor'>Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán</field>
> <field name='date_of_publication'>1985</field>
> <field name='publisher'>Academia</field>
> <field name='keywords'>-</field>
> <field name='notes'>amazon 5/22/09 Category: Ecology (Y)</field>
> <field name='valuation'>8.00</field>
> <field name='purchase_price'>10.00</field>
> </doc>
> </add>
>
> with
>
> bin/search-cmd solr books books.xml
>
> Notice the characters above.  In the riak -> cowboy -> webpage it looks like:
>
> Id:     0
> Title:  title: Ekologie luÄ ních porostů (A)
> Auther Editor:  author_editor: Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová, Jaroslav Pelikán
> Date of Publication:    date_of_publication: 1985
> Notes:  publisher: Academia
> Notes:  notes: amazon 5/22/09 Category: Ecology (Y)
> Purchase Price: purchase_price: 10.00
> Valuation:      valuation: 8.00
>
> Is there a way I can fix this?
>
> Doing an io:format, it looks like:
>
> Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová, Jaroslav Pelikán
>
> Thanks,
>
> Wes
>
> Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 8787  |  www.trifork.com<http://www.trifork.com>