appearance of text in riak different than original xml data

Wes James comptekki at gmail.com
Sat Apr 7 18:56:15 EDT 2012


OK, that's good to know.

Thanks,

Wes

On Sat, Apr 7, 2012 at 9:32 AM, Sean Cribbs <sean at basho.com> wrote:
> Wes,
>
> Also, if you're using curl to load things into Riak, be sure to use
> --data-binary with your payload, which will not try to convert multibyte
> characters or line terminators.
>
> On Sat, Apr 7, 2012 at 11:21 AM, Wes James <comptekki at gmail.com> wrote:
>>
>> I found it. I figured that if any website could handle Unicode correctly,
>> it would be erlang.org, so I grabbed some of its header text:
>>
>> <?xml version='1.0' encoding='utf-8'?>
>> <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'
>>    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
>> <html xmlns='http://www.w3.org/1999/xhtml'>
>> <head>
>> <title>test</title>
>>  <meta http-equiv='Content-Type' content='text/html;charset=utf-8'/>
>> </head>
>>
>> and it works correctly now.
>>
>> thanks
>>
>> On Fri, Apr 6, 2012 at 3:18 PM, Kresten Krab Thorup <krab at trifork.com>
>> wrote:
>> > It looks like you may have missed specifying the charset when importing
>> > your data; could that be the case?
>> >
>> > You need to specify the charset when importing 8-bit text. It looks
>> > like your XML is UTF-8 encoded, so it should be imported with something
>> > like this:
>> >
>> > curl -X PUT -H 'Content-Type: text/html;charset=UTF-8' \
>> >   --data-binary @datafile.xml http://host:port/riak/bucket/key
>> >
>> > Each language client has its own way of specifying the charset for a
>> > value, so if you imported the XML some other way, you need to find out
>> > where to set it.
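For anyone loading values from a script rather than curl, here is a minimal Python sketch of both points in this thread: send the payload as raw bytes (the equivalent of curl's --data-binary, so nothing touches multibyte characters or line endings) and declare the charset in the Content-Type header. The endpoint URL, bucket, key, and the text/xml content type are assumptions for illustration, and the helper build_put is hypothetical; the request is only constructed here, not sent.

```python
import urllib.request

# Hypothetical endpoint: Riak's HTTP interface commonly listens on port 8098,
# but substitute your own host, port, bucket and key.
RIAK_URL = "http://localhost:8098/riak/books/0"

def build_put(url, payload, content_type="text/xml; charset=UTF-8"):
    """Build (but do not send) a PUT request that carries payload byte-for-byte.

    payload must already be bytes (e.g. read with open(path, "rb")), so
    multibyte UTF-8 sequences and CR/LF line endings are passed through as-is.
    """
    if not isinstance(payload, bytes):
        raise TypeError("payload must be bytes; encode text as UTF-8 first")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )

if __name__ == "__main__":
    body = "<field name='title'>Ekologie lučních porostů (A)</field>\r\n".encode("utf-8")
    req = build_put(RIAK_URL, body)
    # urllib stores header names via str.capitalize(), hence "Content-type".
    print(req.get_method(), req.full_url, req.get_header("Content-type"))
    # To send it for real: urllib.request.urlopen(req)
```

Requiring bytes up front is deliberate: it forces the caller to choose the encoding explicitly, so the declared charset and the actual bytes cannot silently drift apart.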
>> >
>> > To verify, check the output of curl -v (verbose; prints the headers) for
>> > one of your values. If the Content-Type header does not come back with
>> > charset=XXX, then this is your problem.
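Kresten's check can be scripted once you have the Content-Type value (for example, copied from the curl -v output). This sketch assumes the header value is already available as a string and uses the standard library's email.message.Message to parse its parameters; the helper name declared_charset is hypothetical.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None when no charset parameter is present

for value in ("text/html;charset=UTF-8", "text/html"):
    charset = declared_charset(value)
    if charset is None:
        print(f"{value!r}: no charset declared (the problem case)")
    else:
        print(f"{value!r}: charset={charset}")
```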
>> >
>> > Kresten
>> >
>> >
>> >
>> > On Apr 6, 2012, at 4:44 PM, Wes James wrote:
>> >
>> > I imported many records, one of which looks like this:
>> >
>> > <add>
>> > <doc>
>> > <field name='id'>0</field>
>> > <field name='title'>Ekologie lučních porostů (A)</field>
>> > <field name='author_editor'>Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán</field>
>> > <field name='date_of_publication'>1985</field>
>> > <field name='publisher'>Academia</field>
>> > <field name='keywords'>-</field>
>> > <field name='notes'>amazon 5/22/09 Category: Ecology (Y)</field>
>> > <field name='valuation'>8.00</field>
>> > <field name='purchase_price'>10.00</field>
>> > </doc>
>> > </add>
>> >
>> > with
>> >
>> > bin/search-cmd solr books books.xml
>> >
>> > Notice the accented characters above. Rendered through riak -> cowboy ->
>> > web page, the record looks like:
>> >
>> > Id:     0
>> > Title:  title: Ekologie luÄ ních porostů (A)
>> > Auther Editor:  author_editor: Rychnovská, Milena, Emilie
>> > Balátová-TulÃ¡Ä ková, Blanka Úlehlová, Jaroslav Pelikán
>> > Date of Publication:    date_of_publication: 1985
>> > Notes:  publisher: Academia
>> > Notes:  notes: amazon 5/22/09 Category: Ecology (Y)
>> > Purchase Price: purchase_price: 10.00
>> > Valuation:      valuation: 8.00
>> >
>> > Is there a way I can fix this?
>> >
>> > Printed with io:format, it looks like:
>> >
>> > Rychnovská, Milena, Emilie Balátová-TulÃ¡Ä ková, Blanka Úlehlová,
>> > Jaroslav Pelikán
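Sequences such as "Ã¡" and "Ä" in place of "á" and "č" are the classic pattern of UTF-8 bytes being decoded as ISO-8859-1 (Latin-1), which is also what HTTP/1.1 as specified at the time (RFC 2616) assumed for text/* bodies without a charset parameter. The sketch below reproduces that pattern and reverses it, assuming a single UTF-8-to-Latin-1 misdecoding; it is a diagnosis aid only, and the helper names mojibake and repair are hypothetical. The real fix is declaring the charset, as discussed above.

```python
def mojibake(text):
    """Simulate the bug: encode as UTF-8, then wrongly decode as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled):
    """Undo mojibake(): latin-1 maps every byte 0-255, so this round trip is lossless."""
    return garbled.encode("latin-1").decode("utf-8")

original = "Tuláčková"
garbled = mojibake(original)
# \x8d is a C1 control character; many displays show it as a gap or nothing.
print(repr(garbled))                # 'TulÃ¡Ä\x8dkovÃ¡'
print(repair(garbled) == original)  # True
```

If the text was instead misread as windows-1252 or garbled more than once, this simple reversal will not apply cleanly.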
>> >
>> > Thanks,
>> >
>> > Wes
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users at lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >
>> >
>> >
>> > Mobile: +45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
>> > Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | Phone: +45 8732 8787
>> > www.trifork.com
>>
>
> --
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
>



