appearance of text in riak different than original xml data

Kresten Krab Thorup krab at trifork.com
Fri Apr 6 17:18:21 EDT 2012


It looks like you may have missed specifying the charset when importing your data; could that be the case?

You need to specify the charset when importing 8-bit text.  It looks like your xml is utf-8 encoded, so it should be imported using something like this:

curl -H 'Content-Type: text/xml; charset=UTF-8' -X PUT --data-binary @datafile.xml http://host:port/riak/bucket/key
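
The charset you declare has to match the bytes actually in the file, so it may be worth confirming that the file really is UTF-8 before importing it. Here is a minimal sketch, assuming a POSIX shell with iconv installed. The function name is_utf8 is made up for illustration. It only shows that the bytes are valid UTF-8, not that UTF-8 was the intended encoding.

```shell
#!/bin/sh
# is_utf8 FILE (hypothetical helper): exit 0 if FILE decodes cleanly as UTF-8,
# non-zero otherwise. iconv exits with an error on invalid or truncated
# UTF-8 sequences, so converting UTF-8 -> UTF-8 acts as a validity check.
# Note: plain ASCII also passes, since ASCII is a subset of UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage (datafile.xml is the placeholder name from the curl command):
#   is_utf8 datafile.xml && echo "valid UTF-8: declare charset=UTF-8"
```

If the check fails, find out the file's real encoding first and declare that instead. Otherwise the stored bytes and the declared charset will disagree.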

The various language clients have different ways of specifying the charset for a value, so if you imported the xml some other way, you need to find out where that method lets you specify it.

To verify, check the output of curl -v (verbose; prints the headers) for one of your values.  If the Content-Type header does not come back with charset=XXX, then this is your problem.
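
If you have many keys to check, the same test can be scripted instead of reading curl -v output by eye. This is a rough sketch, assuming curl and grep are available. The helper names has_charset and check_key are hypothetical, and host, port, bucket and key are placeholders for your own values.

```shell
#!/bin/sh
# has_charset HEADER_LINE (hypothetical helper): succeed if a Content-Type
# header line carries a charset parameter (case-insensitive match).
has_charset() {
  printf '%s\n' "$1" | grep -qi 'charset='
}

# check_key URL (hypothetical helper): fetch one value and report whether
# the response's Content-Type includes a charset.
check_key() {
  # -s: quiet; -D -: write response headers to stdout; -o /dev/null: discard the body.
  ctype=$(curl -s -D - -o /dev/null "$1" | grep -i '^content-type:')
  if has_charset "$ctype"; then
    echo "ok: $ctype"
  else
    echo "missing charset: $ctype"
  fi
}
```

For example, you could run check_key http://host:port/riak/bucket/key on a value stored with an explicit charset. If it prints an "ok:" line, the header was kept. If it prints a "missing charset:" line, the value needs to be re-imported with the charset declared.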

Kresten



On Apr 6, 2012, at 4:44 PM, Wes James wrote:

I imported many records, one of which looks like this:

<add>
<doc>
<field name='id'>0</field>
<field name='title'>Ekologie lučních porostů (A)</field>
<field name='author_editor'>Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán</field>
<field name='date_of_publication'>1985</field>
<field name='publisher'>Academia</field>
<field name='keywords'>-</field>
<field name='notes'>amazon 5/22/09 Category: Ecology (Y)</field>
<field name='valuation'>8.00</field>
<field name='purchase_price'>10.00</field>
</doc>
</add>

with

bin/search-cmd solr books books.xml

Notice the characters above.  In the riak -> cowboy -> webpage it looks like:

Id:     0
Title:  title: Ekologie lučních porostů (A)
Auther Editor:  author_editor: Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán
Date of Publication:    date_of_publication: 1985
Notes:  publisher: Academia
Notes:  notes: amazon 5/22/09 Category: Ecology (Y)
Purchase Price: purchase_price: 10.00
Valuation:      valuation: 8.00

Is there a way I can fix this?

With io:format it looks like:

Rychnovská, Milena, Emilie Balátová-Tuláčková, Blanka Úlehlová, Jaroslav Pelikán

Thanks,

Wes
_______________________________________________
riak-users mailing list
riak-users at lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 8787  |  www.trifork.com