Some questions about Riak Search and Riak itself

Bryan Fink bryan at basho.com
Wed Oct 13 21:17:57 EDT 2010


2010/10/13 Dmitry Demeshchuk <demeshchuk at gmail.com>:
> I worked it out. Both shell and command-line search work well. Seems
> like I've been doing something wrong before.

Excellent - good to hear.

>>> 4. Do you have any tips and advice about working with Unicode in Riak Search?
>>
>> Encode everything in UTF-8.  There may still be a few bugs we need to
>> work out, but our intended goal is to have everything in that
>> department "just work" once you're using UTF-8 everywhere.
>
> I'm not sure if I do everything right but here's the step-by step
> description of my actions:
>
> 1.  curl -v -d "{\"title\":\"Статья 1\", \"tags\":\"псто, лытдыбр\",
> \"body\":\"Я что-то здесь написал\"}" -H "Content-Type:
> application/json" http://127.0.0.1:8098/riak/posts
>
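> (Note: the values are in Cyrillic; for example, "Статья 1" means "Article 1"
> and "Я что-то здесь написал" means "I wrote something here".)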
>
...snip...
>
> 3. curl -X POST -H "content-type: application/json"
> http://localhost:8098/mapred -d '{"inputs":{"module":"riak_search",
> "function":"mapred_search", "arg": ["posts", "title:Статья*"]},
> "query":[{"map":{"language":"javascript","source":"Riak.mapValues",
> "keep":true}}]}'
>
...snip...
>
> Am I doing something wrong? Should I encode the characters somehow before
> I send them to Riak Search?
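
Since the advice quoted above is to use UTF-8 everywhere, one cheap safeguard is to confirm a request body really is UTF-8 before sending it. The sketch below is a hypothetical helper, not part of Riak; it assumes an `iconv` that exits non-zero on invalid input (glibc's does), and it sends the file with curl's `--data-binary` so the bytes go out unchanged.

```shell
# Hypothetical helper (not part of Riak): succeed only if the file holds
# valid UTF-8. iconv exits non-zero when it meets an invalid byte sequence.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example: write the JSON body to a file, check it, then send it unchanged.
# printf '%s' '{"title":"Статья 1"}' > post.json
# is_utf8 post.json && curl -H "Content-Type: application/json" \
#   --data-binary @post.json http://127.0.0.1:8098/riak/posts
```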

Thanks for the excellent test case - very easy to reproduce.  I
apologize for my delay in responding, but I wanted to make sure I had
all of my ducks in a row first.

So, no, you're doing nothing wrong.  The default Erlang-based
analyzer is, in fact, simply ignoring non-ASCII characters.  I've
created an issue to track the fix to that analyzer here:

   https://issues.basho.com/show_bug.cgi?id=814
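
To make the symptom concrete, here is a rough shell imitation of an analyzer that keeps only ASCII letters and digits. It is an illustration under that assumption, not the actual Erlang analyzer code, but it suggests why a query like title:Статья* finds nothing: the Cyrillic word never becomes a token, and only the "1" survives.

```shell
# Illustration only: mimic an ASCII-only tokenizer by treating every byte
# outside [A-Za-z0-9] as a separator. This is NOT Riak Search's analyzer code.
ascii_tokens() {
  printf '%s\n' "$1" | LC_ALL=C tr -cs 'A-Za-z0-9' '\n' | grep -v '^$'
}

ascii_tokens 'Статья 1'    # prints only: 1
```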

In the meantime, the easiest way to fix this issue is to use the
Java-based "DefaultAnalyzerFactory", which handles non-ASCII
characters correctly (in my tests, at least; I look forward to yours).
To use this analyzer, edit your schema file and add the following
line to the first list in the schema:

   {analyzer_factory, "com.basho.search.analysis.DefaultAnalyzerFactory"}

(The example schemas on the wiki and in doc/using_search.org
demonstrate the proper placement of this line.)  After editing, use
the bin/search-cmd script to update the schema:

   $RIAK/bin/search-cmd set_schema posts /path/to/your/schema.def
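
Before uploading, it can help to confirm the edited file actually names the Java analyzer. The sketch below is a hypothetical pre-flight check, not a Riak tool; it is a plain text search that tolerates extra whitespace, so it cannot verify that the entry sits in the first list of the schema.

```shell
# Hypothetical pre-flight check: succeed only if the schema file contains an
# analyzer_factory entry naming the Java DefaultAnalyzerFactory. This is a
# text match only; it does not check where in the schema the entry appears.
uses_default_analyzer() {
  grep -qE '\{[[:space:]]*analyzer_factory[[:space:]]*,[[:space:]]*"com\.basho\.search\.analysis\.DefaultAnalyzerFactory"[[:space:]]*\}' "$1"
}

# Example (assumes $RIAK points at your Riak install):
# uses_default_analyzer /path/to/your/schema.def &&
#   "$RIAK/bin/search-cmd" set_schema posts /path/to/your/schema.def
```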

Riak Search should then reindex the documents you have stored, using the
new analyzer.  Try your map/reduce query again, and I think you'll find
that it works.
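
For re-running the test, a small hypothetical helper can print the same map/reduce body as the query quoted earlier for any index and search string, and pipe it to curl's `--data-binary @-`, which reads the body from stdin. It assumes the index name and query contain no double quotes or backslashes, because they are spliced into the JSON without escaping.

```shell
# Hypothetical helper: print a map/reduce request body that feeds a Riak
# Search query ($2) on an index ($1) into Riak.mapValues. Arguments are not
# JSON-escaped, so they must not contain double quotes or backslashes.
mapred_payload() {
  printf '{"inputs":{"module":"riak_search","function":"mapred_search","arg":["%s","%s"]},"query":[{"map":{"language":"javascript","source":"Riak.mapValues","keep":true}}]}\n' "$1" "$2"
}

# Example (assumes Riak listens on localhost:8098):
# mapred_payload posts 'title:Статья*' |
#   curl -X POST -H "Content-Type: application/json" \
#     --data-binary @- http://localhost:8098/mapred
```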

-Bryan




More information about the riak-users mailing list