Some questions about Riak Search and Riak itself

Dmitry Demeshchuk demeshchuk at gmail.com
Wed Oct 13 09:27:10 EDT 2010


Hi, Bryan. Thank you for your reply. Here are some inline comments.

On Tue, Oct 12, 2010 at 9:42 PM, Bryan Fink <bryan at basho.com> wrote:
> On Tue, Oct 12, 2010 at 3:16 AM, Dmitry Demeshchuk <demeshchuk at gmail.com> wrote:
>> 1. I tried to put some Erlang terms into a Riak bucket that is being
>> indexed by Riak Search. I hoped that key-value lists like this
> ...snip...
>> Is there a way to send Erlang proplists into Riak and process them
>> using Riak Search?
>
> Hi, Dmitry.  We've filed a bug for doing exactly this:
>
> https://issues.basho.com/show_bug.cgi?id=788
>
> In the meantime, you could also write your own extractor.  See the
> "Other Data Encodings" section of using_search.org:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-985
>
> Or on the wiki:
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Indexing+and+Querying+Riak+KV+Data#RiakSearch-IndexingandQueryingRiakKVData-OtherDataEncodings
>
>> 2. Is there a way to query Erlang buckets' indexes using any API other
>> than the REST API? The only way to query the bucket I found was
>>
>> /solr/some_bucket/select
>>
>> and my attempts at using the Riak Search shell and the Erlang API just failed.
>
> If you could post details about the ways in which your attempts
> failed (error messages, etc.), we might be able to help you
> troubleshoot them.
>
> The other main way of querying Search indexes is using the map/reduce
> Search input.  The "Querying via HTTP/Curl" section has an example of
> how to hook this up:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-783
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Querying#RiakSearch-Querying-QueryingviaHTTP%2FCurl
>
> And it's also possible to specify the same map/reduce input using any
> of the Erlang clients (native, protocol buffer, or http).  Though
> there is a small bug with the non-streaming native Erlang client at
> the moment (https://issues.basho.com/show_bug.cgi?id=803).  For an
> example of using that syntax, have a look at the Wriaki project:
>
> http://bitbucket.org/basho/wriaki/src/d2334be214ce/apps/wriaki/src/wiki_resource.erl#cl-267

I worked it out: both the shell and command-line search work fine. It
seems I was doing something wrong before.

>
>> 3. Is there a way to write custom analyzers in non-Java languages? I
>> saw the same question and found an answer saying that the analyzer
>> automatically starts a JVM for its needs. The problem is that we don't
>> have good Java/JVM developers, so it would be better to use some other
>> solution (like OCaml or even C, for example). Also, I'm a bit
>> suspicious about the performance of Java analyzers.
>
> At the moment, the only non-Java language supported for custom
> analyzers is Erlang.  You can specify an Erlang analyzer by adding an
> "analyzer_factory" entry to your schema, of the form:
>
>   {analyzer_factory, {erlang, my_module, my_function}}
>
> Other formats for the analyzer_factory setting are:
>
>   {erlang, my_module, my_function, Arguments}
>   {java, FullyQualifiedClassNameAsString}
>   {java, FullyQualifiedClassNameAsString, Arguments}
>   FullyQualifiedClassNameAsString
>
> The last format is demonstrated in the "Defining a Schema" section of the docs:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-193
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Schema#RiakSearch-Schema-DefiningaSchema
>
> Unfortunately, we haven't written much documentation about what an
> analyzer is expected to do, but hopefully between the comments in
> qilr_analyzer, and the default Erlang analyzer,
> text_analyzers:default_analyzer_factory/2, you'll be able to work out
> some of what you need.
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/apps/qilr/src/qilr_analyzer.erl#cl-53
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/apps/qilr/src/text_analyzers.erl
>
>> 4. Do you have any tips and advice about working with Unicode in Riak Search?
>
> Encode everything in UTF-8.  There may still be a few bugs we need to
> work out, but our intended goal is to have everything in that
> department "just work" once you're using UTF-8 everywhere.

I'm not sure I'm doing everything right, but here's a step-by-step
description of what I did:

1. curl -v -d "{\"title\":\"Статья 1\", \"tags\":\"псто, лытдыбр\", \"body\":\"Я что-то здесь написал\"}" \
     -H "Content-Type: application/json" http://127.0.0.1:8098/riak/posts

(Note that the values contain Cyrillic characters.)

2. curl -X POST -H "content-type: application/json" http://localhost:8098/mapred \
     -d '{"inputs":"posts","query":[{"map":{"language":"javascript","source":"Riak.mapValues","keep":true}}]}'

The result is:

["{\"title\":\"\u0421\u0442\u0430\u0442\u044c\u044f 1\", \"tags\":\"\u043f\u0441\u0442\u043e, \u043b\u044b\u0442\u0434\u044b\u0431\u0440\", \"body\":\"\u042f \u0447\u0442\u043e-\u0442\u043e \u0437\u0434\u0435\u0441\u044c \u043d\u0430\u043f\u0438\u0441\u0430\u043b\"}"]

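So Riak itself stored and returned the Cyrillic strings correctly; the
\uXXXX sequences are standard JSON escapes for the same characters
(\u0421 is "С"). I'm not sure whether the escaping happens at the
mochiweb level or somewhere else.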

3. curl -X POST -H "content-type: application/json" http://localhost:8098/mapred \
     -d '{"inputs":{"module":"riak_search","function":"mapred_search","arg":["posts","title:Статья*"]},
          "query":[{"map":{"language":"javascript","source":"Riak.mapValues","keep":true}}]}'

This is a map/reduce request with a Riak Search input. I expected it to
return the document posted in step 1, but it returns an empty list.

4. I tried both the shell and command-line search, with the same result.

5. If I repeat the same steps using Latin characters, everything works
fine. If the JSON data is only partially Cyrillic, search works on the
Latin fields only.


Am I doing something wrong? Should I encode the characters somehow
before sending them to Riak Search?


Thanks.


>
> -Bryan
>



-- 
Best regards,
Dmitry Demeshchuk




More information about the riak-users mailing list