Riak Search analyzers

Rusty Klophaus rusty at basho.com
Mon Oct 18 14:14:06 EDT 2010

Hi Dmitry,

On Wed, Oct 13, 2010 at 11:11 PM, Dmitry Demeshchuk <demeshchuk at gmail.com> wrote:

> Greetings.
> I have a couple of questions regarding the analyzers, mainly the Java ones.
> 1. Which platform is preferable for use: OpenJDK or Sun's Java? Say, I
> won't have any uses for JVM so it will be used just for analyzers.

We have not seen any appreciable difference between the platforms; either
one should be fine. Search isn't relying on the JVM to do anything overly

> 2. Could you please give a brief description of the difference between
> the analyzers?


*com.basho.search.analysis.DefaultAnalyzerFactory* uses Lucene's
StandardTokenizer, filters out words shorter than 3 characters, converts tokens
to lower case, and filters out the stopwords listed in Lucene's
StopAnalyzer.java (

*com.basho.search.analysis.WhitespaceAnalyzerFactory* uses Lucene's
Whitespace tokenizer.

*com.basho.search.analysis.IntegerAnalyzerFactory* parses the field as
integers and by default pads to 10 places.
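
To illustrate the padding idea, here is a rough Java sketch. It is not the
actual IntegerAnalyzerFactory source: the class and method names are made up,
and it assumes the padding is done with leading zeros and that values are
non-negative. Padding to a fixed width means the resulting terms compare as
strings in the same order as the numbers they stand for.

```java
public class IntegerPadSketch {
    // Hypothetical helper (not part of Riak Search): parse the field text as
    // an integer and left-pad it with zeros to the given width.
    // Assumes non-negative values; zero padding is an assumption.
    static String pad(String raw, int width) {
        long value = Long.parseLong(raw.trim());
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        String a = pad("42", 10);
        String b = pad("100", 10);
        System.out.println(a);
        System.out.println(b);
        // Unpadded, "42" sorts after "100" as a string; padded, it sorts before.
        System.out.println(a.compareTo(b) < 0);
    }
}
```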

*{erlang, text_analyzers, default_analyzer_factory}* treats runs of the
characters 0-9, a-z, or A-Z as words, filters out words shorter than 3 characters,
converts tokens to lower case, and filters out the same list of stopwords as

Two things to note:
- You can create your own analyzers in Java or Erlang; see the source code
under apps/qilr/java_src
- Due to a regression bug, field-level analyzer settings are not used when
running a query. Whatever default analyzer you set for the schema is used
for all fields.

> 3. I guess you have already made some benchmarks regarding the
> analyzers, haven't you?

We have run some rudimentary benchmarks, which show that Erlang analyzers
are currently faster than Java-based analyzers due to the communication
overhead. We will be working on this in future iterations.

> I remember that you are going to add a special page into wiki about
> the subject. Hope this will also help you to gather up the information
> a bit.

Absolutely, we will continue to update the wiki with more information about
Search going forward.

Hope that helps!
