Timeouts in Riak Search

Spike Gronim spike at wavii.com
Wed Nov 9 14:46:18 EST 2011


My Riak Search cluster is timing out very often. I am indexing text content extracted from web pages containing news articles. The articles range in size from a few KB to tens of KB. I have put about 4.4 million articles into Riak, with an average article size of 15 KB. The keys are MD5 hashes in ASCII hex and the values are JSON.

When I first set this system up I loaded it with 1 GB or so of data and played with the search system. Everything was fine: it responded quickly and the search relevance was good. Now that I've imported 100x as much data I am getting timeouts. For example, the query "steve jobs died" times out. When I put in extremely specific conjunctive queries like "+steve +jobs +died +cupertino +apple" I get no results, but the query runs quickly. While the system is running a query that will time out, I see the coordinator Riak node consuming between one and two cores' worth of CPU.
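To make the two query shapes concrete, here is a rough sketch of how they could be turned into request URLs for Riak Search's Solr-style HTTP interface. The host name, the index name "articles", and the helper function itself are placeholders for illustration and not taken from my actual setup; port 8098 is assumed to be the default Riak HTTP port.

```python
from urllib.parse import urlencode


def build_search_url(host, index, terms, require_all=False, rows=10):
    """Build a URL for a Solr-style select query (hypothetical helper).

    With require_all=True every term gets a leading "+", which makes the
    query conjunctive (all terms must match).
    """
    prefix = "+" if require_all else ""
    query = " ".join(prefix + term for term in terms)
    # urlencode escapes the literal "+" as %2B and encodes spaces as "+".
    params = urlencode({"q": query, "rows": rows})
    # Assumption: Riak's HTTP interface listens on port 8098.
    return f"http://{host}:8098/solr/{index}/select?{params}"


# The broad query that times out for me.
broad = build_search_url("localhost", "articles", ["steve", "jobs", "died"])

# The very specific conjunctive query that returns quickly.
narrow = build_search_url(
    "localhost",
    "articles",
    ["steve", "jobs", "died", "cupertino", "apple"],
    require_all=True,
)

print(broad)
print(narrow)
```

The only difference between the two requests is the "+" prefix on each term, so the slow case is the one where any single term is allowed to match.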

How can I configure Riak to stop timing out searches? I am open to changing my schema and query pattern if that's what I need to do.

app.config - https://gist.github.com/1352608
schema - https://gist.github.com/1352616
selected errors - https://gist.github.com/1c0976ced0f05ef0d5d6

Nodes in the cluster: 4
Hardware: EC2 m1.large with two disks in a RAID-0 on /mnt
Operating system: Linux ip-XXXX 2.6.38-11-virtual #50-Ubuntu SMP Mon Sep 12 21:51:23 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Disk space consumed:

66G  /mnt/riak/leveldb
36G  /mnt/riak/merge_index

Disk space available: 800G

Spike Gronim
spike at wavii.com

