Well, the "stop word not allowed in query: 'and'" part of the logged error message sounds reasonable enough. But that should only abort the query in question, not take down the entire node.

One of my Riak servers died again. I don't think the crash report I linked earlier is what made it die. Now no Erlang OS processes are running on the machine at all. This cluster is entirely idle except for the search queries I am running on it. The other 3 servers are up and happy. Here are some log messages:

After decoding your msg it looks like your query is simply 'AND'. The parser barfs at this because it expects terms on both sides of the AND, and this causes the listener for that particular connection to go down. No other connections should be affected. Since the query is parsed on the server, there's not a whole lot you can do if a malformed query is sent. If these queries are generated or come from user input, then perhaps you could create a blacklist (in your application) for common bad queries that come in. Also, if your client supports connection pooling, that would help too.
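A minimal sketch of the application-side blacklist suggested above, assuming queries are plain strings with space-separated, Lucene-style boolean operators. The function name `looks_malformed` and its rules are hypothetical; it only rejects empty queries and queries where AND/OR lack a term on one side (such as the bare 'AND' from this thread), so it narrows the problem rather than replacing the server-side parser.

```python
# Hypothetical client-side pre-filter: reject queries the search parser is
# likely to choke on before they are sent over the PB connection.
BINARY_OPERATORS = {"AND", "OR"}
BOOLEAN_OPERATORS = BINARY_OPERATORS | {"NOT"}


def looks_malformed(query: str) -> bool:
    """Return True for obviously bad queries: empty ones, or ones where a
    boolean operator has no term on one side (e.g. a bare 'AND')."""
    # Upper-case the tokens so 'and' / 'And' are treated like 'AND'.
    tokens = [t.upper() for t in query.split()]
    if not tokens:
        return True
    # AND/OR need a term on both sides, so they cannot start the query,
    # and no operator (including NOT) can end it.
    if tokens[0] in BINARY_OPERATORS or tokens[-1] in BOOLEAN_OPERATORS:
        return True
    # Two binary operators in a row leave the first without a right operand.
    for left, right in zip(tokens, tokens[1:]):
        if left in BINARY_OPERATORS and right in BINARY_OPERATORS:
            return True
    return False
```

An application would call `looks_malformed(q)` before issuing the search and return an error to the user instead of sending the query to Riak.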


My Riak servers are misbehaving when users enter invalid queries. The gen_server Erlang process for the PB transport dies, but the overall OS-level process is still alive. I am exclusively using PB to access Riak, so everything grinds to a halt. I plan to work around this by writing a Riak nanny that verifies every 60s that something is listening on the PB port, but I think that this is actually a bug in Riak. I have searched the mailing list archive and Bugzilla for other reports of this bug but haven't found any. If this is really a bug, I will write up a more detailed repro in Bugzilla. Thanks.
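A minimal sketch of the "Riak nanny" workaround described above. It assumes the nanny runs on the Riak node itself and that the PB listener is on port 8087 (Riak's usual default; check your own config). The names `pb_port_listening` and `run_nanny`, and the `on_failure` callback, are hypothetical. A successful TCP connect only shows that something accepts connections on the port, not that the PB listener answers requests correctly.

```python
import socket
import time

RIAK_HOST = "127.0.0.1"  # assumption: nanny runs on the node it watches
RIAK_PB_PORT = 8087      # assumption: default PB port, verify in your config
CHECK_INTERVAL_S = 60    # poll interval from the plan above


def pb_port_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def run_nanny(on_failure, host=RIAK_HOST, port=RIAK_PB_PORT,
              interval=CHECK_INTERVAL_S):
    """Poll the PB port forever; call on_failure() whenever nothing listens."""
    while True:
        if not pb_port_listening(host, port):
            on_failure()  # e.g. log, alert, or restart the node
        time.sleep(interval)
```

What `on_failure` does (alerting, restarting the node) is left to the operator; a stronger health check would send an actual PB request rather than just opening a TCP connection.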

Crash report:

