riaksearch memory growth issues

Gilbert Glåns gilbert.glans at gmail.com
Tue May 31 19:09:25 EDT 2011


Gordon,

Great news!  Much appreciated.

Gilbert

On Tue, May 31, 2011 at 2:25 PM, Gordon Tillman <gtillman at mezeo.com> wrote:
> Howdy Gilbert,
>
> Hey, we are testing a fix now.  If this works, I will send you a copy of the updated file.
>
> --gordon
>
>
> On May 31, 2011, at 12:55 , Gilbert Glåns wrote:
>
>> Hi Gordon,
>> Thank you for sharing the information.  We are seeing exactly the
>> same type of behavior from our search cluster.  I have tracked the
>> problem(s) through the query system.  It looks like the mailboxes we
>> are both seeing are "abandoned" and / or the messages are never
>> matched within the Erlang code (it_op_collector_loop,
>> riak_search_op_utils.erl); the messages are then never processed,
>> so the resources they use are never released.  This is a major
>> problem.
>>
>> I have been debugging this for some time and I wish I could say it was
>> going well.  The implementation is convoluted -- have you gotten
>> through it?  Can you verify the same cause?
>>
>> We have been internally discussing the possibility of removing this
>> query processing implementation completely and replacing it with
>> something built in-house because the problems we have uncovered trying
>> to debug the "abandoned mailbox" problem are related and systemic:  1)
>> indeterminate and possibly very large data structures created and
>> manipulated for intermediate and final sets of results, 2) very poor
>> or non-existent ability to gain any insight into what is executing
>> within the "plumbing" of the current query execution system without
>> "herculean" effort (in my opinion), and 3) unacceptable performance
>> (predictably or subjectively) from the merge_index riak_search
>> backend.
>>
>> Are there any other backends available for riak_search with the
>> Enterprise Riak offering?  I really like the design of riak_search but
>> the performance seems to be only a very small fraction of our
>> equivalent SOLR installation, even with several times the amount of
>> resources "thrown at it" -- it does not seem to use resources we
>> "throw at it" well, either, or in the mailboxes case, responsibly.
>>
>> I will quickly admit I may be doing something wrong.  Is there a
>> user-error scenario that would leave mailboxes abandoned and taking
>> up memory?
>>
>> Does anyone else have experiences with equivalent riak_search vs. SOLR
>> installations?
>>
>> Thanks again for sharing, Gordon.  Your results make me feel like this
>> may not be entirely stupidity on my part.
>>
>> Gilbert
>>
>>
>> On Tue, May 31, 2011 at 8:51 AM, Gordon Tillman <gtillman at mezeo.com> wrote:
>>> Howdy Gilbert,
>>> I reproduced the issue this morning and then ran the command that you
>>> specified on two of the non-empty mailboxes.
>>> The output from that is posted here:
>>> https://gist.github.com/1000735
>>> Please let me know if this corresponds to the issue that you are seeing.
>>> Thank you,
>>> --gordon
>>>
>>> On May 27, 2011, at 20:10 , Gilbert Glåns wrote:
>>>
>>> Gordon,
>>> Could you try:
>>>
>>> erlang:process_info(list_to_pid("<0.16614.32>"), [messages,
>>> current_function, initial_call, links, memory, status]).
>>>
>>> in a riak search console for one/some of those mailboxes and share the
>>> results? I am curious to see if you are having the same systemic
>>> memory consumption I am experiencing.
>>>
>>> Gilbert
>>>
>>> On Fri, May 27, 2011 at 5:15 PM, Gordon Tillman <gtillman at mezeo.com> wrote:
>>>
>>> Howdy Gang,
>>>
>>> We are having a bit of an issue with our 3-node riaksearch cluster.  What is
>>> happening is this:
>>>
>>> The cluster is up and running.  We start testing our application against it.  As
>>> the application runs, the Erlang process consumes more and more memory
>>> without ever releasing it.
>>>
>>> In trying to investigate the issue we ran the riaksearch-admin cluster_info
>>> command.  It appears that the bulk of this memory is being consumed by a
>>> bunch of mailboxes.
>>>
>>> I have posted both the output of the cluster_info command and the app.config
>>> from one of the nodes here:
>>>
>>> https://gist.github.com/996419
>>>
>>> I would be very grateful if someone from Basho would take a look at the
>>> cluster_info and see if they can spot anything obvious.
>>>
>>> Each machine in the cluster has an 8-core Xeon and 16GB RAM.  I believe all
>>> of the platform details, etc., are in the cluster_info dump.
>>>
>>> Many thanks,
>>>
>>> --gordon
>>>
>>>
>
>
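
A follow-up sketch for anyone hitting the same symptom: the
process_info call quoted above inspects one pid at a time.  The snippet
below is not from the thread; it ranks every process on a node by
mailbox length, which may make it easier to pick pids to feed into that
call.  It assumes it is pasted into an Erlang console attached to the
node, and TopQueues is a hypothetical name chosen for illustration.

```erlang
%% Return the N processes with the longest message queues as
%% {Pid, QueueLength} tuples, longest first.
%% process_info/2 returns 'undefined' for a process that has already
%% exited; the pattern in the second generator silently skips those.
TopQueues = fun(N) ->
    Infos = [{Pid, Len}
             || Pid <- erlang:processes(),
                {message_queue_len, Len} <-
                    [erlang:process_info(Pid, message_queue_len)]],
    Sorted = lists:reverse(lists:keysort(2, Infos)),
    lists:sublist(Sorted, N)
end.

%% Example: show the ten largest mailboxes on this node.
TopQueues(10).
```

Asking only for message_queue_len avoids copying the mailbox contents,
which the messages item does, so this should be cheaper to run on a node
that is already short on memory.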



