Solr requests and Riak Search (was: Re: LucidWorks Banana Integration)
dkrotkine at gmail.com
Mon Dec 7 06:56:52 EST 2015
I feel like the topic changed a bit, so I changed the email subject.
Mark Schmidt wrote:
> Hi Dams, I appreciate the assistance.
> I was able to turn up Banana fairly easily using the steps you laid
> out below, but I have a few questions regarding communicating with my
> Riak environment.
> We have 6 Riak nodes sitting behind a proxy that handles both HTTP
> and protocol buffer load balancing. I’ve turned up a new standalone
> Riak node (hosts Banana/Nginx) outside of the cluster.
First of all, you're using a proxy. It's common practice and should not
impact your Solr requests, as long as the proxy properly forwards each
request (Riak KV, Search or Solr, on any port) to one of the nodes. I
personally am not a big fan of putting a proxy in front of Riak nodes,
although it's not a technical issue here. Most of the time, people put
proxies in front of clusters (like Riak) so that the proxy answers the
question "which nodes are up and running?" instead of answering it on
the hosts initiating the requests. Although it looks like a great idea
and seems to simplify things, it brings issues such as network traffic
concentration and a potential SPOF. To work around that, the proxy has
to be made redundant, and not configured as pass-through, etc. A better
approach is, imho, to use something like ZooKeeper or any lighter
equivalent to have real-time knowledge of which nodes are up and
running, and share that knowledge with all the hosts that are going to
use Riak. They can then do round-robin or random selection among the
Riak nodes that are up. Anyway, that's a different topic from the
present email.
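To make the client-side selection idea concrete, here is a minimal Python sketch. It is only an illustration: the host names are made up, and the `is_up` callback stands in for whatever liveness source you use (ZooKeeper or a lighter equivalent); none of this comes from an actual setup.

```python
import random

def pick_node(nodes, is_up, rng=random):
    """Return a random node among those that is_up() reports as alive."""
    alive = [n for n in nodes if is_up(n)]
    if not alive:
        raise RuntimeError("no Riak node is currently reachable")
    return rng.choice(alive)

# Hypothetical host names. In practice the set of live nodes would come
# from ZooKeeper (or similar), not from a hard-coded set like this one.
NODES = ["riak1:8098", "riak2:8098", "riak3:8098"]
UP = {"riak1:8098", "riak3:8098"}

node = pick_node(NODES, lambda n: n in UP)
```

Each host that talks to Riak would run this kind of selection itself, so there is no single box that all traffic has to go through.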
> 1) Should the Nginx config file be pointed at my HAProxy IP that
> handles the Riak node load balancing, or do I need to incorporate
> additional settings in the config to handle the 6 Riak Solr nodes?
The nginx configuration should point to your HAProxy IP. Distributed
Solr requests will be forwarded to the other nodes properly, as long as
the firewall rules allow them on ports 8098 and 8093.
> 2) Should I use the Riak HTTP interface port (8098) or the Solr
> interface port (8093) in the Nginx config file?
You should use port 8098 for all queries.
> 3) Is there any way to perform faceted queries or other more advanced
> query functionality against the Riak Solr nodes? Searching through the
> conversation archives, it sounds like we may be able to query the Solr
> nodes themselves outside of the Riak API.
In a nutshell: yes! *All* of the Solr API is available through Riak
Search, because the Riak API just forwards to Solr. I recommend *not*
using any special Riak client to query Riak Search, but instead using
plain HTTP, with any HTTP client that your language provides. I
recommend reading this page again
http://docs.basho.com/riak/latest/dev/using/search/ but clicking on HTTP :)
This is the example given. In my company, I've been using facet
queries, stats, etc. The only limitation is the Solr version that is
bundled with Riak Search (hopefully it'll be upgraded in a later release).
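As a rough illustration of the plain-HTTP approach, the sketch below builds a facet query URL in Python. It assumes the `/search/query/<index>` endpoint described in the Basho docs for Riak 2.0; the host name, index name and field name are hypothetical placeholders, not values from a real setup.

```python
from urllib.parse import urlencode

def search_url(host, index, params, port=8098):
    """Build a Riak Search query URL; params is a list of (key, value) pairs."""
    return "http://%s:%d/search/query/%s?%s" % (
        host, port, index, urlencode(params))

# Hypothetical host, index and field names, for illustration only.
url = search_url("haproxy.example.com", "my_index", [
    ("wt", "json"),              # ask Solr for a JSON response
    ("q", "*:*"),                # match every document
    ("facet", "true"),           # turn faceting on
    ("facet.field", "status_s"), # facet on this (made-up) field
])
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, and the JSON body parsed as usual.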
So back to nginx rules:
The first rule is to allow Banana to query Riak Search: Banana thinks
that it's talking to a regular Solr, so you need an nginx rule to fix
that. I think you've got that part right. In my setup I added an
additional rule to allow using the Solr admin web interface, which gives
some interesting figures and options and is useful for debugging. So
I added a rule saying that if the request starts with "internal_solr",
it goes to 8093 instead of being forwarded to 8098. But that's the
only rule I added to the configuration I pointed you to.
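The routing decision described above can be written out as a small Python function, just to make the logic explicit. This is not nginx configuration and not the actual rule; in particular, the exact prefix shape (`/internal_solr` with a leading slash) is an assumption.

```python
RIAK_HTTP_PORT = 8098  # Riak HTTP interface, used for Riak Search queries
SOLR_PORT = 8093       # Solr's own port, used here for the admin interface

def upstream_port(path):
    """Choose the backend port for a request path.

    Assumption: the special prefix appears as "/internal_solr" at the
    start of the path; everything else goes to the Riak HTTP port.
    """
    if path.startswith("/internal_solr"):
        return SOLR_PORT
    return RIAK_HTTP_PORT
```

In nginx itself this would typically be two `location` blocks with different `proxy_pass` targets.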
Here is an example of a request that I do using Riak Search:
I do a query on a node, on port 80:
The nginx rule transforms that into:
Disclaimer: I manually edited the request so it's probably not 100%
valid, but at least you get the idea of what we can do: I'm using the
Solr stats feature *with* facets at the same time. In this case I'm
only interested in the stats (min/max/sum_of_squares/average) and not
the actual results, so I set rows=0.
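For readers who want something concrete, here is a hedged Python sketch of such a stats query and of reading the stats out of the JSON response. It uses standard Solr parameters (`stats`, `stats.field`, `facet`, `facet.field`, `rows`) and the usual `stats.stats_fields` response layout; the field names and the sample response are made up, and this is not the original request.

```python
from urllib.parse import urlencode

def stats_query(query, stats_field, facet_field):
    """Build a query string asking for stats and facets but no documents."""
    return urlencode([
        ("wt", "json"),
        ("q", query),
        ("rows", "0"),               # skip the matching documents
        ("stats", "true"),
        ("stats.field", stats_field),
        ("facet", "true"),
        ("facet.field", facet_field),
    ])

def extract_stats(response, field):
    """Return the stats dict for one field from a parsed Solr response."""
    return response["stats"]["stats_fields"][field]

# Hypothetical field names and a made-up, trimmed response.
qs = stats_query("*:*", "duration_i", "status_s")
sample = {
    "response": {"numFound": 3, "start": 0, "docs": []},
    "stats": {"stats_fields": {"duration_i": {
        "min": 1.0, "max": 9.0, "count": 3, "sum": 15.0,
        "sumOfSquares": 107.0, "mean": 5.0}}},
}
stats = extract_stats(sample, "duration_i")
```

Note that Solr reports the average as `mean` and the sum of squares as `sumOfSquares` in the response.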
So basically all the solr power is there :)
Hope that helps and sorry for the somewhat late answer,
> Thanks again Dams,
> -Mark Schmidt
> *From:*Damien Krotkine [mailto:damien at krotkine.com]
> *Sent:* Saturday, November 28, 2015 5:50 AM
> *To:* Mark Schmidt <mschmidt at orcawave.net>
> *Cc:* 'riak-users' <riak-users at lists.basho.com>
> *Subject:* Re: LucidWorks Banana Integration
> Hi Mark,
> I have successfully integrated Banana with Riak 2.0 Solr
> implementation. I simply configured a nginx to act as proxy between
> Riak Search / Solr / What banana expects. So basically:
> - Install Riak 2, java, and enable Riak Search (follow basho doc)
> - Install banana
> - install nginx and use this as a base :
> - configure banana to point to the solr on your riak search.
> If you need more help, feel free to ask,
> Mark Schmidt wrote:
> Has anyone successfully integrated Banana with the Riak 2.0 Solr
> -Mark Schmidt