ApacheSolr search for Drupal on CentOS

At the recent Drupal Camp Prague 09, I was introduced to ApacheSolr as a replacement for the standard Drupal search or the Google CSE.

On most sites the basic Google CSE setup is sufficient. However, for some of the more serious "work" websites, my colleague Nick and I started experimenting with Drupal's ApacheSolr module.

Here is a quick and rough write-up of how it was implemented at work and on my personal website (janaksingh.com).

Please feel free to share your experiences and tips.

Basic Installation

1) Install Java on CentOS if you haven't already
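Before moving on, it can help to confirm that a Java runtime is actually on the PATH. The following is a minimal sketch; the function name java_status is made up for illustration, and the yum package name in the hint is an assumption that may differ on your CentOS release.

```shell
# Report whether a Java runtime is available on the PATH.
java_status() {
    if command -v java >/dev/null 2>&1; then
        echo "java found: $(command -v java)"
    else
        # Package name is an assumption; check what your CentOS release offers.
        echo "java not found: try 'sudo yum install java-1.6.0-openjdk'"
    fi
}

java_status
```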

2) Install the ApacheSolr Drupal module

cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -d apachesolr -r DRUPAL-6--2 contributions/modules/apachesolr/

3) Check out the ApacheSolr PHP library inside the ApacheSolr module folder

svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient

4) Create a temp folder in your home directory

cd ~
mkdir temp
cd temp

5) Download ApacheSolr in the temp folder
SVN Method

TAR Method
Grab the tarball from http://www.apache.org/dyn/closer.cgi/lucene/solr/

wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/lucene/solr/1.4.0/apach... -xzf apache-solr-1.4.0.zip
mv apache-solr-1.4.0 apache-solr

I have moved my "apache-solr" folder to "/usr/local/share/", but I am not sure whether the location matters.

6) Rename the two default files in apache-solr/example/solr/conf/

sudo mv solrconfig.xml solrconfig.bak
sudo mv schema.xml schema.bak

7) Copy the schema.xml and solrconfig.xml from the Drupal apachesolr module folder into apache-solr/example/solr/conf/
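Steps 6 and 7 can be combined into one small helper. This is only a sketch: the function name install_drupal_solr_conf is hypothetical, both directories are passed in as arguments because their real locations depend on your setup, and you may need to run it with sudo as in the commands above.

```shell
# Back up the stock Solr config files, then copy in the Drupal versions.
# $1: Drupal apachesolr module folder (contains schema.xml, solrconfig.xml)
# $2: Solr conf folder, e.g. apache-solr/example/solr/conf
install_drupal_solr_conf() {
    module_dir=$1
    conf_dir=$2
    for f in solrconfig.xml schema.xml; do
        # Keep the stock file as solrconfig.bak / schema.bak before replacing it.
        if [ -f "$conf_dir/$f" ]; then
            mv "$conf_dir/$f" "$conf_dir/${f%.xml}.bak"
        fi
        cp "$module_dir/$f" "$conf_dir/$f"
    done
}

# Example call (paths are placeholders):
# install_drupal_solr_conf /path/to/modules/apachesolr apache-solr/example/solr/conf
```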

Multisite / Multicore Setup

8) If you want multisite search (multicore), here is what I did:

  • duplicate the example folder to "sites"
  • create a folder for each new website
  • copy the conf folder from sites folder into each one of the website folders
  • create a solr.xml file in the "sites" folder and define a core for each of the sites. Here is what my solr.xml file looks like:
    <solr persistent="false">
      <!--
      adminPath: RequestHandler path to manage cores.
        If 'null' (or absent), cores will not be manageable via request handler
      -->
      <cores adminPath="/admin/cores">
        <core name="website1_core" instanceDir="website1.com" />
      </cores>
    </solr>
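If you have several sites, writing the core entries by hand gets tedious. The sketch below prints a solr.xml in the same shape as the one above, with one core per site folder. The function name make_solr_xml and the naming rule (first part of the domain plus "_core") are assumptions made for illustration, chosen only to match the website1.com example.

```shell
# Print a solr.xml that defines one core per site folder given as an argument.
make_solr_xml() {
    echo '<solr persistent="false">'
    echo '  <cores adminPath="/admin/cores">'
    for site in "$@"; do
        # Hypothetical naming rule: website1.com -> website1_core
        core="${site%%.*}_core"
        echo "    <core name=\"$core\" instanceDir=\"$site\" />"
    done
    echo '  </cores>'
    echo '</solr>'
}

make_solr_xml website1.com website2.com
```

Redirect the output into your solr.xml once it looks right. Note that the naming rule would turn www.example.com into "www_core", so adjust it if your site folders start with a shared prefix.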

9) Finally, in the Drupal ApacheSolr module, update the "Solr path" to match the core name you defined for the site in the solr.xml file. If you are using the example setup as outlined in the readme file that ships with the ApacheSolr Drupal module, you need not change anything, as the default path the module ships with is correct. You only need to change this if you define your own cores.

Testing

If you are using the example site, test your installation as follows:

cd apache-solr/example
sudo java -jar start.jar

If you are using the multicore setup from step 8, test as follows:

cd apache-solr/sites
sudo java -jar start.jar

Check the ApacheSolr admin page by visiting http://mydomain:8983/solr/admin/ or, if you are using multicore, http://mydomain:8983/solr/mycore_name/admin/
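If you only have a shell on the server, the same check can be done with curl. This is a sketch under the assumption that curl is installed; the function names solr_admin_url and check_solr are made up for illustration, and port 8983 is the one used in the URLs above.

```shell
# Build the admin URL for a host, with an optional core name.
solr_admin_url() {
    host=$1
    core=${2:-}
    if [ -n "$core" ]; then
        echo "http://$host:8983/solr/$core/admin/"
    else
        echo "http://$host:8983/solr/admin/"
    fi
}

# Print the HTTP status code returned by the admin page.
check_solr() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(solr_admin_url "$@")"
}

solr_admin_url mydomain
solr_admin_url mydomain mycore_name
```

Running check_solr localhost (or check_solr localhost mycore_name) on the server itself should print 200 when Solr is answering. A 404 suggests Jetty is up but the path or core name is wrong, and 000 means curl could not connect at all.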

Auto start ApacheSolr on server reboot

You can follow the instructions in "Want to start Apache solr automatically after a reboot?", but depending on how you defined the cores above you will need to alter the path, e.g.:

SOLR_DIR="/opt/apache-solr/example"

or
SOLR_DIR="/opt/apache-solr/sites"

Don't forget to change the permissions on the Solr bash script you created (/etc/init.d/solr), and remember to install the script using chkconfig (both steps are outlined in the guide above).
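For reference, the permission and chkconfig steps usually look something like the sketch below. The mode 755 and the helper names run and install_solr_service are assumptions, and chkconfig --add only works if the init script carries the chkconfig header comments, so defer to the guide where it differs. Setting DRY_RUN prints the commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Make the init script executable and register it to start at boot.
install_solr_service() {
    run sudo chmod 755 /etc/init.d/solr
    run sudo chkconfig --add solr
    run sudo chkconfig solr on
}
```

Preview with DRY_RUN=1 install_solr_service, then run install_solr_service on its own once the output looks right.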

Security

By default, ApacheSolr does not ship with any kind of port protection, so you are advised to secure your server ports using iptables or a dedicated firewall (if you have one). See the CentOS iptables guidelines for details.
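As a rough illustration of the iptables approach, the sketch below prints two rules that accept connections to the Solr port on the loopback interface and drop everything else. It assumes Drupal and Solr run on the same machine; the function name solr_firewall_rules is made up, and the rules are printed rather than applied so you can review them first.

```shell
# Print iptables commands that limit a TCP port to the loopback interface.
solr_firewall_rules() {
    port=${1:-8983}
    echo "iptables -A INPUT -i lo -p tcp --dport $port -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}

solr_firewall_rules 8983
```

Because -A appends to the end of the chain, an earlier rule that already accepts the port would still win, so check the existing rules with iptables -L -n before applying these as root.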

Further Reading


4 comments

Anonymous's picture

Nice! I've not had enough coffee this morning, to be sure, but did you skip the part where you start the server?

Also, in step 9, assuming you're using the packaged Jetty as per your instructions, the core to be used will be /solr, which is what the default values in the module are, so you won't need to modify anything.

Now, beware! There is no authentication at the Solr level, so you have to make sure that you don't open the 8983 (or 8080) port to the outside world. I use Uncomplicated Firewall (ufw) http://ubuntuforums.org/showthread.php?t=823741 to close these ports to the outside world as a final step in my setup.

Janak's picture

"Solr Gods" grace this humble website ;) I am no expert at this, so your input is much valued :)

Good points above Rob, yes I did skip that part from the tutorial as I assumed the user would be wanting to run the site using the proper Core setup I mentioned in step 8.

The script at the auto restart link adds a link to the example folder, which the user can modify for their needs. But I have rewritten the tutorial to make it a little clearer.

If the user defined a core as outlined in the setup above, then yes, they should update the path to the "solr/mycore_name" they defined in the solr.xml file. If not, it will use the defaults.

I included a link to the iptables setup on CentOS. I think the readme file that ships with the module should mention the security warning too!

Anonymous's picture

Hi Janak,
Thanks for your great article. I followed along but somehow things went awry. When I run
sudo java -jar start.jar
from within the example directory the app seems to hang. I get only this.

2010-01-21 22:46:55.661::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-01-21 22:46:55.808::INFO:  jetty-6.1.3
2010-01-21 22:46:55.857::INFO:  Started SocketConnector @ 0.0.0.0:8983

Error logs are blank aside from 404s which are returned whenever I try to hit http://mydomain:8983/solr/admin/.

I'm on a mediatemple dv 3.5 with CentOS 5.2 (final)

Thanks,
Tim

Janak's picture

Hi Tim,
It seems Jetty starts up fine for you! I had a similar issue where I could not connect to it using the admin interface! Have you checked iptables and SELinux to make sure you have access to that port? I think by default (depending on settings) it will block the port! Be careful though, as SELinux can be a little tricky to configure.

Also make sure you allow access to port 8983 from localhost only! Check my Further Reading section above for helpful links!
