ApacheSolr search for Drupal on CentOS

At the recently Drupal Camp Prague 09, I was introduced to ApacheSolr as a replacement to the standard Drupal search or the Google CSE.

On most sites the basic Google CSE setup is sufficient, however for some of the more serious "work" websites my colleague Nick and I got experimenting with the Drupal's implementation of ApacheSolr module.

Here is a quick and rough writeup on how it was implemented on work and my personal (janaksingh.com website).

Please feel free to share your experiences and tips.

Basic Installation

1) Install Java on CentOS if you havent already

2) Install ApacheSolr drupal module

cvs -z6 -d:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib checkout -d apachesolr -r DRUPAL-6--2 contributions/modules/apachesolr/

3) Checkout the ApacheSolr PHP Lib inside the ApacheSolr module folder

svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient

4) Create a temp folder in your home directory

cd ~
mkdir temp
cd temp

5) Download ApacheSolr in the temp folder
SVN Method

svn checkout http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 apache-solr

TAR Method
Grab the tarball from http://www.apache.org/dyn/closer.cgi/lucene/solr/

wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/lucene/solr/1.4.0/apache-solr-1.4.0.zip

tar -xzf apache-solr-1.4.0.zip
mv apache-solr-1.4.0 apache-solr

I have moved my "apache-solr" folder to "/usr/local/share/" but I am not sure if this matters

6) Rename 2 default files in apache-solr/example/solr/conf/

sudo mv solrconfig.xml solrconfig.bak
sudo mv schema.xml schema.bak

7) Copy the schema.xml and solrconfig.xml from the drupal apachesolr module folder into apache-solr/example/solr/conf/

Multisite / Multicore Setup

8) If you want multisite seach (multicore), here is what I did:

  • duplicate the example folder to "sites"
  • create a folder for each new website
  • copy the conf folder from sites folder into each one of the website folders
  • create a solr.xml file in the "sites" folder and defined a core for each of the sites.. here is what my sole.xml file looks like:
    <solr persistent="false">
      <!--
      adminPath: RequestHandler path to manage cores.
        If 'null' (or absent), cores will not be manageable via request handler
      -->
      <cores adminPath="/admin/cores">
        <core name="website1_core" instanceDir="website1.com" />
      </cores>
    </solr>

9) Finally, in the Drupal ApacheSolr module, update the "Solr path" to the same core name you defined for the site in the sole.xml file. If you are using the example setup as outlined in the readme file that ships with the ApacheSolr Drupal module, you need not change anything as the default path the module ships with is correct. You only need to change this if you define your own cores.

Testing

If you are using the example site, test your installation as follows:

cd apache-solr/example
sudo java -jar start.jar

If you are using the multicore setup from step 8, test as follows

cd apache-solr/sites
sudo java -jar start.jar

Check the ApacheSolr admin page by visiting http://mydomain:8983/solr/admin/ or if you are using multicore http://mydomain:8983/solr/mycore_name/admin/

Auto start ApacheSolr on server reboot

You can follow the instructions on Want to start Apache solr automatically after a reboot? but depending on how you defined the cores above you will need to alter the path eg:

SOLR_DIR="/opt/apache-solr/example"

or
SOLR_DIR="/opt/apache-solr/sites"

Dont forget to change permissions on the sole bash file you created (etc/init.d/solr) and remember to install the script using chkconfig (all outlined in the guide above)

Security

By default, ApacheSolr does not ship with any kind of port protection, you are advised to secure your server ports by using iptables or a dedicated firewall (if you have one). For CentOS iptable guidelines click here

Further Reading


Bookmark and Share