At the recent Drupal Camp Prague 09, I was introduced to ApacheSolr as a replacement for the standard Drupal search or the Google CSE.
On most sites the basic Google CSE setup is sufficient. However, for some of the more serious "work" websites, my colleague Nick and I started experimenting with Drupal's ApacheSolr module.
Here is a quick and rough writeup of how it was implemented at work and on my personal website (janaksingh.com).
Please feel free to share your experiences and tips.
1) Install Java on CentOS if you haven't already
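No command is given for this step, so here is a minimal sketch for checking whether Java is already present. The function name check_java is made up for illustration, and the yum package name in the comment is an assumption that may differ on your CentOS release.

```shell
# Report the installed Java version, or say that Java is missing.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java prints its version banner on stderr
    java -version 2>&1 | head -n 1
  else
    echo "java not found; install a JDK first"
    # Assumed package name, run as root if it applies to your release:
    #   yum install java-1.6.0-openjdk
    return 1
  fi
}
```

Run check_java before continuing; if it reports that Java is missing, install a JDK and run it again.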
2) Install the ApacheSolr Drupal module
cvs -z6 -d:pserver:anonymous:firstname.lastname@example.org:/cvs/drupal-contrib checkout -d apachesolr -r DRUPAL-6--2 contributions/modules/apachesolr/
3) Check out the ApacheSolr PHP library inside the ApacheSolr module folder
svn checkout -r22 http://solr-php-client.googlecode.com/svn/trunk/ SolrPhpClient
4) Create a temp folder in your home directory
cd ~
mkdir temp
cd temp
5) Download ApacheSolr in the temp folder
svn checkout http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4 apache-solr
Or grab the tarball from http://www.apache.org/dyn/closer.cgi/lucene/solr/ and unpack it:
tar -xzf apache-solr-1.4.0.tgz
mv apache-solr-1.4.0 apache-solr
I have moved my "apache-solr" folder to "/usr/local/share/" but I am not sure if this matters
6) Rename 2 default files in apache-solr/example/solr/conf/
sudo mv solrconfig.xml solrconfig.bak
sudo mv schema.xml schema.bak
7) Copy the schema.xml and solrconfig.xml from the drupal apachesolr module folder into apache-solr/example/solr/conf/
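Steps 6 and 7 can be combined into one small function. This is only a sketch: the function name install_drupal_solr_conf and both example paths in the comments are assumptions, so pass in whatever matches your own layout.

```shell
# Back up Solr's default config files, then copy in the versions that ship
# with the Drupal apachesolr module.
# Usage: install_drupal_solr_conf <module_dir> <solr_conf_dir>
install_drupal_solr_conf() {
  module_dir="$1"   # e.g. sites/all/modules/apachesolr (hypothetical path)
  conf_dir="$2"     # e.g. /usr/local/share/apache-solr/example/solr/conf

  # Stop before touching anything if the module files are missing.
  for f in schema.xml solrconfig.xml; do
    if [ ! -f "$module_dir/$f" ]; then
      echo "missing $module_dir/$f" >&2
      return 1
    fi
  done

  for f in schema.xml solrconfig.xml; do
    # Keep the stock file as .bak, as in step 6.
    if [ -f "$conf_dir/$f" ]; then
      mv "$conf_dir/$f" "$conf_dir/${f%.xml}.bak" || return 1
    fi
    cp "$module_dir/$f" "$conf_dir/$f" || return 1
  done
}
```

Run it with sudo rights if the Solr folder is owned by root, as it would be under /usr/local/share/.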
Multisite / Multicore Setup
8) If you want multisite search (multicore), here is what I did:
- duplicate the example folder to "sites"
- create a folder for each new website
- copy the conf folder from sites folder into each one of the website folders
- create a solr.xml file in the "sites" folder and define a core for each of the sites. Here is what my solr.xml file looks like:
<solr persistent="false">
  <!-- adminPath: RequestHandler path to manage cores. If 'null' (or absent), cores will not be manageable via request handler -->
  <cores adminPath="/admin/cores">
    <core name="website1_core" instanceDir="website1.com" />
  </cores>
</solr>
9) Finally, in the Drupal ApacheSolr module settings, update the "Solr path" so that it uses the same core name you defined for the site in the solr.xml file. If you are using the example setup as outlined in the readme file that ships with the ApacheSolr Drupal module, you need not change anything, as the default path the module ships with is correct. You only need to change this if you define your own cores.
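The per-site folders and the solr.xml file from step 8 can also be generated with a short function. This is a rough sketch: the function name make_cores is made up, the first argument is assumed to be the folder Solr reads solr.xml from (the "sites" folder in the setup above), and the core naming rule (website1.com becomes website1_core) is only inferred from the single example shown.

```shell
# Create one folder per site, copy conf/ into it, and write solr.xml.
# Usage: make_cores <solr_home> <conf_dir> <site> [<site> ...]
make_cores() {
  solr_home="$1"
  conf_src="$2"
  shift 2

  {
    echo '<solr persistent="false">'
    echo '  <cores adminPath="/admin/cores">'
    for site in "$@"; do
      # Each site gets its own instance folder with a copy of conf/.
      mkdir -p "$solr_home/$site" || return 1
      cp -R "$conf_src" "$solr_home/$site/conf" || return 1
      # Assumed naming rule: strip everything after the first dot.
      echo "    <core name=\"${site%%.*}_core\" instanceDir=\"$site\" />"
    done
    echo '  </cores>'
    echo '</solr>'
  } > "$solr_home/solr.xml"
}
```

For example, make_cores apache-solr/sites apache-solr/example/solr/conf website1.com would produce a solr.xml with the same core entry as the file above. The sketch expects the site folders not to exist yet; running it twice would nest a second conf folder inside the first.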
If you are using the example site, test your installation as follows:
cd apache-solr/example
sudo java -jar start.jar
If you are using the multicore setup from step 8, test as follows:
cd apache-solr/sites
sudo java -jar start.jar
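Once Jetty is running, one way to confirm that Solr answers is to request its ping handler. The helper below only builds the URL; the name solr_ping_url is made up, and it assumes the default port 8983 of the bundled Jetty and a config that keeps the admin/ping handler enabled.

```shell
# Build the ping URL for a Solr instance; pass a core name for multicore.
# Usage: solr_ping_url [host] [port] [core]
solr_ping_url() {
  host="${1:-localhost}"
  port="${2:-8983}"   # port the bundled Jetty example listens on
  core="${3:-}"       # optional, e.g. website1_core
  if [ -n "$core" ]; then
    echo "http://$host:$port/solr/$core/admin/ping"
  else
    echo "http://$host:$port/solr/admin/ping"
  fi
}

# Usage (needs curl and a running Solr):
#   curl -s "$(solr_ping_url)"
#   curl -s "$(solr_ping_url localhost 8983 website1_core)"
```

If the request fails or returns an error page, check the Jetty output in the terminal where you ran start.jar.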
Auto start ApacheSolr on server reboot
You can follow the instructions in "Want to start Apache solr automatically after a reboot?", but depending on how you defined the cores above you will need to alter the path the script uses (for example, the "sites" folder from step 8 instead of "example").
Don't forget to change permissions on the solr bash file you created (/etc/init.d/solr), and remember to install the script using chkconfig (all outlined in the guide above).
By default, ApacheSolr does not ship with any kind of port protection, so you are advised to secure your server ports using iptables or a dedicated firewall (if you have one). See the CentOS iptables guidelines for details.
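As a rough illustration of the iptables approach, the function below prints (but does not run) two rules that limit the Solr port to a single source address. The function name solr_firewall_rules, the default port 8983 and the default allowed address 127.0.0.1 are all assumptions to adapt; this is a sketch, not a complete firewall policy.

```shell
# Print iptables rules that allow one source address to reach Solr and drop
# everything else on that port.
# Usage: solr_firewall_rules [port] [allowed_ip]
solr_firewall_rules() {
  port="${1:-8983}"
  allowed="${2:-127.0.0.1}"
  # -I inserts at the top of the chain, so the rule printed last (ACCEPT)
  # ends up above the DROP rule and is matched first.
  echo "iptables -I INPUT -p tcp --dport $port -j DROP"
  echo "iptables -I INPUT -p tcp -s $allowed --dport $port -j ACCEPT"
}
```

Review the printed rules first. To apply them, run each line as root, then save the rule set (on CentOS, "service iptables save") so it survives a reboot.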