Flexible Faceting and Full Text Indexes Using ElasticSearch

We increasingly need efficient data storage and retrieval for the large data projects we work on. To meet these needs we have added the ElasticSearch search engine to our toolset, alongside Apache CouchDB as our NoSQL database, and have had excellent results using the two together.

What is ElasticSearch?
ElasticSearch is a new front end to the Lucene search index, which puts it in the same product category as the widely used Solr project. Where ElasticSearch differs from Solr is in ease of use, flexibility, and, very importantly, simplicity. Unlike Solr, there is nary an XML configuration file in sight!

To index a document with ElasticSearch, you simply PUT a JSON object to the ElasticSearch web service. To query the generated index, you send a JSON query object, written in ElasticSearch's Query DSL, to the web service, and it returns the matching documents. This query language is very expressive, exposing most of the functionality available in Lucene.
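
As a rough illustration of those two calls, here is a minimal Python sketch that uses only the standard library. It builds, but does not send, the HTTP requests. The node address `localhost:9200`, the index name `projects`, the type `report`, and the sample document are all hypothetical. The `/index/type/id` URL layout follows ElasticSearch releases of this era (newer versions replace the type segment with `_doc`), and the `match` query is just one of many Query DSL query types; exact endpoints and query types vary by version.

```python
import json
import urllib.request

# Hypothetical node address; 9200 is ElasticSearch's default HTTP port.
ES_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, document):
    """Build (without sending) a PUT that stores `document` at /index/type/id.

    Assumption: the index/type/id URL layout of older ElasticSearch releases.
    """
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def search_request(index, query):
    """Build (without sending) a POST to /index/_search with a Query DSL body."""
    url = f"{ES_URL}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical index, type, and document, for illustration only.
    put = index_request("projects", "report", "1",
                        {"title": "Water quality survey", "year": 2011})
    search = search_request("projects", {"match": {"title": "water"}})
    print(put.get_method(), put.full_url)
    print(search.get_method(), search.full_url)
```

Passing either request to `urllib.request.urlopen` against a running node would actually perform the call; keeping construction separate from sending makes the JSON bodies easy to inspect.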

ElasticSearch and CouchDB share many philosophies and practical similarities, and because of this they complement each other very well. CouchDB can also be used to index your data, and ElasticSearch could be used on its own as a key-value data store, but when the two are used together each makes up wonderfully for the other's deficiencies.

Because of some solid design decisions on both projects' parts, integrating the two systems is just about painless. ElasticSearch's very innovative River system lets you set up the integration by simply running a few REST requests against the server, with almost no additional configuration. Personally, I knew it was something special when I realized that every time a node starts it gives itself a unique nickname (for clustering purposes), and that I was now the owner of a search node called Algrim the Strong.
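
To give a flavor of those REST requests, the sketch below builds the single PUT that registers a CouchDB river, again without sending it. The JSON layout follows the CouchDB river plugin's documentation as we recall it (a `type`, a `couchdb` section naming the source database, and an `index` section naming the target index and type); treat the exact field names as an assumption and check them against the plugin version you install. The river name, hosts, and database names are hypothetical. Rivers were later deprecated and removed from ElasticSearch, so this applies only to older releases.

```python
import json
import urllib.request

# Hypothetical node address; 9200 is ElasticSearch's default HTTP port.
ES_URL = "http://localhost:9200"


def couchdb_river_request(river_name, couch_host, couch_port, couch_db,
                          es_index, es_type):
    """Build (without sending) a PUT that registers a CouchDB river.

    Assumption: the /_river/<name>/_meta endpoint and field names of the
    CouchDB river plugin for older ElasticSearch releases.
    """
    meta = {
        "type": "couchdb",
        # Where the river reads changes from.
        "couchdb": {"host": couch_host, "port": couch_port, "db": couch_db},
        # Where the river writes the indexed documents.
        "index": {"index": es_index, "type": es_type},
    }
    return urllib.request.Request(
        f"{ES_URL}/_river/{river_name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # 5984 is CouchDB's default port; the database and index names are made up.
    req = couchdb_river_request("my_db", "localhost", 5984,
                                "my_db", "my_db", "my_db")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once such a river is registered, it follows the CouchDB database's changes feed and keeps the ElasticSearch index up to date, which is what makes the integration feel nearly configuration-free.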

Which limitations in CouchDB make this solution attractive?
There is a lot to love about CouchDB, such as its very intuitive REST API and its blistering speed at storing and retrieving documents. What we quickly ran into trouble with, however, was how indexing and searching work with Couch views. Quite simply, views are not nearly flexible enough for our needs, and the additional processing they require caused severe performance problems for our use cases.