We thought we had tracked down all the open source search engines on the Web, but our recent crawling yielded a new surprise: SenseiDB. This new search engine is an open source, distributed, real-time, semi-structured database. With a bit more research, we tracked down an article on the Sematext Blog titled “Sensei: Distributed, Real-time, Semi-Structured Database.” The Sematext Blog article is an interview with John Wang, the search architect on Sensei. Wang explains that Sensei was designed to handle complex semi-structured queries on large and rapidly changing datasets. When asked why he and his team did not use Solr or Lucene, Wang explained:
Sensei leverages Lucene.
We weren’t able to leverage Solr because of the following requirements:
High update requirement, 10’s of thousands updates per second in to the system.
Real distributed solution, current Solr’s distributed story has a SPOF at the master, and Solr Cloud is not yet completed.
Complex faceting support. Not just your standard terms based faceting. We needed to facet on social graph, dynamic time ranges and many other interesting faceting scenarios. Faceting behavior also needs to be highly customizable, which is not available via Solr.
Sensei offers a few capabilities that other open source search applications do not, such as high update rates and non-trivial semi-structured support. Sensei is well on its way to making a name for itself in the open source community, though Lucene and ElasticSearch have already gained a big lead. LucidWorks is a key player in the open source search game, being the first company to build professional software on Apache Lucene.
Whitney Grace, October 18, 2012