Presented by Renaud Delbru, Co-Founder, SindiceTech
In this presentation, we will discuss how Lucene and Solr can be used for very efficient search of tree-shaped schemaless document, e.g. JSON or XML, and can be then made to address both graph and relational data search. We will discuss the capabilities of SIREn, a Lucene/Solr plugin we have developed to deal with huge collections of tree-shaped schemaless documents, and how SIREn is built using Lucene extensibility capabilities (Analysis, Codec, Flexible Query Parser). We will compare it with Lucene's BlockJoin Query API in nested schemaless data intensive scenarios. We will then go through use cases that show how relational or graph data can be turned into JSON documents using Hadoop and Pig, and how this can be used in conjunction with SIREn to create relational faceting systems with unprecedented performance. Take-away lessons from this session will be awareness about using Lucene/Solr and Hadoop for relational and graph data search, as well as the awareness that it is now possible to have relational faceted browsers with sub-second response time on commodity hardware.