Flexible search in Apache Jackrabbit Oak

ApacheCon EU 2014 presentation about the flexible architecture for search in Apache Jackrabbit Oak.

  1. Flexible search in Apache Jackrabbit Oak
     Tommaso Teofili
  2. Apache Jackrabbit Oak
     • Scalable content repository
     • JCR 2.0
     • Designed for concurrent access (MVCC)
     • Pluggable components (storage, indexes)
     • Powering AEM 6.0
     18/11/14
  3. Oak Architecture
     • Oak-JCR
     • Oak-Core
       – MVCC (node states and immutable trees)
       – Core components (Security, Query engine, …)
       – Plugins
     • Oak-MK
       – Pluggable storage
  4. Oak – the Query Engine
     • Query languages
       – XPath
       – JCR-SQL2
     • Selects the index(es) expected to perform best
       – Search is delegated to the underlying indexes
       – No index? The repository is traversed
     • ACLs are applied afterwards
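
The selection step on this slide can be sketched as follows. This is a minimal illustration, assuming each index can report an estimated cost for a query; `CostedIndex`, `IndexSelector`, and the `fixed` helper are hypothetical stand-ins written for this example, not Oak classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an index that can estimate the cost of a query.
interface CostedIndex {
    String name();
    double estimateCost(String query); // POSITIVE_INFINITY means "cannot answer"
    List<String> query(String query);  // returns the paths of matching nodes
}

class IndexSelector {
    // Pick the cheapest index able to answer the query.
    // A null result means no index applies and the repository must be traversed.
    static CostedIndex select(List<CostedIndex> indexes, String query) {
        CostedIndex best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (CostedIndex index : indexes) {
            double cost = index.estimateCost(query);
            if (cost < bestCost) {
                best = index;
                bestCost = cost;
            }
        }
        return best;
    }

    // Access control is applied after the index has returned its candidates.
    static List<String> filterReadable(List<String> paths, Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String path : paths) {
            if (canRead.test(path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    // Test helper: an index with a fixed cost and a fixed result set.
    static CostedIndex fixed(final String name, final double cost, final String... paths) {
        return new CostedIndex() {
            public String name() { return name; }
            public double estimateCost(String query) { return cost; }
            public List<String> query(String query) { return Arrays.asList(paths); }
        };
    }
}
```

Filtering after the index lookup keeps the index itself free of per-user state, at the price of possibly fetching candidates that the caller is not allowed to see.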
  5. Indexing – the IndexEditor API
     • NodeState before = builder.getNodeState();
     • builder.child("a").setProperty("foo", "bar");
     • NodeState after = builder.getNodeState();
     • NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC?
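
The idea behind this slide, an editor that sees the state before and after a commit and indexes only the difference, can be sketched without the Oak classes. In the example below a "state" is simplified to an immutable map from node path to the value of one property; `PropertyIndexEditorSketch` and its methods are hypothetical names for illustration and do not reproduce the real IndexEditor interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified stand-in: a state maps a node path to the value of property "foo".
// Values are assumed to be non-null.
class PropertyIndexEditorSketch {
    // Index content: property value -> paths of the nodes carrying that value.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Walk the difference between the two states and update the index.
    void processCommit(Map<String, String> before, Map<String, String> after) {
        for (Map.Entry<String, String> e : before.entrySet()) {
            String newValue = after.get(e.getKey());
            if (!e.getValue().equals(newValue)) {
                remove(e.getValue(), e.getKey()); // node deleted or value changed
            }
        }
        for (Map.Entry<String, String> e : after.entrySet()) {
            String oldValue = before.get(e.getKey());
            if (!e.getValue().equals(oldValue)) {
                add(e.getValue(), e.getKey()); // node added or value changed
            }
        }
    }

    Set<String> lookup(String value) {
        Set<String> paths = index.get(value);
        return paths == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(paths);
    }

    private void add(String value, String path) {
        Set<String> paths = index.get(value);
        if (paths == null) {
            paths = new TreeSet<>();
            index.put(value, paths);
        }
        paths.add(path);
    }

    private void remove(String value, String path) {
        Set<String> paths = index.get(value);
        if (paths != null) {
            paths.remove(path);
            if (paths.isEmpty()) {
                index.remove(value);
            }
        }
    }
}
```

Because both states are immutable snapshots, the editor never has to lock the content while it computes the index update.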
  6. Searching – the QueryIndex API
     • Filter filter = … ; // "select * from [nt:folder]"
     • filter.restrictPath("/somenode", Filter.PathRestriction.DIRECT_CHILDREN);
     • Cursor cursor = queryIndex.query(filter, nodeState); // search against a state
     • IndexRow row = cursor.next(); // results
  7. Searching – Filters
     • Full text expressions
     • Property restrictions
     • Path restrictions
       – Exact
       – Parent
       – Child
       – Descendant
     • Node type restrictions
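
The four path restrictions listed above can be illustrated with plain string paths. This is a sketch under assumed semantics (exact match, parent of the base path, direct child of it, any descendant of it); the class `PathRestrictionSketch` and its enum are hypothetical and only mirror the slide's vocabulary, they are not Oak's `Filter` type.

```java
import java.util.ArrayList;
import java.util.List;

class PathRestrictionSketch {
    enum Restriction { EXACT, PARENT, DIRECT_CHILDREN, ALL_CHILDREN }

    // Parent of an absolute path; the root "/" has no parent.
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    static boolean isDescendant(String base, String candidate) {
        if (base.equals("/")) {
            return !candidate.equals("/") && candidate.startsWith("/");
        }
        return candidate.startsWith(base + "/");
    }

    // Does the candidate path satisfy the restriction relative to the base path?
    static boolean matches(Restriction restriction, String base, String candidate) {
        switch (restriction) {
            case EXACT:
                return candidate.equals(base);
            case PARENT:
                return candidate.equals(parentOf(base));
            case DIRECT_CHILDREN:
                return base.equals(parentOf(candidate));
            case ALL_CHILDREN:
                return isDescendant(base, candidate);
            default:
                throw new IllegalArgumentException("unknown restriction: " + restriction);
        }
    }

    // Keep only the paths that satisfy the restriction.
    static List<String> restrict(Restriction restriction, String base, List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (matches(restriction, base, path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

An index that cannot evaluate a given restriction natively could still return a superset of candidates and leave a check like `matches` to the query engine.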
  8. Configuring indexes
     • Indexes are declared by adding “query index configuration” nodes in the repository
       – Type
       – Asynchronous
       – Reindex
       – Index specific properties
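
As an illustration of what such a configuration node carries, the sketch below assembles the properties of a property index definition as a plain map. The property names (`jcr:primaryType` set to `oak:QueryIndexDefinition`, `type`, `propertyNames`, `reindex`, `async`) follow the Oak documentation as best recalled and should be checked against the Oak version in use; the class `IndexDefinitionSketch` is hypothetical, and a real setup would write these properties to a node in the repository rather than to a map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class IndexDefinitionSketch {
    // Build the properties of a property index definition for one property name.
    static Map<String, Object> propertyIndex(String propertyName, boolean asynchronous) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("jcr:primaryType", "oak:QueryIndexDefinition");
        definition.put("type", "property");                            // Type
        definition.put("propertyNames", Arrays.asList(propertyName));  // index-specific
        definition.put("reindex", Boolean.TRUE);                       // Reindex: rebuild on next commit
        if (asynchronous) {
            definition.put("async", "async");                          // Asynchronous: update in the background
        }
        return definition;
    }
}
```

Keeping the definition in the repository means an index can be added, changed, or rebuilt through the same content API as any other node.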
  9. In repository indexes
     • Data structures designed as content
       – Property index
       – Ordered property index
       – Node type index
       – Reference index
  10. Lucene index
      • Full text and (sorted) property restrictions
      • Stored in repository
      • Tika for indexing binaries
      • Configurable indexing rules (boost), codec, analyzers
  11. Lucene index
      • Interesting facts
        – DocValues for sorted property restrictions
        – Uncompressed stored fields
        – Property exists queries
          • TermRange vs Wildcard vs Term vs MatchAll + FieldExistsFilter
  12. Solr index
      • Full text, property, path restrictions
      • Embedded or remote Solr(Cloud)
      • Configurable
        – Mapping restriction / fields
        – Page size
        – Commit policy
      • Most is configured on the Solr side
  13. Problems
      • Hard to express complex queries
      • Cannot leverage the advanced capabilities of the underlying indexes
  14. Native language support
      • Leverage underlying index capabilities
        – Multiple query languages/parsers
      • More accurate full text queries (and results)
        – … where native('lucene', 'name:(hello world) "hello world"^3')
      • Advanced index capabilities (e.g. MLT)
        – … where native('solr', 'mlt?q=path:/content/sample1&mlt.fl=jcr:title')
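
A small helper can show how such a native clause might be embedded in a full query string. The slide elides the beginning of the statement, so the `select * from [...]` prefix and the node type passed in are assumptions made for this example; `NativeQuerySketch` is a hypothetical helper. Single quotes inside the native expression are doubled, which is how string literals are escaped in JCR-SQL2.

```java
class NativeQuerySketch {
    // Quote a string literal, doubling any embedded single quotes.
    static String quote(String literal) {
        return "'" + literal.replace("'", "''") + "'";
    }

    // Build a query whose only constraint is a native, index-specific expression.
    // The "select * from [nodeType]" prefix is an assumption, not taken from the slide.
    static String nativeQuery(String nodeType, String language, String expression) {
        return "select * from [" + nodeType + "] where native("
                + quote(language) + ", " + quote(expression) + ")";
    }

    public static void main(String[] args) {
        System.out.println(nativeQuery("nt:base", "lucene",
                "name:(hello world) \"hello world\"^3"));
        System.out.println(nativeQuery("nt:base", "solr",
                "mlt?q=path:/content/sample1&mlt.fl=jcr:title"));
    }
}
```

The expression is passed through to the named index untouched, so its syntax is whatever that index's own parser accepts.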
  15. Adding more indexes
      • Create an IndexEditor
        – Turn diff into an “indexable”
      • Create a QueryIndex
        – Turn a Filter into an index-specific query
      • “Declare” the index
  16. Looking forward
      • Results aggregation features (e.g. facets)
      • More configuration options (Lucene, Solr)
      • Smarter index selection
      • Covering indexes
  17. Thanks
