APACHE SLING & FRIENDS TECH MEETUP
BERLIN, 23-25 SEPTEMBER 2013
Scaling Search in Oak with Solr
Tommaso Teofili
adaptTo() 2013 2
A look back ...
Scaling Search in Oak with Solr
§  Following up last year’s “Oak / Solr
integration” session
Looking at last year’s agenda
§  What we have:
  §  Search Oak content with Solr independently
  §  Oak running with a Solr index
    §  Embedded
    §  Remote
§  What we miss:
  §  Solr based MicroKernel
Apache Jackrabbit Oak
§  Scalable content repository
§  JCR compatibility (to some degree)
§  Performance especially for concurrent
access
§  Scalability for huge repositories (> 100M
nodes)
§  Support managed environments (e.g. OSGi)
§  Cloud deployments
Apache Solr
§  Enterprise search platform
§  Based on Apache Lucene
§  Full text, faceting, highlighting, etc.
§  Dynamic cluster architecture
§  HTTP API
§  Latest release 4.4.0
Why Apache Solr
§  Distributed, fault tolerant indexing /
searching
§  Highly configurable
§  without touching the repository
§  Customizable
How it works ...
IndexEditor API
§  an Editor receives info about changes to the content tree
§  the Editor evaluates the state before and after a specific Oak commit
§  it can reject or accept the changes (even by modifying the tree itself)
§  an Editor is executed right before an Oak commit is persisted
§  the SolrIndexHook maps changes between states of the content tree to Solr documents and sends them to the Solr instance(s)
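As a rough illustration of the last bullet, the sketch below compares a "before" and an "after" snapshot of a content tree and derives the index operations a hook could send to Solr. It is plain Java under stated assumptions: `IndexDiffSketch`, `IndexOp`, the map-based tree model and the `path` field name are hypothetical and are not part of the Oak or SolrJ APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: these types are NOT Oak or SolrJ classes.
class IndexDiffSketch {

    /** One pending index operation: "add" (also used for updates) or "delete". */
    static final class IndexOp {
        final String kind;
        final String path;
        final Map<String, String> fields;

        IndexOp(String kind, String path, Map<String, String> fields) {
            this.kind = kind;
            this.path = path;
            this.fields = fields;
        }

        @Override
        public String toString() {
            return kind + " " + path + " " + fields;
        }
    }

    /**
     * Compares two snapshots of a tree (node path -> properties) and returns
     * the operations that would bring an index in line with "after".
     */
    static List<IndexOp> diff(Map<String, Map<String, String>> before,
                              Map<String, Map<String, String>> after) {
        List<IndexOp> ops = new ArrayList<>();
        // Visit paths in sorted order so the result is deterministic.
        TreeSet<String> paths = new TreeSet<>(before.keySet());
        paths.addAll(after.keySet());
        for (String path : paths) {
            Map<String, String> oldProps = before.get(path);
            Map<String, String> newProps = after.get(path);
            if (newProps == null) {
                // Node existed before the commit and is gone afterwards.
                ops.add(new IndexOp("delete", path, Map.of()));
            } else if (!newProps.equals(oldProps)) {
                // New or modified node: (re)index all of its properties.
                Map<String, String> doc = new TreeMap<>(newProps);
                doc.put("path", path); // assumed field name for the node path
                ops.add(new IndexOp("add", path, doc));
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> before = Map.of("/old", Map.of("a", "1"));
        Map<String, Map<String, String>> after = Map.of("/newnode", Map.of("prop", "val"));
        for (IndexOp op : diff(before, after)) {
            System.out.println(op);
        }
    }
}
```

Unchanged nodes produce no operation, so only the delta of a commit would reach the search server in this model.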
  
IndexEditor API – creating content
§  NodeState before = builder.getNodeState();
§  builder.child("newnode").setProperty("prop", "val");
§  NodeState after = builder.getNodeState();
§  EditorHook hook = new EditorHook(new SolrIndexEditorProvider(…));
§  NodeState indexed = hook.processCommit(before, after);
QueryIndex API
§  evaluate the query (Filter) cost for each
available index to find the “best”
§  execute the query (Filter) against a specific
revision and root node state
§  internally the Filter is usually mapped to the underlying implementation counterpart
§  improved support for full text queries in Oak
§  eventually view the query “plan”
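The cost-based selection in the first bullet can be pictured with a minimal sketch: every index reports a cost for the filter, and the cheapest one is chosen. `CostedIndex`, `cheapest` and `fixed` are hypothetical names for illustration and do not reproduce Oak's QueryIndex interface; the sketch also assumes that an index which cannot serve a filter reports an infinite cost.

```java
import java.util.List;

// Hypothetical sketch of cost-based index selection; not the Oak API.
class IndexSelectionSketch {

    /** Stand-in for a query index: a name plus a cost estimate per filter. */
    interface CostedIndex {
        String name();

        /** Assumed convention: POSITIVE_INFINITY means "cannot serve this filter". */
        double cost(String filter);
    }

    /** Returns the index with the lowest finite cost, or null if none can serve the filter. */
    static CostedIndex cheapest(List<CostedIndex> indexes, String filter) {
        CostedIndex best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (CostedIndex index : indexes) {
            double cost = index.cost(filter);
            if (cost < bestCost) {
                bestCost = cost;
                best = index;
            }
        }
        return best;
    }

    /** Helper that builds an index reporting the same cost for every filter. */
    static CostedIndex fixed(String name, double cost) {
        return new CostedIndex() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public double cost(String filter) {
                return cost;
            }
        };
    }

    public static void main(String[] args) {
        List<CostedIndex> indexes = List.of(fixed("traversal", 1000.0), fixed("solr", 10.0));
        System.out.println(cheapest(indexes, "prop = 'val'").name());
    }
}
```

Only the chosen index would then execute the query against the given revision and root node state.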
QueryIndex API
§  mapping Filter restrictions:
  §  Property restrictions:
    §  Each property is mapped as a field
      –  Can use term queries for simple value matching or range queries for “first to last”
  §  Path restrictions
      –  indexed as strings and with special fields for parent / children / descendant matching
  §  Full text expressions
      –  use (E)DisMax query parser and/or fallback fields
  §  NodeType restrictions TBD
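A minimal sketch of how property and path restrictions might be turned into Lucene/Solr query strings follows. The `field:"value"` and `field:[a TO b]` forms are standard query syntax; the class, the method names and the field names `path_exact`, `path_parent`, `path_child` and `path_des` are assumptions made for illustration, not the mapping the bundle necessarily uses.

```java
// Hypothetical sketch: maps filter restrictions to Solr query strings.
class RestrictionMappingSketch {

    /** Kinds of path restriction named on the slide. */
    enum PathKind { EXACT, PARENT, CHILDREN, DESCENDANTS }

    /** Simple value matching becomes a quoted term query: field:"value". */
    static String term(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    /**
     * A "first to last" restriction becomes an inclusive range query.
     * A null bound is left open with "*". Bounds are assumed to contain
     * no query-syntax special characters.
     */
    static String range(String field, String first, String last) {
        String lower = first == null ? "*" : first;
        String upper = last == null ? "*" : last;
        return field + ":[" + lower + " TO " + upper + "]";
    }

    /** Path restrictions hit a dedicated field per kind (field names are assumed). */
    static String path(PathKind kind, String path) {
        switch (kind) {
            case PARENT:
                return term("path_parent", path);
            case CHILDREN:
                return term("path_child", path);
            case DESCENDANTS:
                return term("path_des", path);
            default:
                return term("path_exact", path);
        }
    }

    public static void main(String[] args) {
        System.out.println(term("prop", "val"));
        System.out.println(range("size", "10", null));
        System.out.println(path(PathKind.CHILDREN, "/content"));
    }
}
```

Keeping one field per path relation lets each restriction become a single term query instead of string manipulation at query time.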
Configuring the Solr index in Oak
§  create an index configuration under
  §  /oak:index/solrIdx
    §  jcr:primaryType = oak:queryIndexDefinition
    §  type = solr
      –  plus some mandatory props (e.g. reindex)
    §  Additional properties if you want to run an embedded Solr server (more on this later)
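The properties listed above can be written down as a plain map to make the shape of the definition concrete. This is only a sketch: the class and method names are hypothetical, the `reindex` value of `true` is an assumption, and a real setup would create the node through the repository API rather than a `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: models the /oak:index/solrIdx definition as a map.
class SolrIndexDefinitionSketch {

    /** Properties named on the slide for the index definition node. */
    static Map<String, Object> definition() {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "oak:queryIndexDefinition");
        props.put("type", "solr");
        props.put("reindex", Boolean.TRUE); // assumed value; the slide only names the property
        return props;
    }

    /** Checks the node type, the index type, and that "reindex" is present. */
    static boolean isSolrIndexDefinition(Map<String, Object> props) {
        return "oak:queryIndexDefinition".equals(props.get("jcr:primaryType"))
                && "solr".equals(props.get("type"))
                && props.containsKey("reindex");
    }

    public static void main(String[] args) {
        System.out.println(isSolrIndexDefinition(definition()));
    }
}
```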
  
Configuring the Solr index in Oak
§  Pluggable Solr server providers and
configuration
§  to allow different deployment scenarios
§  to allow custom configuration
Oak Solr core bundle
§  provides basic API and implementation
to index and search Oak content on Solr
§  Solr implementation of IndexEditor
§  Solr implementation of QueryIndex
§  allows configurable mapping between
  §  property types and fields
    §  e.g. all binaries should be indexed in specific field
  §  filter restrictions and fields
    §  e.g. path restrictions for children should hit a certain field
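One way to picture the configurable mapping between property types and fields is a small lookup with a per-property fallback. The sketch below is an assumption-laden illustration: `FieldMappingSketch`, the type names and the field name `binary_data` are hypothetical and are not taken from the oak-solr-core bundle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a configurable property-type-to-field mapping.
class FieldMappingSketch {

    private final Map<String, String> fieldByType = new HashMap<>();

    /** Registers a dedicated field for all properties of the given type. */
    FieldMappingSketch map(String propertyType, String field) {
        fieldByType.put(propertyType, field);
        return this;
    }

    /**
     * Returns the field configured for the property type, or falls back to
     * a field named after the property itself.
     */
    String fieldFor(String propertyType, String propertyName) {
        return fieldByType.getOrDefault(propertyType, propertyName);
    }

    public static void main(String[] args) {
        FieldMappingSketch mapping = new FieldMappingSketch().map("BINARY", "binary_data");
        System.out.println(mapping.fieldFor("BINARY", "jcr:data"));
        System.out.println(mapping.fieldFor("STRING", "title"));
    }
}
```

The same lookup idea would apply to the second mapping, from filter restrictions to fields.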
  
Oak Solr Embedded bundle
§  provides support for indexing and
searching on an embedded Solr instance
§  running inside the Oak repository
§  configuration can be done
  §  via the repository
    –  stored in the index definition node
  §  via OSGi
Oak Solr Remote bundle
§  provides support for indexing and
searching on remote Solr instances
§  single Solr instance
§  distributed / replicated Solr cluster
§  SolrCloud deployments
§  configuration is done via OSGi
OSGi platform running on Oak with Solr
§  Start instance with Oak repository
§  Add a bunch of bundles for Solr
  –  oak-solr-core, oak-solr-remote, zookeeper, servicemix.bundles.solr-solrj, etc.
§  Configure the Solr instance
§  Configure oak-solr-remote providers via OSGi
See it in action ...
Solr index populated with Oak content
What needs to be done
§  Easy OSGi deployment
§  Move common configuration stuff in
oak-solr-core
§  Leverage new full text expression API
§  Solr MK?

Scaling search in Oak with Solr

  • 1. APACHE SLING & FRIENDS TECH MEETUP BERLIN, 23-25 SEPTEMBER 2013 Scaling Search in Oak with Solr Tommaso Teofili
  • 2. adaptTo() 2013 2 A look back ...
  • 3. Scaling Search in Oak with Solr
      • Following up on last year’s “Oak / Solr integration” session
  • 4. Looking at last year’s agenda
      • What we have:
          • Search Oak content with Solr independently
          • Oak running with a Solr index
              • Embedded
              • Remote
      • What we miss:
          • Solr based MicroKernel
  • 5. Apache Jackrabbit Oak
      • Scalable content repository
      • JCR compatibility (to some degree)
      • Performance, especially for concurrent access
      • Scalability for huge repositories (> 100M nodes)
      • Support for managed environments (e.g. OSGi)
      • Cloud deployments
  • 6. Apache Solr
      • Enterprise search platform
      • Based on Apache Lucene
      • Full text, faceting, highlighting, etc.
      • Dynamic cluster architecture
      • HTTP API
      • Latest release 4.4.0
  • 7. Why Apache Solr
      • Distributed, fault tolerant indexing / searching
      • Highly configurable
          • without touching the repository
      • Customizable
  • 8. How it works ...
  • 9. IndexEditor API
      • an Editor receives info about changes to the content tree
      • the Editor evaluates the state before and after a specific Oak commit
      • it can reject or accept the changes (even by modifying the tree itself)
      • an Editor is executed right before an Oak commit is persisted
      • the SolrIndexHook maps changes between states of the content tree to Solr documents and sends them to the Solr instance(s)
  • 10. IndexEditor API – creating content
      • NodeState before = builder.getNodeState();
      • builder.child("newnode").setProperty("prop", "val");
      • NodeState after = builder.getNodeState();
      • EditorHook hook = new EditorHook(new SolrIndexEditorProvider(…));
      • NodeState indexed = hook.processCommit(before, after);
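The before/after comparison described on slides 9 and 10 can be illustrated without any Oak dependency. The following is a minimal sketch under loud assumptions: the class `TreeDiffSketch`, its `diff` method, and the "tree as a map from path to one property value" model are all hypothetical and are not the Oak `NodeState` or `Editor` API. It only shows the idea of turning the difference between two states into index operations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, simplified model: a content tree is a map from node path to one
// property value. This is NOT the Oak NodeState / Editor API.
final class TreeDiffSketch {

    // Derive index operations by comparing the tree before and after a commit.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> ops = new ArrayList<>();
        // Added or changed nodes become documents to (re)index.
        for (String path : new TreeSet<>(after.keySet())) {
            String oldValue = before.get(path);
            String newValue = after.get(path);
            if (oldValue == null) {
                ops.add("add " + path);
            } else if (!oldValue.equals(newValue)) {
                ops.add("update " + path);
            }
        }
        // Nodes that disappeared become deletions from the index.
        for (String path : new TreeSet<>(before.keySet())) {
            if (!after.containsKey(path)) {
                ops.add("delete " + path);
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("/a", "1");
        Map<String, String> after = new HashMap<>(before);
        after.put("/newnode", "val");
        System.out.println(diff(before, after));
    }
}
```

In the real integration the operations would be Solr documents sent to the Solr instance(s); plain strings are used here only to keep the sketch self-contained.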
  • 11. QueryIndex API
      • evaluate the query (Filter) cost for each available index to find the “best” one
      • execute the query (Filter) against a specific revision and root node state
      • internally the Filter is usually mapped to the underlying implementation counterpart
      • improved support for full text queries in Oak
      • optionally view the query “plan”
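The cost-based choice of the “best” index can be sketched as follows. This is an illustration only: `IndexSelectionSketch`, `CandidateIndex`, and the cost numbers are hypothetical and do not reproduce the Oak `QueryIndex` interface or its real cost model. The sketch simply asks every candidate for a cost and keeps the cheapest.

```java
import java.util.Arrays;
import java.util.List;

final class IndexSelectionSketch {

    // Hypothetical stand-in for an index; not the Oak QueryIndex interface.
    static final class CandidateIndex {
        final String name;
        final double fullTextCost;
        final double propertyCost;

        CandidateIndex(String name, double fullTextCost, double propertyCost) {
            this.name = name;
            this.fullTextCost = fullTextCost;
            this.propertyCost = propertyCost;
        }

        // Estimated cost of answering the query with this index.
        double cost(boolean fullTextQuery) {
            return fullTextQuery ? fullTextCost : propertyCost;
        }
    }

    // Ask each index for its cost and return the cheapest (null if the list is empty).
    static CandidateIndex cheapest(List<CandidateIndex> indexes, boolean fullTextQuery) {
        CandidateIndex best = null;
        for (CandidateIndex candidate : indexes) {
            if (best == null || candidate.cost(fullTextQuery) < best.cost(fullTextQuery)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up costs, chosen only to make the two outcomes differ.
        List<CandidateIndex> indexes = Arrays.asList(
                new CandidateIndex("traversal", 1000.0, 1000.0),
                new CandidateIndex("property", 900.0, 10.0),
                new CandidateIndex("solr", 5.0, 50.0));
        System.out.println(cheapest(indexes, true).name);
        System.out.println(cheapest(indexes, false).name);
    }
}
```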
  • 12. QueryIndex API
      • mapping Filter restrictions:
          • Property restrictions: each property is mapped as a field
              • can use term queries for simple value matching or range queries for “first to last”
          • Path restrictions
              • indexed as strings and with special fields for parent / children / descendant matching
          • Full text expressions
              • use (E)DisMax query parser and/or fallback fields
          • NodeType restrictions TBD
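The restriction mapping above can be made concrete with a small query-string builder. This is a sketch, not the actual oak-solr-core code: the class `SolrQuerySketch`, its methods, and the field names `path_child` and `path_des` are assumptions made for illustration. Only the query syntax itself (`field:"value"` for a term match and `field:[first TO last]` for a range, with `*` for an open end) is standard Solr/Lucene syntax.

```java
final class SolrQuerySketch {

    // Quote a value so characters such as '/' are not interpreted by the query parser.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Simple value matching on a property: one field, one term.
    static String term(String field, String value) {
        return field + ":" + quote(value);
    }

    // "First to last" matching on a property; a null bound means an open end.
    static String range(String field, String first, String last) {
        String lower = first == null ? "*" : quote(first);
        String upper = last == null ? "*" : quote(last);
        return field + ":[" + lower + " TO " + upper + "]";
    }

    // Path restrictions hit dedicated fields (field names are hypothetical).
    static String children(String parentPath) {
        return term("path_child", parentPath);
    }

    static String descendants(String ancestorPath) {
        return term("path_des", ancestorPath);
    }

    public static void main(String[] args) {
        System.out.println(term("prop", "val"));
        System.out.println(range("price", "10", null));
        System.out.println(children("/content"));
        System.out.println(descendants("/content"));
    }
}
```

Keeping the field names in one place mirrors the configurable mapping between restrictions and fields that the talk describes for the core bundle.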
  • 13. Configuring the Solr index in Oak
      • create an index configuration under /oak:index/solrIdx
          • jcr:primaryType = oak:queryIndexDefinition
          • type = solr
              • plus some mandatory props (e.g. reindex)
      • additional properties if you want to run an embedded Solr server (more on this later)
  • 14. Configuring the Solr index in Oak
      • Pluggable Solr server providers and configuration
          • to allow different deployment scenarios
          • to allow custom configuration
  • 15. Oak Solr core bundle
      • provides basic API and implementation to index and search Oak content on Solr
          • Solr implementation of IndexEditor
          • Solr implementation of QueryIndex
      • allows configurable mapping between
          • property types and fields
              • e.g. all binaries should be indexed in a specific field
          • filter restrictions and fields
              • e.g. path restrictions for children should hit a certain field
  • 16. Oak Solr Embedded bundle
      • provides support for indexing and searching on an embedded Solr instance
          • running inside the Oak repository
      • configuration can be done
          • via the repository
              • stored in the index definition node
          • via OSGi
  • 17. Oak Solr Remote bundle
      • provides support for indexing and searching on remote Solr instances
          • single Solr instance
          • distributed / replicated Solr cluster
          • SolrCloud deployments
      • configuration is done via OSGi
  • 18. OSGi platform running on Oak with Solr
      • Start an instance with an Oak repository
      • Add a bunch of bundles for Solr
          • oak-solr-core, oak-solr-remote, zookeeper, servicemix.bundles.solr-solrj, etc.
      • Configure the Solr instance
      • Configure oak-solr-remote providers via OSGi
  • 19. See it in action ...
  • 20. Solr index populated with Oak content
  • 21. What needs to be done
      • Easy OSGi deployment
      • Move common configuration stuff into oak-solr-core
      • Leverage new full text expression API
      • Solr MK?