OSS Enterprise Search
EU Tour
Spreading Enterprise Search Solutions around Europe
London – Amsterdam – Rome
25 – 26 – 28 October
Upayavira
Maurizio Pillitu
Tommaso Teofili
Summary
✓ Sourcesense is involved in many Open Source projects
✓ We continuously spot opportunities to integrate them
✓ As contributors to Apache UIMA and CMIS, we saw
opportunities to integrate them with Lucene/Solr...
✓ ...and so we did!
✓ Everything is already released as OSS or will be shortly
Solr - UIMA
Semantic content extraction while indexing
Solr - UIMA
✓ A Solr plugin to automatically extract relevant
knowledge from documents while indexing them
✓ Recognize and search on a document’s language, sentences,
keywords, concepts, named entities, ...
✓ Extensible architecture provided by Apache UIMA to
extract and index more information via configuration
✓ Proposed as Apache Solr patch (issue SOLR-2129)
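The idea of enriching a document while it is being indexed can be sketched as below. This is a minimal illustration only, not the plugin's code and not the UIMA API: the two annotators are toy stand-ins for UIMA analysis engines, and the names `enrich`, `sentence_annotator`, `gazetteer_annotator` and the field names `sentence` and `entity` are all assumptions made for the example.

```python
import re
from typing import Any, Callable, Dict, List

# A document is a plain dict of field name -> value, standing in for a Solr input document.
Document = Dict[str, Any]
# An annotator takes the text and returns new field values (hypothetical interface).
Annotator = Callable[[str], Dict[str, List[str]]]


def sentence_annotator(text: str) -> Dict[str, List[str]]:
    """Toy sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {"sentence": sentences}


def gazetteer_annotator(text: str) -> Dict[str, List[str]]:
    """Toy entity spotter: looks names up in a tiny hard-coded list."""
    gazetteer = ["London", "Amsterdam", "Rome"]
    return {"entity": [name for name in gazetteer if name in text]}


def enrich(doc: Document, text_field: str, annotators: List[Annotator]) -> Document:
    """Return a copy of doc with the annotators' output added as extra fields.

    Assumes the annotators' field names do not clash with fields already in doc.
    """
    enriched = dict(doc)
    text = str(doc.get(text_field, ""))
    for annotate in annotators:
        for field, values in annotate(text).items():
            enriched.setdefault(field, []).extend(values)
    return enriched


doc = {"id": "1", "text": "The tour starts in London. It ends in Rome!"}
enriched = enrich(doc, "text", [sentence_annotator, gazetteer_annotator])
```

Adding another kind of extraction then means adding another annotator to the list, which mirrors the "more information via configuration" point in spirit.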
Solr – UIMA use cases
✓ Automatically enable language-specific document search
✓ Easy sentence-scoped search
✓ Full text search on concepts, keywords or other named
entities (cities, persons, companies)
✓ Semantic faceting
✓ Plug in other semantic enrichment engines (no further
architectural layers required)
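Semantic faceting comes down to counting documents per extracted value. Solr computes facets on the server; the sketch below only imitates that counting in memory to show what the result looks like. The function name `facet_counts` and the field name `city` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def facet_counts(docs: Iterable[Dict[str, List[str]]], field: str) -> Dict[str, int]:
    """Count how many documents carry each value of the given field."""
    counts: Counter = Counter()
    for doc in docs:
        # Use a set so a value repeated inside one document is counted once:
        # a facet counts documents, not occurrences.
        counts.update(set(doc.get(field, [])))
    return dict(counts)


docs = [
    {"city": ["Rome", "London"]},
    {"city": ["Rome", "Rome"]},
    {"title": ["no cities here"]},
]
city_facet = facet_counts(docs, "city")
```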
CMIS
✓ Interoperability between different Enterprise Content
Management Systems
✓ OASIS Specification on May 1, 2010
✓ Standard Data Model
✓ SOAP Web Services and RESTful AtomPub bindings
✓ Java, JavaScript, PHP, Python, .NET implementations
Why CMIS
✓ Allows building and leveraging applications against
multiple repositories
✓ Decouples Web Services from the Content
Management System
✓ Avoids yet another custom WS tier
✓ Standardized and certified interfaces
✓ Platform and language agnostic
Solr CMIS Integration
✓ Retrieves documents from multiple CMIS repositories
✓ Configurable mapping of cmis:document into
solr:document
✓ Leverages Solr Multicore feature
✓ Smooth integration with pre-existing data
✓ Keeps Solr indexes up-to-date with CMIS repository
changes
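A configurable mapping from cmis:document to solr:document might look like the sketch below. The CMIS property ids on the left are standard CMIS properties; the Solr field names on the right, the `FIELD_MAPPING` table and the `to_solr_document` function are assumptions for illustration and do not show the integration's actual configuration format.

```python
from typing import Any, Dict

# CMIS property id -> Solr field name (the Solr names are hypothetical).
FIELD_MAPPING: Dict[str, str] = {
    "cmis:objectId": "id",
    "cmis:name": "title",
    "cmis:lastModificationDate": "last_modified",
    "cmis:contentStreamMimeType": "content_type",
}


def to_solr_document(
    cmis_properties: Dict[str, Any],
    mapping: Dict[str, str] = FIELD_MAPPING,
) -> Dict[str, Any]:
    """Copy mapped CMIS properties into a Solr-style document.

    Properties without a mapping entry are ignored, and mapping entries
    without a value in the source are skipped.
    """
    return {
        solr_field: cmis_properties[cmis_id]
        for cmis_id, solr_field in mapping.items()
        if cmis_id in cmis_properties
    }


solr_doc = to_solr_document(
    {
        "cmis:objectId": "abc-123",
        "cmis:name": "report.pdf",
        "cmis:createdBy": "admin",  # not in the mapping, so it is dropped
    }
)
```

Keeping the mapping as data rather than code is what makes it configurable per repository.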
“From scratch Deployment”
● The need:
✓ Reliable, resilient, scalable search solution
✓ Ability to roll out new 'rows' at will
● The solution:
✓ Virtualisation
✓ Automation
✓ Tools: Capistrano, bash, potentially Puppet/Chef
Sample Solr Setup
(Diagram: two load balancers and two rows, each row with a co-ordinator and shard1, shard2, shard3)
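In a setup like this, a co-ordinator can fan a query out to its shards using Solr's `shards` request parameter, which takes a comma-separated list of shard addresses. The sketch below only builds such a request URL; the host names, ports and the helper names `shard_params` and `distributed_query_url` are assumptions, not taken from the slides.

```python
from typing import Dict, List
from urllib.parse import urlencode


def shard_params(shards: List[str], query: str) -> Dict[str, str]:
    """Build the request parameters for a distributed Solr query."""
    return {"q": query, "shards": ",".join(shards)}


def distributed_query_url(coordinator: str, shards: List[str], query: str) -> str:
    """Return the URL a client would send to the co-ordinator of one row."""
    return f"http://{coordinator}/select?{urlencode(shard_params(shards, query))}"


# Hypothetical addresses for one row of the diagram.
row_shards = ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"]
params = shard_params(row_shards, "title:search")
url = distributed_query_url("coordinator:8983/solr", row_shards, "title:search")
```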
DeployX Stages
Instantiate VM
Configure host
Deploy application
Duplicate data
Add to pool
Push button
In parallel
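One way to read the stages above is: each host goes through the stages in order, and several hosts are processed in parallel. The sketch below shows that shape only. It is an assumption about how the stages fit together, the stage bodies are placeholders that just record what would run, and `deploy_host` and `deploy_row` are hypothetical names rather than DeployX functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Stage names taken from the slide, run in this order for every host.
STAGES = [
    "instantiate VM",
    "configure host",
    "deploy application",
    "duplicate data",
    "add to pool",
]


def deploy_host(host: str) -> List[str]:
    """Run every stage in order for one host and return a log of what ran."""
    log = []
    for stage in STAGES:
        # Placeholder: a real run would shell out to Capistrano, bash, etc.
        log.append(f"{host}: {stage}")
    return log


def deploy_row(hosts: List[str]) -> Dict[str, List[str]]:
    """Deploy all hosts of a row in parallel, one worker thread per host."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        logs = list(pool.map(deploy_host, hosts))  # map preserves input order
    return dict(zip(hosts, logs))


result = deploy_row(["co-ordinator", "shard1", "shard2", "shard3"])
```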
Demo
✓ Extract CMIS documents
✓ Index on Solr
✓ Enrich with UIMA
