More Powerful Solr Search with Semaphore - Jeremy Bentley

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Metadata is widely understood to be a critical element of search, discovery and classification. But with the preponderance of unstructured data addressed by search technology, consistent native metadata is often in short supply. Organizations often find that the quality and depth of contextual metadata (what documents are about) can make or break search relevancy, precision and recall.

Semaphore is an enterprise semantic platform that uniquely captures an organization's subjects and topics in a taxonomy or ontology (model) in a way that adds context for enhanced navigation and findability. Semaphore augments traditional information management systems such as Solr search by adding advanced content classification, metadata and navigation capabilities, delivering a more complete, higher-quality enterprise information management experience. This talk will focus on the following:

Take a deep dive into the technical integration of Semaphore with Apache Solr, including the connection points between Semaphore and Solr
Discuss the Semaphore modules (Ontology Manager, Classification Server, Semantic Enhancement Server and Search Application Server) and how they provide better findability
Share a demonstration of Solr in action
Present a client case study (Nordyske).
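The main connection point in the outline above sits at index time: a classification step tags each document with concepts from the model before the document reaches the Solr index. The following is a minimal sketch under stated assumptions, not Semaphore's API. `CLASSIFICATION_RULES` is a hypothetical keyword rule set standing in for the Classification Server, and the field names `title`, `body`, `source` and `topic` are hypothetical, assuming a Solr schema in which `topic` is a multi-valued string field. The resulting JSON array is the shape Solr's `/update` handler accepts when posted with `Content-Type: application/json`.

```python
import json
import re
from typing import Dict, List

# Hypothetical stand-in for Classification Server rules: preferred label ->
# lowercase variants that trigger the tag. Not Semaphore's actual rule format.
CLASSIFICATION_RULES: Dict[str, List[str]] = {
    "Ford Motor Company": ["ford motor", "fomoco", "blue oval"],
    "Lunar Roving Vehicle": ["moon buggy", "lunar roving vehicle",
                             "manned lunar surface vehicle"],
    "Swine Influenza Virus": ["swine flu", "swine influenza virus", "h1n1"],
}


def classify(text: str) -> List[str]:
    """Return the preferred labels whose variants occur in the text."""
    lowered = text.lower()
    tags = []
    for preferred, variants in CLASSIFICATION_RULES.items():
        # \b word boundaries make each variant match only as whole words.
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
            tags.append(preferred)
    return tags


def to_solr_doc(doc_id: str, title: str, body: str, source: str) -> Dict[str, object]:
    """Build one Solr document, carrying the tags in a multi-valued field."""
    return {
        "id": doc_id,
        "title": title,
        "body": body,
        "source": source,
        "topic": classify(title + " " + body),  # hypothetical tag field
    }


def solr_update_body(docs: List[Dict[str, object]]) -> str:
    """Serialise documents as a JSON array for Solr's /update handler."""
    return json.dumps(docs)


doc = to_solr_doc("42", "FoMoCo results", "The blue oval posted a profit", "Research reports")
print(doc["topic"])  # ['Ford Motor Company']
```

Running the file prints `['Ford Motor Company']`, because "FoMoCo" and "blue oval" are both variants of that concept. A real deployment would presumably call the Classification Server through its web services interface rather than match keywords; the point of the sketch is only that tags arrive in Solr as ordinary field values, which is what makes them usable for facets and filters.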

  1. Smartlogic™, Apache Lucene Eurocon: Jeremy Bentley, CEO
  2. 1st degree of order: filing management
     • 80% of enterprise information is unstructured
     • Doubling every 19 months and accelerating [Gartner]
     • Increasing burden of compliance
     • Enterprise 2.0 additions
  3. 2nd degree of order: index management
     • File plans and metadata schema
     • Mono-hierarchical standardised taxonomies
     • Manually applied classification
     • Low level of consistency and quality
  4. 3rd degree of order: computerised 1st and 2nd degrees
  5. A 10-Year Flatline Expectation Gap
     • 2001, IDC, "Quantifying Enterprise Search": searchers are successful in finding what they seek 50% of the time or less
     • 2011, MindMetre/Smartlogic: more than half (52%) cannot find the information they need using their enterprise search system
  6. The explosion of information
     [Chart: terabytes of data, rising from 4 TB (1993-2001) to 80 TB (2001-2009), a 20-times increase in information volume; the following period is marked "?". Source: The National Archives]
  7. Search Gets Harder as Data Sets Grow
     [Image: circa 1996]
  8. Different vocabulary and ambiguity
     • Missing results (you say / I say):
       Moon Buggy / Lunar Roving Vehicle / Manned Lunar Surface Vehicle
       Swine Flu / Swine Influenza Virus / H1N1
       Touchscreen / Touch screen / Multi-touch
     • Too many results (you say / what do you mean?):
       Apple: a fruit? Fiona, a singer/songwriter? An electronics company?
       Rights: employment rights? Equal rights? Right of way?
       Ford: Ford Motor? Forward Industries (ticker = FORD)? A shallow river crossing?
  9. Conventional Search: Ineffective, Frustrating, and Inadequate
     • Apparent drawbacks: (1) needle in the haystack; (2) multiple search terms; (3) irrelevant results; (4) out-of-date results; (5) multiple media forms; (6) unrestricted geography; (7) inappropriate ads
     • Not-so-apparent drawbacks: (8) can't filter or select a subset; (9) no related topics; (10) missing results; (11) no context or guidance; (12) best resource not clear
     • ✓ Time-consuming ✓ Inefficient ✓ Ineffective
  10. Knowing what you have
  11. Paradox of Effort
     • "Metadata is to search what pistons are to a petrol engine."
     • Web: high metadata effort, low result-quality requirement
     • Enterprise: low metadata effort, high result-quality requirement
  12. How do I structure it?
     • Metadata dimensions (diagram labelled "Information" and "Structural"): Subject, Creation Date, Modified Date, Location, Project, Author, Function (IT, HR, Finance), Format (PDF, DOC, XLS), Protective Marker, Expiry, Publisher, Expert, Retention, Site, Process
  13. 3rd degree content universe: Enterprise Search, Content Management, Portal Infrastructure, Document Management, Social Collaboration, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery
  14. 4th degree of order: the 3rd degree content universe (Enterprise Search, Content Management, Portal Infrastructure, Document Management, Social Collaboration, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery) with Content Intelligence added
  15. 4th degree of order: Content Intelligence
     [Diagram: Content Intelligence Platform and Solr]
  16. Semaphore
     [Diagram: Business Vocabulary, Expose, Apply, Classification, User, Decision, Action, Inform]
     Copyright © 2011 Smartlogic Semaphore Limited
  17. Semaphore: the slide 16 diagram annotated with Semantic Models, Metadata, Semantic Software and Contextual User Experience
  18. Components
     • Metadata
     • Semantic Models
     • Contextual User Experience
     • Semantic Software
  19. Metadata: today vs. with Content Intelligence
     • Manual process vs. automatic process
     • Multiple approaches for various domains/audiences vs. a single unified "one size fits all" approach
     • Long time to craft and build, manually applied vs. short time to build and deploy, applied automatically
     • Low-quality tags vs. high-quality tags
     • High cost to apply vs. low cost to apply
  20. Semantic Models: Organising, Contextualising, Harnessing
     • Example concept, preferred term (agreed label): Ford Motor Company
     • Also known as: Ford Motor, F (Bloomberg), FoMoCo, blue oval
     • Parent topics: Automotive sector, Bond issuers
     • Subsidiaries: Ford Motor Credit Company, Mazda
     • Products: Focus, Ka, MX5
     • Key competitors: BMW, Daimler Chrysler, General Motors, Toyota, Volkswagen
     • Covered by: Bob Smith
     • Content types available: Flashnotes, Research reports, Trade ideas
     • Analytics available: Current bond price, Relative bond spreads
     • Influenced by: Credit ratings on Ford Motor Credit Company, European and US economies, Changes in consumer demand
     • Location of: Ford fundamental data, Earnings estimates, Historic sales and profits
     • Harnessing: automate compliance and distribution tasks ("watch list" lookup, distribution according to preset rules, automated mapping to create aggregator metadata); user experience (conceptual relevance, related topics, links to analytics); search engine enhancement (search results, email alerts); unstructured content integration (published reports, related topics, links to analytics, search results, email alerts)
  21. Contextual User Experience
     • Key features: (1) taxonomy enables discovery and related searches; (2) related topics and content; (3) facets enable filtering results by (4) source, (5) numerous topics and (6) date; (7) Best Bets; (8) automated document tagging; (9) A-Z
     • ✓ More relevant results ✓ Fewer "bad hits" ✓ Powerful navigation
  22. Content Exploration: highlighting relationships in a result set greatly improves the user experience.
  23. Semantic Software: Semaphore
     • Ontology & metadata management
     • Text analysis & extraction
     • Automatic and assisted content classification
     • Contextual navigation services
     • Semantic reasoning & processing
  24. Semaphore Search Integration
     [Architecture diagram. Semaphore modules: Ontology Manager, Classification Server (classification rules), Search Enhancement Server (local term index), Text Miner. Interfaces: Web Services API, XML API, sample interface code. Other components: Collector/Normalizer, Search Application Framework, Portal, Search Engine, Corpus. Flows labelled: ontology information, document "tags", extracted text, user requests, query, index. The legend distinguishes Semaphore core modules from optional modules.]
  25. 4th degree of order: the 3rd degree content universe (Enterprise Search, Content Management, Portal Infrastructure, Document Management, Social Collaboration, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery) with Content Intelligence added
  26. Content Intelligence: Information Monetisation, Metadata Manufacturing, Knowledge Recovery, Data Loss Prevention, Risk & Compliance, Content Analytics
  27. Content Intelligent Solutions: Micro-Targeting & Distribution, Web Self Service, Knowledge Acquisition & Recovery, Governance, Cross-Platform Content Integration, Risk Compliance
  28. www.smartlogic.com
  29. Smartlogic™: Jeremy.Bentley@Smartlogic.com, www.smartlogic.com
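Slides 8, 20 and 21 describe the query-side benefits: variant labels such as "Moon Buggy" and "Lunar Roving Vehicle" should find the same documents, and facets on source, topic and date let users filter results. A minimal sketch of how an application might assemble such a request for Solr's `/select` handler, assuming a hypothetical `VOCABULARY` dictionary in place of Semaphore's model, hypothetical `body`, `source`, `topic` and `published` fields, and a hypothetical local core named `docs`. The parameter names themselves (`q`, `df`, `fq`, `facet.field`, `facet.range` and its `start`/`end`/`gap` companions) are standard Solr request parameters.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical vocabulary: preferred label -> alternative labels, loosely
# modelled on the "You say / I say" and "Also known as" examples.
VOCABULARY = {
    "Lunar Roving Vehicle": ["Moon Buggy", "Manned Lunar Surface Vehicle"],
    "Ford Motor Company": ["Ford Motor", "FoMoCo", "blue oval"],
}


def expand(term: str) -> List[str]:
    """Return every label of the concept the term belongs to, or the term itself."""
    wanted = term.casefold()
    for preferred, alternatives in VOCABULARY.items():
        labels = [preferred, *alternatives]
        if wanted in (label.casefold() for label in labels):
            return labels
    return [term]


def build_select_params(term: str, source: Optional[str] = None) -> List[Tuple[str, str]]:
    """Build /select parameters: a synonym-expanded query plus facet requests."""
    # OR together one quoted phrase per label of the matched concept.
    query = " OR ".join(f'"{label}"' for label in expand(term))
    params = [
        ("q", query),
        ("df", "body"),                 # hypothetical default search field
        ("facet", "true"),
        ("facet.field", "source"),      # hypothetical facet fields
        ("facet.field", "topic"),
        # Yearly date buckets over the last five years, using Solr date math.
        ("facet.range", "published"),
        ("facet.range.start", "NOW/YEAR-5YEARS"),
        ("facet.range.end", "NOW"),
        ("facet.range.gap", "+1YEAR"),
    ]
    if source is not None:
        # Filter query added when the user selects a value in the source facet.
        params.append(("fq", f'source:"{source}"'))
    return params


# The request is only built here, not sent.
url = "http://localhost:8983/solr/docs/select?" + urlencode(build_select_params("moon buggy"))
```

Here the expansion happens in application code for clarity. Solr can instead apply synonyms in the analyzer chain (for example with `SynonymGraphFilterFactory`), which keeps queries short but means a core reload, and for index-time synonyms a reindex, whenever the vocabulary changes.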
