ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti

Leveraging enterprise information is no easy task, especially when unstructured information represents more than 80% of enterprise content. Meaningfully structuring content is critical for companies, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of information retrieval tasks.

With the Semantic Web moving towards full realisation thanks to the Linked Data initiative and with the interest of major search engines in structured data, the enterprise search world is finding it more attractive to make its information machine readable and exploit that information to improve search over its content.
In this scenario, three trends are transforming the face of search:

Entity-oriented search. Searching not by keyword, but by entities that represent specific concepts in a certain domain.
Knowledge graphs. Leveraging relationships among entities: Linked Data datasets (Freebase, DBpedia, ...) or companies' custom knowledge bases.
Search assistance. Autocomplete and spellchecking are now common features, but making use of semantic data makes it possible to offer smarter features, guiding the users to what they want, in a natural way.

The resources needed to build such features are not always easy to obtain. To generate them, our approach includes a number of mechanisms for processing unstructured data, whose goal is to extract semantic information automatically:

Extract content from heterogeneous data sources
Extract domain information and enrich the content through different NLP processes like Named Entity Recognition, Coreference Resolution, Entity Linking and Disambiguation, and Topic Annotation
Create specialised indexes to store the semantic information extracted
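
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline described in the talk: a toy gazetteer lookup stands in for real Named Entity Recognition and Entity Linking, and every name here (`GAZETTEER`, `annotate`, `build_entity_index`, the `ent:` identifiers and the entity types) is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: surface form -> (entity id, entity type).
# A real system would run NER and entity linking against a knowledge base instead.
GAZETTEER = {
    "london": ("ent:London", "City"),
    "apache stanbol": ("ent:Apache_Stanbol", "Software"),
    "alfresco": ("ent:Alfresco", "Organization"),
}


def annotate(text):
    """Return the (entity id, entity type) pairs whose surface form occurs in text."""
    lowered = text.lower()
    return {entity for surface, entity in GAZETTEER.items() if surface in lowered}


def build_entity_index(documents):
    """Build two inverted indexes: entity id -> doc ids and entity type -> doc ids."""
    by_entity = defaultdict(set)
    by_type = defaultdict(set)
    for doc_id, text in documents.items():
        for entity_id, entity_type in annotate(text):
            by_entity[entity_id].add(doc_id)
            by_type[entity_type].add(doc_id)
    return by_entity, by_type


# Illustrative documents only.
documents = {
    "d1": "Zaizi is headquartered in London.",
    "d2": "Apache Stanbol enriches content stored in Alfresco.",
    "d3": "The London office works with Alfresco.",
}
by_entity, by_type = build_entity_index(documents)
print(sorted(by_entity["ent:London"]))
print(sorted(by_type["Organization"]))
```

Keeping one index per entity and one per entity type is a design choice made for this sketch: it lets a query ask either for a specific entity or for any entity of a given type.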

Semantically extracted information already has well-developed uses, such as faceting and concept indexing. However, further ways of exploiting it are emerging in the industry:

Smart Autocomplete
The goal of this feature is to complete the user's phrase automatically with entity names and properties, helping them find the desired documents by exploring the domain Knowledge Graph. As the user types, the system proposes a set of named entities and/or entity types. When the user accepts a suggestion, the system dynamically adapts subsequent suggestions to the chosen context.
The accuracy delivered by entity driven search increases user satisfaction. Users see documents about a specific semantic concept, with concrete properties, rather than documents matching a keyword that can be interpreted ambiguously.
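
A rough sketch of the multi-phase behaviour described above, assuming a tiny in-memory knowledge graph. The data, the property names and the `suggest` function are hypothetical and only illustrate the idea of infix matching followed by context-dependent suggestions; they are not taken from the presented system.

```python
# Hypothetical knowledge graph fragment; entity types and properties are illustrative.
ENTITIES = {
    "Alessandro Benedetti": "Person",
    "Antonio David Pérez Morales": "Person",
    "London": "City",
    "Sevilla": "City",
}
TYPE_PROPERTIES = {
    "Person": ["works for", "born in"],
    "City": ["located in", "population"],
}


def suggest(typed, accepted_type=None):
    """Suggest completions for the text typed so far.

    Phase 1 (no context): propose entity names, then entity types, that contain
    the typed text anywhere (infix match).
    Phase 2 (an entity type was accepted): propose only that type's properties.
    """
    needle = typed.lower()
    if accepted_type is not None:
        properties = TYPE_PROPERTIES.get(accepted_type, [])
        return [p for p in properties if needle in p.lower()]
    names = sorted(name for name in ENTITIES if needle in name.lower())
    types = sorted(t for t in TYPE_PROPERTIES if needle in t.lower())
    return names + types


print(suggest("lon"))                        # phase 1: infix match on names and types
print(suggest("pop", accepted_type="City"))  # phase 2: properties of the accepted type
```

A production implementation would typically back these lookups with a search index rather than Python dictionaries; the dictionaries keep the example self-contained.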

Semantic More Like This
A feature that finds documents similar to a given input document, based on the underlying knowledge in the documents rather than on text tokens.
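
One way to picture this is the entity-frequency and inverse-document-frequency weighting that the slides mention for this feature. The sketch below is an assumption-laden illustration, not the presented implementation: the smoothed IDF formula, the use of cosine similarity, and the names `entity_vectors`, `cosine` and `more_like_this` are all choices made for this example.

```python
import math
from collections import Counter


def entity_vectors(doc_entities):
    """Weight each document's entities by entity frequency * inverse document frequency."""
    n_docs = len(doc_entities)
    doc_freq = Counter()
    for entities in doc_entities.values():
        doc_freq.update(set(entities))
    vectors = {}
    for doc_id, entities in doc_entities.items():
        counts = Counter(entities)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vectors[doc_id] = {
            entity: count * (math.log((1 + n_docs) / (1 + doc_freq[entity])) + 1.0)
            for entity, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(doc_id, doc_entities, top_n=3):
    """Rank the other documents by entity-based similarity to doc_id."""
    vectors = entity_vectors(doc_entities)
    query = vectors[doc_id]
    scored = [
        (other, cosine(query, vector))
        for other, vector in vectors.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical entity annotations per document (one id per mention).
doc_entities = {
    "d1": ["ent:London", "ent:Alfresco", "ent:London"],
    "d2": ["ent:London", "ent:Alfresco"],
    "d3": ["ent:Sevilla"],
}
ranking = more_like_this("d1", doc_entities)
print(ranking)
```

The same scheme could be applied to entity types instead of entity ids by passing a mapping from documents to the types of their entities.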

Published in: Technology

  1. ECIR 2014 Industry Day
     Content Discovery Through Entity Driven Search
     Alessandro Benedetti, http://uk.linkedin.com/in/alexbenedetti
     Antonio David Pérez Morales, http://es.linkedin.com/in/adperezmorales
     16th April 2014
  2. • Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
     • Alfresco & Ephesoft certified Platinum Partner
     • Red Hat Enterprise Linux Ready Partner
     • Crafter & Varnish Gold Partners
     • Search Solutions Consultant
     Alfresco Partner of the Year 2012 and 2013
  3. Who We Are
     Working effectively together
     Antonio David Pérez Morales
     - R&D Senior Engineer
     - Master in Engineering and Technology Software
     - Digital Identity and Security expert
     - Enterprise Search Background
     - Semantic, NLP, ML Technologies and Information Retrieval lover
     - Apache Stanbol Committer
     - Apache contributor
     @adperezmorales
     http://es.linkedin.com/in/adperezmorales/
     Alessandro Benedetti
     - R&D Senior Engineer
     - Master in Computer Science
     - Information Retrieval background
     - Enterprise Search specialist
     - Semantic, NLP, ML Technologies and Information Retrieval lover
     @AlexBenedetti
     http://uk.linkedin.com/in/alexbenedetti
  4. Agenda
     • Context
     • Problem
     • Solution
     • Demo
     • Future Works
  6. Zaizi R&D Department
     • Giving sense to the content
     • Enriching it semantically
     • Adding value to ECM/CMS
     • More structured content, easy to manage, link and search
     • Improving search
     • Across different domains, data sources, User Experience
     • Machine Learning applied research
     • Content Organization – Recommendation Systems
  8. Enterprise Search Problems
     Challenge: Search within Big and Heterogeneous Repositories
     • Heterogeneous Data Sources
     • Filesystem, DB, ECM/CMS, Email, …
     • Unstructured Content
     • PDFs, plain text, Word, …
     • Documents not linked to each other
     • Federated Search needed
     • Search across data sources
     • Different permissions
     • Centralized endpoint
  9. Current Enterprise Search Weaknesses
     • Keyword based
     • Low precision
     • Ambiguous terms not in context
     • Inaccurate weighting when keywords are combined in a query
  11. Entity Driven Search
      • Moves from keywords to Entities
      • More understandable to a human
      • Process the unstructured text
      • Enrich it
      • Build specific indexes
      • Use entities and concepts in searches
  12. Sensefy
      • Semantic Enterprise Search Engine
      • Federated Search
      • Evolved User Experience
      • Based on cutting-edge Open Source Frameworks
  13. Architecture
  14. RedLink
      • Semantic Cloud platform
      • Providing Software as a Service
      • Manage unstructured data
      • Extract knowledge and intelligence
      • Make sense of information
      • Feed into business processes
      • Open-Source based components
      • Entity Linking using Knowledge Bases
  15. NLP & Semantic Enrichment
      • From unstructured to structured
      • NLP Analysis. POS Tagging
      • Named Entities Recognition
      • Linked Data
      • Entity Linking using Knowledge Bases
      • Disambiguation
      • Indexing in Solr
  16. Smart Autocomplete
      • Multi Phase suggestions
      • Closer to natural language query formulation
      • Named Entities infix
      • Entity types infix
      • Multi Language entity type support
      • Properties driven query approach
  17. Smart Autocomplete Configuration
      • Entity type properties
      • Interesting to our use case and scenario
      • Properties inheritance through type hierarchy
      • Enhance type information from external resource
      • Freebase, DBpedia, Custom Data Set
  18. Semantic Search
      • Search by Named Entity
      • Search by Entity Type
      • Search by Entity Type properties
      • Grouping Results by Sense
      • Contextualize Results Using Semantic Information
  19. Semantic More Like This
      • Search for Similar Documents based on Entities and Entities’ categories
      • Similarity Function based on Documents’ Sense
      • Not based on text tokens
      • Entity Frequency / Inverse Document Frequency
      • Entity Type Frequency / Inverse Document Frequency
  22. Future Work
      • Semantic More Like This new approach (Graph relations)
      • Machine Learning components: Classification, Topic annotation, Clustering
      • Semantic facets
      • Secured Entity Search
      • Image and Media searches
  23. Conclusions
      • Better user experience
      • More precision in search results
      • Closer to human language
  24. Zaizi Headquarters: Brook House, 4th Floor, North Wing, 229-243 Shepherd’s Bush Road, London W6 7AN, United Kingdom. T: (+44) 20 3582 8330
      Zaizi Iberia: Calle Gremios 13-15, Edificio Diseño Planta 1, Oficina 5 41927 Mairena del Aljarafe Sevilla Spain. T: (+34) 666 42 43 64
      Zaizi Asia: 50 Flower Road Colombo 07 Sri Lanka. T: (+94) 112 301 461
      Zaizi Singapore: 14 Robinson Road #13-00 Far East Finance Building Singapore 048545. T: (+65) 3158 5886 F: (+65) 6323 1839
      VAT Registration No GB 932 8855 89
      Registered in England and Wales with registration number 6440931
      www.zaizi.com
      Thanks!
