ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014
• Experienced at building and delivering a wide range of enterprise
solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
Alfresco Partner of the Year 2012 and 2013
Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master's degree in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master's degree in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti
Agenda
• Context
• Problem
• Solution
• Demo
• Future Work
Zaizi R&D Department
• Giving sense to the content
• Enriching it semantically
• Adding value to ECM/CMS
• More structured content, easy to manage, link and search
• Improving search
• Across different domains and data sources, with a better User Experience
• Machine Learning applied research
• Content Organization, Recommendation Systems
Enterprise Search Problems
Challenge:
Search within Big and Heterogeneous Repositories
• Heterogeneous Data Sources
• Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
• PDFs, plain text, Word, …
• Documents not linked to each other
• Federated Search needed
• Search across data sources
• Different permissions
• Centralized endpoint
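The federated search requirements above (search across sources, different permissions, one centralized endpoint) can be sketched as follows. This is a minimal illustration, not how Sensefy implements it: the `Hit` record, the `allowed_groups` permission model and the toy connectors are all hypothetical, and it assumes every connector already returns scores normalized to a comparable [0, 1] range.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Hit:
    doc_id: str
    source: str
    score: float  # assumed already normalized to [0, 1] by each connector
    allowed_groups: Set[str] = field(default_factory=set)


# A connector is any callable that runs the query against one data source.
Connector = Callable[[str], List[Hit]]


def federated_search(query: str, connectors: Dict[str, Connector],
                     user_groups: Set[str], limit: int = 10) -> List[Hit]:
    """Fan the query out to every source, filter by permissions, merge by score."""
    merged: List[Hit] = []
    for search in connectors.values():
        for hit in search(query):
            # Keep the hit only if the user belongs to at least one allowed group.
            if hit.allowed_groups & user_groups:
                merged.append(hit)
    # One ranked list for the centralized endpoint, best score first.
    merged.sort(key=lambda h: h.score, reverse=True)
    return merged[:limit]


# Toy connectors standing in for a filesystem and an ECM repository.
def filesystem(query: str) -> List[Hit]:
    return [Hit("fs-1", "filesystem", 0.9, {"hr"}),
            Hit("fs-2", "filesystem", 0.4, {"everyone"})]


def ecm(query: str) -> List[Hit]:
    return [Hit("ecm-7", "ecm", 0.7, {"everyone"})]


results = federated_search("holiday policy",
                           {"filesystem": filesystem, "ecm": ecm},
                           user_groups={"everyone"})
print([h.doc_id for h in results])  # ['ecm-7', 'fs-2']
```

Filtering before merging means a document the user cannot read never reaches the ranked list; in practice score normalization across heterogeneous sources is the harder part and is simply assumed here.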
Current Enterprise Search Weaknesses
• Keyword based
• Low precision
• Ambiguous terms are not interpreted in context
• Inaccurate weighting when keywords are combined in a query
Entity Driven Search
• Move from keywords to Entities
• More understandable to a human
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
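A toy version of these steps (process the text, enrich it with entities, build an entity index, search by entity) might look like this. It is a sketch under loud assumptions: the `GAZETTEER` dictionary and the `ent:` identifiers are hypothetical, and a naive whole-word lookup stands in for real entity recognition and linking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical gazetteer: surface form -> entity identifier.
# Several surface forms can point at the same entity.
GAZETTEER: Dict[str, str] = {
    "apache solr": "ent:Apache_Solr",
    "solr": "ent:Apache_Solr",
    "london": "ent:London",
}


def extract_entities(text: str) -> Set[str]:
    """Naive linking: look up gazetteer surface forms as whole words."""
    lowered = text.lower()
    found: Set[str] = set()
    for surface, entity in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity)
    return found


def build_entity_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index keyed by entity instead of by keyword."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return index


def search_by_entity(index: Dict[str, Set[str]], entity: str) -> List[str]:
    return sorted(index.get(entity, set()))


docs = {
    "d1": "We deployed Apache Solr in London.",
    "d2": "Solr powers the intranet search.",
    "d3": "Quarterly report for the London office.",
}
index = build_entity_index(docs)
print(search_by_entity(index, "ent:Apache_Solr"))  # ['d1', 'd2']
```

Because "Apache Solr" and "Solr" resolve to the same entity, a single entity query retrieves both documents, which a plain keyword query for one of the surface forms might miss.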
Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks
Architecture
RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Based on open-source components
• Entity Linking using Knowledge Bases
NLP & Semantic Enrichment
• From unstructured to structured
• NLP analysis: POS tagging
• Named Entity Recognition
• Linked Data
• Entity linking using knowledge bases
• Disambiguation
• Indexing in Solr
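The path from unstructured text to a structured, indexable document can be sketched as below. The knowledge base, its `kb:` identifiers, the context-overlap disambiguation heuristic and the `entity_ids`/`entity_types` field names are all hypothetical; the output is simply a JSON document of the kind one might send to a Solr index, not the pipeline the slides describe.

```python
import json
import re
from typing import Dict, List, Optional

# Hypothetical knowledge base: surface form -> candidate entities, each with a
# type and a few context words typical of that sense.
KB: Dict[str, List[dict]] = {
    "jaguar": [
        {"id": "kb:Jaguar_animal", "type": "Animal",
         "context": {"jungle", "predator", "species"}},
        {"id": "kb:Jaguar_Cars", "type": "Company",
         "context": {"car", "engine", "manufacturer"}},
    ],
}


def tokenize(text: str) -> List[str]:
    """Lowercased alphabetic tokens (a stand-in for real NLP analysis)."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(mention: str, tokens: List[str]) -> Optional[dict]:
    """Pick the candidate whose context words overlap most with the document."""
    candidates = KB.get(mention, [])
    if not candidates:
        return None
    words = set(tokens)
    return max(candidates, key=lambda c: len(c["context"] & words))


def enrich(doc_id: str, text: str) -> dict:
    """Turn raw text into a structured document with linked entities."""
    tokens = tokenize(text)
    entities = []
    for mention in sorted(set(tokens) & set(KB)):
        best = disambiguate(mention, tokens)
        if best is not None:
            entities.append(best)
    return {
        "id": doc_id,
        "content": text,
        "entity_ids": [e["id"] for e in entities],              # hypothetical field
        "entity_types": sorted({e["type"] for e in entities}),  # hypothetical field
    }


doc = enrich("d1", "The new Jaguar engine was unveiled by the manufacturer.")
print(json.dumps(doc))
```

Here "engine" and "manufacturer" tip the ambiguous mention "Jaguar" towards the company sense; a production system would use a proper NER model and richer disambiguation features.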
Smart Autocomplete
• Multi-phase suggestions
• Closer to natural language query formulation
• Infix suggestion of named entities
• Infix suggestion of entity types
• Multi-language entity type support
• Property-driven query approach
Smart Autocomplete Configuration
• Entity type properties
• Those relevant to our use case and scenario
• Property inheritance through the type hierarchy
• Type information enhanced from external resources
• Freebase, DBpedia, custom data sets
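Property inheritance through a type hierarchy can be illustrated with a short sketch. The hierarchy and the properties below are hypothetical placeholders; in the setting described above they could come from an external resource such as DBpedia or a custom data set, filtered to the properties relevant to the use case.

```python
from typing import Dict, List, Optional

# Hypothetical type hierarchy: type -> parent type (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Organization": "Agent",
}
# Properties declared directly on each type.
OWN_PROPERTIES: Dict[str, List[str]] = {
    "Thing": ["name"],
    "Agent": ["location"],
    "Person": ["birthDate", "nationality"],
    "Organization": ["foundingYear"],
}


def properties_for(entity_type: str) -> List[str]:
    """Collect the type's own properties plus those inherited from its ancestors."""
    chain: List[str] = []
    current: Optional[str] = entity_type
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    props: List[str] = []
    # Walk from the root down so general properties come first.
    for t in reversed(chain):
        for p in OWN_PROPERTIES.get(t, []):
            if p not in props:
                props.append(p)
    return props


print(properties_for("Person"))  # ['name', 'location', 'birthDate', 'nationality']
```

Declaring a property once on a general type and inheriting it keeps the autocomplete configuration small: only the properties specific to a subtype need to be listed on it.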
Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
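Searching by entity or entity type, and grouping results by sense, can be sketched over documents whose mentions are already linked to entities. The documents, the `kb:` identifiers and the type table are hypothetical; in a Solr-based system a similar effect could be obtained with result grouping or faceting on an entity field, but that mapping is an assumption, not something the slides specify.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical annotated documents: each mention already linked to an entity.
DOCS = [
    {"id": "d1", "mentions": {"java": "kb:Java_(programming_language)"}},
    {"id": "d2", "mentions": {"java": "kb:Java_(island)"}},
    {"id": "d3", "mentions": {"java": "kb:Java_(programming_language)"}},
]
ENTITY_TYPE: Dict[str, str] = {
    "kb:Java_(programming_language)": "ProgrammingLanguage",
    "kb:Java_(island)": "Island",
}


def search_grouped_by_sense(term: str) -> Dict[str, List[str]]:
    """Match a surface form, then group the hits by the entity it was linked to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in DOCS:
        entity = doc["mentions"].get(term.lower())
        if entity is not None:
            groups[entity].append(doc["id"])
    return dict(groups)


def search_by_type(entity_type: str) -> List[str]:
    """Return documents mentioning any entity of the given type."""
    return [d["id"] for d in DOCS
            if any(ENTITY_TYPE.get(e) == entity_type for e in d["mentions"].values())]


print(search_grouped_by_sense("Java"))
# {'kb:Java_(programming_language)': ['d1', 'd3'], 'kb:Java_(island)': ['d2']}
print(search_by_type("Island"))  # ['d2']
```

A keyword search for "java" would return all three documents in one mixed list; grouping by the linked entity separates the two senses so the user can pick the relevant one.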
Semantic More Like This
• Search for similar documents based on entities and entity categories
• Similarity function based on the documents' sense
• Not based on text tokens
• Entity Frequency / Inverse Document Frequency
• Entity Type Frequency / Inverse Document Frequency
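The two weighting schemes listed above can be combined into a document similarity in several ways; the slides do not say how. The sketch below assumes one plausible choice: an EF-IDF vector over linked entities and another over entity types, compared with cosine similarity and mixed with a hypothetical weight `alpha`. The smoothed IDF formula and the sample corpus are also assumptions. Text tokens are never used, only entities and their types.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical corpus: the entities linked in each document and their types.
ENTITIES: Dict[str, List[str]] = {
    "d1": ["kb:Solr", "kb:Lucene", "kb:Solr"],
    "d2": ["kb:Solr", "kb:Elasticsearch"],
    "d3": ["kb:London", "kb:Thames"],
}
TYPES: Dict[str, List[str]] = {
    "d1": ["Software", "Software", "Software"],
    "d2": ["Software", "Software"],
    "d3": ["City", "River"],
}


def weighted_vector(features: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Feature frequency times a smoothed inverse document frequency."""
    n_docs = len(corpus)
    vec: Dict[str, float] = {}
    for feature, freq in Counter(features).items():
        df = sum(1 for doc in corpus if feature in doc)
        vec[feature] = freq * (math.log((1 + n_docs) / (1 + df)) + 1)
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def semantic_mlt(doc_id: str, alpha: float = 0.7) -> List[Tuple[str, float]]:
    """Rank other documents by entity EF-IDF and entity-type EF-IDF similarity."""
    ent_corpus = list(ENTITIES.values())
    type_corpus = list(TYPES.values())
    q_ent = weighted_vector(ENTITIES[doc_id], ent_corpus)
    q_type = weighted_vector(TYPES[doc_id], type_corpus)
    scores: Dict[str, float] = {}
    for other in ENTITIES:
        if other == doc_id:
            continue
        ent_sim = cosine(q_ent, weighted_vector(ENTITIES[other], ent_corpus))
        type_sim = cosine(q_type, weighted_vector(TYPES[other], type_corpus))
        # alpha is a hypothetical weight balancing entities against entity types.
        scores[other] = alpha * ent_sim + (1 - alpha) * type_sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print([d for d, _ in semantic_mlt("d1")])  # ['d2', 'd3']
```

The entity-type component lets two documents score as related even when they share no exact entity (for example, two documents about different search engines), while the entity component rewards documents about the very same things.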
Future Work
• New Semantic More Like This approach based on graph relations
• Machine Learning components: classification, topic annotation, clustering
• Semantic facets
• Secured entity search
• Image and media search
Conclusions
• Better user experience
• More precision in search results
• Closer to human language
Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330
Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64
Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461
Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!
