ECIR-2014: Multilanguage Content Discovery Through Entity Driven Search

ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014

• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
Alfresco Partner of the Year 2012 and 2013

Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search Background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti

Agenda
• Context
• Problem
• Solution
• Demo
• Future Work

Zaizi R&D Department
• Giving sense to the content
  • Enriching it semantically
• Adding value to ECM/CMS
  • More structured content, easy to manage, link and search
• Improving search
  • Across different domains, data sources, user experience
• Machine Learning applied research
  • Content organization, recommendation systems

Enterprise Search Problems
Challenge: Search within Big and Heterogeneous Repositories
• Heterogeneous Data Sources
  • Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
  • PDFs, plain text, Word, …
  • Documents not linked to each other
• Federated Search needed
  • Search across data sources
  • Different permissions
  • Centralized endpoint
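
The federated search requirements above can be sketched in a few lines: one entry point fans the query out to several sources and drops hits the user is not allowed to read. The connector functions, hit fields and user names are all hypothetical, and the sketch assumes scores from different sources are directly comparable, which a real system would have to normalise.

```python
def search_filesystem(query):
    # Hypothetical connector returning hits with a score and an access list.
    return [{"id": "fs:/reports/q1.pdf", "score": 0.7, "allowed": {"alice", "bob"}}]


def search_ecm(query):
    # Second hypothetical connector with its own permissions per document.
    return [
        {"id": "ecm:contract-42", "score": 0.9, "allowed": {"alice"}},
        {"id": "ecm:memo-7", "score": 0.4, "allowed": {"bob"}},
    ]


def federated_search(query, user, connectors):
    """Single entry point: query every source, keep only hits the user may read."""
    merged = []
    for connector in connectors:
        for hit in connector(query):
            if user in hit["allowed"]:
                merged.append(hit)
    # Assumption: scores are comparable across sources without normalisation.
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)


results = federated_search("budget", "alice", [search_filesystem, search_ecm])
print([hit["id"] for hit in results])
```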

Current Enterprise Search Weaknesses
• Keyword based
• Low precision
• Ambiguous terms are not put in context
• Inaccurate weighting when keywords are combined in a query

Entity Driven Search
• Moves from keywords to Entities
  • More understandable to a human
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
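
As a rough illustration of the steps above, the sketch below builds an inverted index keyed by entity identifiers rather than by text tokens. The `EntityIndex` class, the documents and the `dbpedia:`-style identifiers are hypothetical; it is a toy in-memory structure and does not model the Solr indexes used in the actual system.

```python
from collections import defaultdict


class EntityIndex:
    """Toy inverted index: entity id -> set of document ids (hypothetical)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, entity_ids):
        # Index a document under each entity found in it during enrichment.
        for entity_id in entity_ids:
            self._postings[entity_id].add(doc_id)

    def search(self, *entity_ids):
        # Return the documents that mention ALL of the requested entities.
        if not entity_ids:
            return set()
        result = set(self._postings.get(entity_ids[0], set()))
        for entity_id in entity_ids[1:]:
            result &= self._postings.get(entity_id, set())
        return result


index = EntityIndex()
index.add("doc1", ["dbpedia:Paris", "dbpedia:France"])
index.add("doc2", ["dbpedia:Paris_Hilton"])
index.add("doc3", ["dbpedia:Paris", "dbpedia:Louvre"])

print(sorted(index.search("dbpedia:Paris")))
```

Because the keys are entity identifiers, a query for the city does not match the document about the person, even though both share the surface word "Paris".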

Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks

Architecture
(architecture diagram)

RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Open-Source based components
• Entity Linking using Knowledge Bases

NLP & Semantic Enrichment
• From unstructured to structured
• NLP analysis, POS tagging
• Named Entity Recognition
• Linked Data
• Entity Linking using Knowledge Bases
• Disambiguation
• Indexing in Solr
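
The enrichment steps listed above can be pictured with a deliberately small sketch: mentions are spotted by dictionary lookup, linked to candidates in a knowledge base, and disambiguated by counting shared context words. Everything here is an assumption made for illustration (the `KNOWLEDGE_BASE` contents, the `kb:` identifiers, the overlap heuristic); a real pipeline would rely on proper NLP tooling for tagging and recognition, and the Solr indexing step is left out.

```python
# Hypothetical miniature knowledge base: surface form -> candidate entities,
# each with an id, a type, and context words used for disambiguation.
KNOWLEDGE_BASE = {
    "paris": [
        {"id": "kb:Paris_France", "type": "City", "context": {"france", "seine", "capital"}},
        {"id": "kb:Paris_Texas", "type": "City", "context": {"texas", "usa", "lamar"}},
    ],
    "louvre": [
        {"id": "kb:Louvre", "type": "Museum", "context": {"museum", "paris", "art"}},
    ],
}


def tokenize(text):
    # Lowercase and strip basic punctuation; a stand-in for real NLP analysis.
    return [tok.strip(".,;:!?").lower() for tok in text.split()]


def enrich(text):
    """Return one linked entity per recognised mention in the text."""
    tokens = tokenize(text)
    token_set = set(tokens)
    annotations = []
    for position, token in enumerate(tokens):
        candidates = KNOWLEDGE_BASE.get(token)
        if not candidates:
            continue
        # Disambiguation: pick the candidate sharing the most context words.
        best = max(candidates, key=lambda c: len(c["context"] & token_set))
        annotations.append({"mention": token, "position": position,
                            "entity": best["id"], "type": best["type"]})
    return annotations


for annotation in enrich("The Louvre is a museum in Paris, the capital of France."):
    print(annotation)
```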

Smart Autocomplete
• Multi Phase suggestions
• Closer to natural language query formulation
• Named Entities infix
• Entity types infix
• Multi Language entity type support
• Properties driven query approach
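
One possible reading of the infix and multi-language points above is sketched below: the typed fragment is matched anywhere inside named entity labels first, then inside entity type labels for the user's language. The phase order, the sample data and the `suggest` function are assumptions made for illustration, not a description of how the product implements its suggester.

```python
# Hypothetical suggestion sources; entity types carry one label per language.
NAMED_ENTITIES = ["Barack Obama", "Michelle Obama", "Bahamas"]
ENTITY_TYPES = {
    "Person": {"en": "person", "es": "persona", "it": "persona"},
    "Organization": {"en": "organization", "es": "organización", "it": "organizzazione"},
}


def suggest(fragment, lang="en", limit=5):
    """Return (kind, value) suggestions whose label contains the typed fragment."""
    needle = fragment.casefold()
    suggestions = []
    # Phase 1: named entities, matched anywhere in the label (infix match).
    for name in NAMED_ENTITIES:
        if needle in name.casefold():
            suggestions.append(("entity", name))
    # Phase 2: entity types, matched on the label for the requested language,
    # falling back to English when that language has no label.
    for type_id, labels in ENTITY_TYPES.items():
        label = labels.get(lang, labels["en"])
        if needle in label.casefold():
            suggestions.append(("type", type_id))
    return suggestions[:limit]


print(suggest("oba"))
print(suggest("perso", lang="es"))
```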

Smart Autocomplete Configuration
• Entity type properties
  • Interesting to our use case and scenario
• Property inheritance through the type hierarchy
• Enhance type information from external resources
  • Freebase, DBpedia, custom data set
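
Property inheritance through a type hierarchy can be sketched as a walk from a type up to its ancestors, collecting the properties configured at each level. The hierarchy, the property names and the `properties_for` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical type hierarchy (child -> parent) and per-type configured properties.
PARENT = {"Politician": "Person", "Person": "Thing", "City": "Place", "Place": "Thing"}
PROPERTIES = {
    "Thing": ["name"],
    "Person": ["birthDate", "nationality"],
    "Politician": ["party"],
    "Place": ["country"],
}


def properties_for(entity_type):
    """Collect the properties of a type plus those inherited from its ancestors."""
    collected = []
    current = entity_type
    while current is not None:
        for prop in PROPERTIES.get(current, []):
            if prop not in collected:
                collected.append(prop)
        # Move one level up; PARENT.get returns None at the root.
        current = PARENT.get(current)
    return collected


print(properties_for("Politician"))
```

The most specific properties come first, so a suggester could offer "party" before the generic "name" when the user has picked a politician.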

Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
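
Grouping results by sense can be illustrated with an ambiguous query: hits are bucketed by the entity they were linked to instead of being shown as one flat list. The hit structure, the `kb:` identifiers and `group_by_sense` are hypothetical.

```python
from collections import defaultdict

# Hypothetical hits for the ambiguous query "jaguar", each tagged with the
# entity (sense) and entity type assigned during enrichment.
HITS = [
    {"doc": "doc1", "entity": "kb:Jaguar_Cars", "type": "Company"},
    {"doc": "doc2", "entity": "kb:Jaguar_animal", "type": "Animal"},
    {"doc": "doc3", "entity": "kb:Jaguar_Cars", "type": "Company"},
]


def group_by_sense(hits):
    """Group document ids under the (entity, type) pair each hit was linked to."""
    groups = defaultdict(list)
    for hit in hits:
        groups[(hit["entity"], hit["type"])].append(hit["doc"])
    return dict(groups)


for (entity, entity_type), docs in group_by_sense(HITS).items():
    print(f"{entity} ({entity_type}): {docs}")
```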

Semantic More Like This
• Search for Similar Documents based on Entities and Entities’ categories
• Similarity Function based on Documents’ Sense
• Not based on text tokens
• Entity Frequency / Inverse Document Frequency
• Entity Type Frequency / Inverse Document Frequency
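
The entity frequency weighting above is analogous to TF-IDF with entity identifiers in place of text tokens. A minimal sketch follows, assuming a smoothed inverse document frequency and cosine similarity; the slides do not state which exact formula or similarity measure is used, and the documents and `kb:` identifiers are made up.

```python
import math
from collections import Counter


def ef_idf_vectors(docs):
    """docs: doc id -> list of entity ids. Returns doc id -> {entity: weight}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for entities in docs.values():
        doc_freq.update(set(entities))
    vectors = {}
    for doc_id, entities in docs.items():
        counts = Counter(entities)
        # Smoothed IDF (an assumption): entities found in every document
        # still keep a small positive weight.
        vectors[doc_id] = {
            e: freq * (math.log((1 + n_docs) / (1 + doc_freq[e])) + 1.0)
            for e, freq in counts.items()
        }
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(e, 0.0) for e, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id, docs):
    """Rank the other documents by entity-based similarity to doc_id."""
    vectors = ef_idf_vectors(docs)
    scores = {other: cosine(vectors[doc_id], vec)
              for other, vec in vectors.items() if other != doc_id}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "doc1": ["kb:Paris", "kb:Louvre", "kb:France"],
    "doc2": ["kb:Paris", "kb:Eiffel_Tower", "kb:France"],
    "doc3": ["kb:Tokyo", "kb:Japan"],
}
print(more_like_this("doc1", docs))
```

Passing lists of entity type identifiers instead of entity identifiers to the same functions would give the entity type variant of the weighting.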

Future Work
• Semantic More Like This new approach (Graph relations)
• Machine Learning components: Classification, Topic annotation, Clustering
• Semantic facets
• Secured Entity Search
• Image and Media searches

Conclusions
• Better user experience
• More precision in search results
• Closer to human language

Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330
Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64
Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461
Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!
