EMBASE: A Technical Introduction from a Search Engineer’s perspective
Junte Zhang
A Typical Search Engine
Set-based
Algebraic
Probabilistic
Feature based
Basic Building Blocks
A systematic approach to information retrieval.
Source: Lalmas et al. 2001, fig. 2.
Query Refinement
• Facets and filtering
• Suggestions and autocomplete
Collecting and Saving Results
• In EMBASE:
• Archiving
• Analytics
• Other domains also:
• Bookmarking
• Checkout and buy
EMBASE Architecture
Indexing with an Inverted Index
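As a rough illustration of what an inverted index is, the sketch below maps each term to the documents that contain it and answers a boolean AND query by intersecting posting lists. It is a toy model, not how Lucene or Elasticsearch store postings; the sample documents and the function names `build_inverted_index` and `search_all` are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the sorted doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return the IDs of documents that contain every query term (boolean AND)."""
    term_sets = [set(index.get(term, [])) for term in query.lower().split()]
    if not term_sets:
        return []
    return sorted(set.intersection(*term_sets))


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "water purification methods",
    2: "reclaimed water in agriculture",
    3: "purification of proteins",
}
index = build_inverted_index(docs)
print(index["water"])
print(search_all(index, "water purification"))
```

Looking up a term is a dictionary access, and a multi-term query only touches the posting lists of its own terms, which is why this structure suits full-text search.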
Documents, Data Model, Indexing
XML Docs (OpsBank, DWH)
Fabrication
Document enrichment
- Emtree Backposting
- Field updates
Document transformation
Kafka
Data Model in POJO
XML to POJO to JSON to XML
Pre-processing of content, i.e. combining fields and meta-fields
Cleaning
Document Loader
Document Feeder
Elasticsearch
AWS S3
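The transformation step (XML to an in-memory object to JSON, with fields combined during pre-processing) might look roughly like the sketch below. The slides describe Java POJOs; a Python dataclass stands in here. The element names, the `Record` class, and the combined `all_text` field are assumptions made for illustration, not the actual EMBASE data model.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Record:
    """Hypothetical in-memory model of one source document (the POJO stand-in)."""
    doc_id: str
    title: str
    abstract: str


def parse_record(xml_text):
    """Parse one XML document into a Record, trimming stray whitespace (cleaning)."""
    root = ET.fromstring(xml_text)
    return Record(
        doc_id=root.get("id", ""),
        title=root.findtext("title", default="").strip(),
        abstract=root.findtext("abstract", default="").strip(),
    )


def to_index_json(record):
    """Serialize a Record to JSON, adding a combined field built from other fields."""
    body = asdict(record)
    # Pre-processing: combine several fields into one searchable meta-field.
    body["all_text"] = " ".join(part for part in (record.title, record.abstract) if part)
    return json.dumps(body, sort_keys=True)


# Hypothetical input document.
sample_xml = '<doc id="L1"><title> Water quality </title><abstract>Reclaimed water study.</abstract></doc>'
print(to_index_json(parse_record(sample_xml)))
```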
Elasticsearch topology
Node
Index
Shard Shard
Index
Shard
• 3 master nodes
• 18 data nodes
• ~700 GB index with 18 shards
• Shard size of ~40 GB
• Running in Docker containers on Kubernetes
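The shard size quoted above follows from the other two numbers. The short calculation below checks it, assuming the ~700 GB figure is the total size of the primary shards and that shards are spread evenly over the data nodes; the slides do not mention replicas, so none are counted.

```python
index_size_gb = 700   # approximate total index size from the slide
primary_shards = 18   # number of shards from the slide
data_nodes = 18       # number of data nodes from the slide

# Average size of one shard, assuming documents are spread evenly.
shard_size_gb = index_size_gb / primary_shards

# Primary shards per data node, assuming an even allocation and no replicas.
shards_per_node = primary_shards / data_nodes

print(f"~{shard_size_gb:.1f} GB per shard, {shards_per_node:.0f} primary shard(s) per data node")
```

About 39 GB per shard is consistent with the "~40 GB" on the slide.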
Query Processing
• See: https://confluence.elsevier.com/display/EM/Embase+command+language+grammar+visualization
Query
Query Parser
Query Builders
Lexer and AST with ANTLR
Semantic Tree Parser
Elasticsearch
Lucene
Query Language (grammar rules)
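To make the parse-then-build flow concrete, here is a deliberately tiny stand-in. The real system uses an ANTLR grammar that is not shown in the deck; this sketch only recognizes the quoted-term-plus-field-code form (for example 'amine-functionalized':ti) with a regular expression and turns each match into an Elasticsearch `match_phrase` clause. The field-code mapping and the choice of `match_phrase` are assumptions.

```python
import re

# Hypothetical mapping from command-language field codes to index field names.
FIELD_NAMES = {"ti": "title", "ab": "abstract"}

# Matches 'some phrase':code, a small subset of what a full grammar would accept.
TERM_PATTERN = re.compile(r"'([^']*)':(\w+)")


def build_query(command):
    """Parse fielded terms from a command string and build an ES bool query body."""
    clauses = []
    for phrase, field_code in TERM_PATTERN.findall(command):
        field = FIELD_NAMES[field_code]  # raises KeyError for unknown field codes
        clauses.append({"match_phrase": {field: phrase}})
    return {"bool": {"must": clauses}}


print(build_query("'amine-functionalized':ti 'water':ab"))
```

A grammar-based parser earns its keep once the language has operators, nesting, and precedence, which a regular expression cannot express reliably.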
Matching
• Tokenization by whitespace
• Exact matching, including on compounded terms
• 'aminefunctionalized':ti equals 'amine-functionalized':ti
• Removal of punctuation, while still allowing searches for special characters and sub/superscripts
• ASCII folding and lowercasing
• No language processing (stemming/stopwords)
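The analysis steps listed above can be approximated in a few lines. This is a sketch of the general idea (whitespace tokenization, ASCII folding, lowercasing, punctuation removal, no stemming or stopword removal), not the actual Elasticsearch analyzer configuration used by EMBASE. In particular, it simply strips punctuation and does not reproduce the special-character and sub/superscript handling mentioned on the slide.

```python
import re
import unicodedata


def ascii_fold(text):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Whitespace-tokenize, fold to ASCII, lowercase, and remove punctuation."""
    tokens = []
    for raw in text.split():
        folded = ascii_fold(raw).lower()
        token = re.sub(r"[^\w]", "", folded)  # drop anything that is not a word character
        if token:
            tokens.append(token)
    return tokens


print(analyze("Amine-Functionalized Café"))
print(analyze("amine-functionalized") == analyze("aminefunctionalized"))
```

Because the hyphen is removed inside the token rather than used as a split point, the hyphenated and compounded spellings produce the same token, which is the equivalence shown in the example on the slide.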
Ranking
• By Publication Year and Entry Date
• By Relevance
• Default BM25 relevance scoring of Elasticsearch (probabilistic model)
• Similarity search
• Vector Space Model with term boosting
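For readers unfamiliar with BM25, the sketch below computes the score contribution of a single query term in a single document using the classic formula, with Elasticsearch's default parameters k1 = 1.2 and b = 0.75. It is an illustration of the model, not Lucene's implementation; some Lucene versions omit the constant (k1 + 1) factor, which does not change the ordering of results.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document with the classic BM25 formula.

    tf: occurrences of the term in the document
    doc_len, avg_doc_len: length of this document and the average length, in terms
    n_docs: number of documents in the index
    doc_freq: number of documents that contain the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Same term frequency, but the second document is twice the average length.
print(bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10))
print(bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10))
```

The score grows with term frequency but saturates, rare terms weigh more through the IDF factor, and long documents are penalized through the length normalization.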
Query Refinement: Autocomplete
• Lookup of cached (grouped) Emtree terms
• Hit counts
• Live
• Cached with a cronjob that computes the hit counts from ES with terms aggregations and partitions
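A cached hit-count job of this kind could be built on the terms aggregation's partition filtering (`include` with `partition` and `num_partitions`), which splits a large set of terms over several requests. The sketch below only builds the request bodies and merges responses; the field name `emtree_term`, the aggregation name, and the sample responses are assumptions, and no cluster is contacted.

```python
def partitioned_terms_agg(field, partition, num_partitions, size=1000):
    """Build a search body that counts documents per term for one partition of terms."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "term_counts": {
                "terms": {
                    "field": field,
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": size,
                }
            }
        },
    }


def merge_buckets(responses):
    """Combine the buckets of all partition responses into one term -> hit count map."""
    counts = {}
    for response in responses:
        for bucket in response["aggregations"]["term_counts"]["buckets"]:
            counts[bucket["key"]] = bucket["doc_count"]
    return counts


# One request body per partition; "emtree_term" is a hypothetical field name.
bodies = [partitioned_terms_agg("emtree_term", p, 2) for p in range(2)]

# Hypothetical responses, shaped like Elasticsearch aggregation results.
fake_responses = [
    {"aggregations": {"term_counts": {"buckets": [{"key": "water", "doc_count": 120}]}}},
    {"aggregations": {"term_counts": {"buckets": [{"key": "iobenguane", "doc_count": 7}]}}},
]
print(merge_buckets(fake_responses))
```

Each term falls into exactly one partition, so the merged map can be written to the cache without de-duplication.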
Query Refinement: Synonyms
Emtree thesaurus:
<Term>
<TermName LinkType="drug">water</TermName>
<Synonym>dihydrogen oxide</Synonym>
<Synonym>hydrogen oxide</Synonym>
<Synonym>hydrogen oxide o 16</Synonym>
<Synonym>reclaimed water</Synonym>
<Synonym>washing water</Synonym>
<Synonym>water o 16</Synonym>
<HistoryNote>
<CreationYear>1974</CreationYear>
</HistoryNote>
</Term>
Dorland's dictionary:
<entry disabled="false">
<term>metaiodobenzylguanidine</term>
<emtreeTerm>(3 iodobenzyl)guanidine</emtreeTerm>
<definition>iobenguane.</definition>
</entry>
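One way such thesaurus entries can feed query refinement is as a lookup from synonym to preferred term. The sketch below parses the Emtree example shown above with the standard library and builds that map; how EMBASE actually loads and applies the thesaurus is not described in the slides, so the function `synonym_map` is illustrative only.

```python
import xml.etree.ElementTree as ET

# The Emtree entry from the slide, reproduced as a string.
EMTREE_XML = """
<Term>
  <TermName LinkType="drug">water</TermName>
  <Synonym>dihydrogen oxide</Synonym>
  <Synonym>hydrogen oxide</Synonym>
  <Synonym>hydrogen oxide o 16</Synonym>
  <Synonym>reclaimed water</Synonym>
  <Synonym>washing water</Synonym>
  <Synonym>water o 16</Synonym>
  <HistoryNote>
    <CreationYear>1974</CreationYear>
  </HistoryNote>
</Term>
"""


def synonym_map(term_xml):
    """Return a dict that maps every synonym in a Term entry to its preferred term name."""
    term = ET.fromstring(term_xml.strip())
    preferred = term.findtext("TermName")
    return {synonym.text: preferred for synonym in term.findall("Synonym")}


mapping = synonym_map(EMTREE_XML)
print(mapping["dihydrogen oxide"])
```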
Query Refinement: Faceting (1)
• Using multiple dimensions to narrow down results.
• “…allowing users to narrow down search results by applying multiple filters based on faceted classification of the items.”
• https://en.wikipedia.org/wiki/Faceted_search
• EMBASE uses Elasticsearch aggregations for creating facets
• “The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, that can be composed in order to build complex summaries of the data.”
• https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
Query Refinement: Faceting (2)
• Plain faceting
• Hierarchical Faceting with Subheadings and Triplelinks
• Faceting with Venn diagrams
• Facets with name normalization
• We use Elasticsearch Aggregations
• (Terms, Nested, Reverse Nested, Adjacency Matrix, Filter)
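As an example of how one of these aggregations can back a facet, the sketch below builds an adjacency matrix aggregation, which returns a count for each named filter and for each pairwise intersection (keys joined with "&" by default). That shape maps naturally onto the regions of a Venn diagram, though the slides do not say which aggregation drives that feature. The filter names, field names, and sample response are assumptions.

```python
def venn_facet_request(named_filters):
    """Build a search body with an adjacency_matrix aggregation over named filters."""
    return {
        "size": 0,  # facet counts only, no hits
        "aggs": {"venn": {"adjacency_matrix": {"filters": named_filters}}},
    }


def venn_counts(response):
    """Turn the aggregation buckets into a region -> document count map."""
    buckets = response["aggregations"]["venn"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical filters on hypothetical fields.
body = venn_facet_request({
    "drug": {"term": {"link_type": "drug"}},
    "recent": {"range": {"pub_year": {"gte": 2015}}},
})

# Hypothetical response, shaped like an Elasticsearch adjacency matrix result.
fake_response = {"aggregations": {"venn": {"buckets": [
    {"key": "drug", "doc_count": 40},
    {"key": "drug&recent", "doc_count": 12},
    {"key": "recent", "doc_count": 30},
]}}}
print(venn_counts(fake_response))
```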
Exporting
• We cannot use regular ES pagination (from/size) for retrieving large amounts of results
• To retrieve large amounts of results:
• ES Scroll API and search_after parameter
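The `search_after` pattern can be sketched without a cluster: each page is requested with the sort values of the last hit of the previous page, so the engine never has to skip over earlier results (by default Elasticsearch rejects from/size requests that reach past 10,000 hits). Below, `fake_search` is an in-memory stand-in that mimics the relevant part of the response shape; it and the sample documents are assumptions, and a real export would call the search API with a stable sort that includes a unique tiebreaker field.

```python
from functools import partial


def fake_search(docs, size, search_after=None):
    """In-memory stand-in for a sorted search that supports search_after."""
    ordered = sorted(docs, key=lambda d: (d["year"], d["id"]))
    if search_after is not None:
        ordered = [d for d in ordered if (d["year"], d["id"]) > tuple(search_after)]
    page = ordered[:size]
    return {"hits": {"hits": [{"_source": d, "sort": [d["year"], d["id"]]} for d in page]}}


def export_all(search, page_size):
    """Yield every document by paging with the sort values of the last hit."""
    after = None
    while True:
        hits = search(size=page_size, search_after=after)["hits"]["hits"]
        if not hits:
            return
        for hit in hits:
            yield hit["_source"]
        after = hits[-1]["sort"]  # resume strictly after this position


# Hypothetical records; "id" acts as the unique tiebreaker in the sort.
records = [{"id": i, "year": 2000 + i % 3} for i in range(7)]
exported = list(export_all(partial(fake_search, records), page_size=3))
print([r["id"] for r in exported])
```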
Summary
Overview of EMBASE from a search engineering perspective
Explained how EMBASE does:
• Indexing
• Query processing and building
• Matching and ranking
• Query refinement with autocomplete, synonyms, faceting and filtering
• Exporting

Search Engineering in EMBASE
