August 13 2020
Sujit Pal, Elsevier Labs
Question Answering as Search
The Anserini Pipeline and other stories
THE SEARCH RELEVANCE CONFERENCE
About Me
• Work at Elsevier Labs
• (Mostly self-taught) data scientist
• Ex-search guy, Lucene and Solr mainly
• Some NLP, traditional ML and Deep Learning,
some Computer Vision
• Started looking at Question Answering in 2019
• Specifically the BERTserini project from Jimmy
Lin’s lab.
Agenda
• Types of QA systems
• BERTserini Pipeline
• Experiments and Results
Types of QA Systems
We will just cover the subset where the objective, given a question, is to get answer spans
from passages in a text corpus.
Types of QA systems
• Traditional QA pipeline
• 2 stage Retriever Reader systems
• Dense Retriever and Reader
• Language model based
Traditional QA pipeline:
• Jurafsky and Martin, IBM Watson, YodaQA
• Choose keywords from question
• Predict Question type (who, what, when, …)
• Rank passage by answer type, question words
• Extract answer based on pattern matching and question type
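The traditional pipeline steps above can be sketched in a few lines. This is only a toy illustration, not the method used by Watson or YodaQA: the stopword list, the question-type labels, and the overlap-based ranking are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "of", "in", "to", "do", "does",
    "did", "what", "who", "when", "where", "why", "how", "which",
}

# Hypothetical mapping from question word to a coarse answer type.
QUESTION_TYPES = {
    "who": "PERSON",
    "when": "DATE",
    "where": "LOCATION",
    "why": "REASON",
    "how": "MANNER",
    "what": "DEFINITION_OR_ENTITY",
    "which": "DEFINITION_OR_ENTITY",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def choose_keywords(question):
    """Keep the question tokens that are not stopwords."""
    return [t for t in tokenize(question) if t not in STOPWORDS]


def predict_question_type(question):
    """Return the answer type of the first question word found."""
    for token in tokenize(question):
        if token in QUESTION_TYPES:
            return QUESTION_TYPES[token]
    return "OTHER"


def rank_passages(question, passages):
    """Order passages by how many question keywords they contain."""
    keywords = set(choose_keywords(question))
    scored = [(len(keywords & set(tokenize(p))), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for _, p in ranked]
```

A real system would add answer-type matching and pattern-based extraction on top of the ranked passages.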
Types of QA systems: 2 stage Retriever Reader systems
• DrQA (2017), BERTserini (2019)
• Retriever is unsupervised
• Reader is supervised Reading Comprehension model
Reading Wikipedia to answer Open Domain Questions (Chen, et al, 2017)
End-to-end Open Domain Question Answering with BERTserini (Yang, et al, 2019)
Types of QA systems: Dense Retriever and Reader
• ORQA (2019), REALM (2020)
• Train retriever and reader end-to-end using question answer pairs.
• Answer ranked by vector similarity between learned embeddings (question and answer).
Latent Retrieval for Weakly Supervised Open Domain Question Answering (Lee, et al, 2019)
Retrieval Augmented Language Model Pre-training (Guu, et al, 2020)
Types of QA systems: Language model based
• GPT-2, GPT-3, T5 (2019 - 2020)
• Fine tuned Language Model
• No corpus, LM stores world knowledge implicitly
Exploring the limits of Transfer Learning with a Unified Text-to-text Transformer (Raffel, et al, 2019)
The BERTserini Pipeline
And how and why we adapted it for our needs
BERTserini Pipeline
Anserini + BERT = BERTserini
SOTA results! How would these results translate to real-world use?
Our BERTserini Pipeline
[Pipeline diagram. Labels: ScienceDirect and later ClinicalKey; Solr + plugin; best results with SciBERT/SQuAD 1.1]
BERT Reader changes
• Replaced the BERT-base model fine-tuned on SQuAD 1.1 with a SciBERT model fine-tuned on SQuAD 1.1.
• Also tried…
− Fine-tuning other BERT models: BERT-large, BioBERT.
− Fine-tuning on the SQuAD 2.0 dataset.
− Additional pre-training of the model on ClinicalKey content.
Anserini Retriever changes
• Replaced the Lucene index with a Solr-based index.
• Moved batch-oriented Anserini functionality to a Solr plugin for interactive use.
− Open source, available on GitHub: https://github.com/elsevierlabs-os/anserini-solr-plugin
− Code could be cleaner; it was developed for a proof of concept (POC).
anserini-solr-plugin
• Input: HTTP GET request specifying query, sim, qtype, and rtype.
• Similarity (sim): query likelihood (ql) and BM25 (bm25, default).
• Query Rewriting (qtype)
− Bag of Words (bow), Sequential Dependency Model (sdm)
− Added edismax and raw
• Result Reranking (rtype)
− RM3 (rm3)
− Axiomatic (ax)
− Identity (no reranking)
− Added external (delegate to external rerank service)
• Output: HTTP response
[Screenshot callouts: rewritten query, reranking query]
https://github.com/elsevierlabs-os/anserini-solr-plugin
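To make the request shape concrete, here is a minimal sketch that builds such a GET URL. The sim, qtype, and rtype parameter names and most values come from the slide; the query parameter name "q", the spellings "identity" and "external", and the endpoint used in the usage note are assumptions, so check the plugin README for the real ones.

```python
from urllib.parse import urlencode

# Values listed on the slide. "identity" and "external" are assumed spellings.
SIMILARITIES = {"bm25", "ql"}
QUERY_TYPES = {"bow", "sdm", "edismax", "raw"}
RERANK_TYPES = {"rm3", "ax", "identity", "external"}


def build_request_url(base_url, query, qtype, rtype, sim="bm25"):
    """Build a GET URL for the plugin; rejects values not listed above."""
    if sim not in SIMILARITIES:
        raise ValueError("unknown sim: " + sim)
    if qtype not in QUERY_TYPES:
        raise ValueError("unknown qtype: " + qtype)
    if rtype not in RERANK_TYPES:
        raise ValueError("unknown rtype: " + rtype)
    # "q" as the query parameter name is an assumption (Solr's usual name).
    params = {"q": query, "sim": sim, "qtype": qtype, "rtype": rtype}
    return base_url + "?" + urlencode(params)
```

Usage, with a hypothetical local endpoint: build_request_url("http://localhost:8983/solr/mycore/anserini", "cause of stridor", "bow", "rm3").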
Experiments and Results
Creating the MedSQuAD dataset, and replacing the Anserini reranker component with
various candidates
Initial Setup
• Index paragraphs from ClinicalKey books
• Use BM25 + BoW + RM3 (the paper says paragraphs and these settings work best)
• Scoring:
• Use k=1, look at top answer only (we hope to use the top answer for display without further post-processing)
• Scoring metrics: EM (exact match) and F1 (F1 score) between label and predicted answers (the SQuAD metrics)
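For reference, EM and F1 as used for SQuAD can be sketched as below. This follows the usual SQuAD-style normalization (lowercase, strip punctuation and articles, collapse whitespace) and token-overlap F1; it is a minimal re-implementation for illustration, not the evaluation script used in these experiments.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, label):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(label))


def f1_score(prediction, label):
    """Token-level F1 between the normalized prediction and label."""
    pred_tokens = normalize_answer(prediction).split()
    label_tokens = normalize_answer(label).split()
    common = Counter(pred_tokens) & Counter(label_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(label_tokens)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level EM and F1 are the averages of these per-question values, usually reported as percentages.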
How well does BERTserini work on our data?
• 100 questions from nursing text,
classified as “Remembering” in
Bloom’s taxonomy.
• Run these questions against
pipeline and manually inspect
each answer.
• ~ 60 “reasonable” answers.
− Answer span correct, but…
− Passage answers question
What causes a condition known as black hairy
tongue?
Hairy tongue is a condition in which the patient has an
increased accumulation of keratin on the filiform papillae
that results in a white, “hairy” appearance. This may be
the result of either an increase in keratin production or a
decrease in normal desquamation. Unless otherwise
pigmented, the elongated filiform papillae are white (
Fig. 1.58). In the condition known as black hairy tongue,
the papillae are a brown-to-black color because
of chromogenic bacteria (Fig. 1.59). Tobacco and certain
foods may also discolor the papillae. Although the cause
is unknown, hydrogen peroxide, bismuth subsalicylates
for upset stomach, alcohol, or chemical rinses have
been suggested to stimulate the elongation of the filiform
papillae that results in the appearance of hairy tongue.
Oral Pathology for the Dental Hygienist: Introduction to Preliminary Diagnosis of
Oral Lesions (PII: B9780323400626000013, ISBN: 978-0-323-40062-6)
Some more good results
What is a cause of tooth mobility?
Periodontal probing is used to assess attachment
levels to the tooth and is a prime indicator of
health. Radiographic bone loss around a tooth
does not indicate the presence of a disease state
but is a reflection of past or present periodontal
disease. Occlusal trauma may cause an increase
in tooth mobility but does not cause marginal bone
loss in the absence of periodontal disease.
Contemporary Implant Dentistry: An Implant Is Not a Tooth: A Comparison
of Periodontal Indices (PII: B9780323043731500484, ISBN: 978-0-323-04373-1)
What is the cause of stridor?
Stridor is a term used to describe a high-pitched sound
caused by partial obstruction of the airway. Stridor can
have an inspiratory, expiratory, or biphasic pattern (both
inspiratory and expiratory). An inspiratory pattern
suggests an upper airway cause (e.g., epiglottitis). An
expiratory pattern suggests a lower airway etiology
(e.g., tracheomalacia). A biphasic pattern suggests a
glottic or subglottic obstruction (e.g., subglottic
hemangioma). Imaging evaluation of the child with
stridor is commonly performed with neck and/or chest
radiographs, depending on the pattern of stridor and
associated clinical findings.
Emergency Radiology: The Requisites: Imaging Evaluation of Common Pediatric
Emergencies (PII: B9780323376402000066, ISBN: 978-0-323-37640-2)
As well as some fails
What special considerations must be
observed when a patient has epiglottitis?
What special considerations related to
her transplant need to be in place for this
patient during critical care resuscitation?
Advanced Critical Care Nursing: Bone Marrow Transplantation (PII:
B9781416032199100397, ISBN: 978-1-4160-3219-9)
What conditions are treated by methotrexate?
The combination of PUVA and methotrexate
successfully treated five patients
with erythrodermic psoriasis and two with
pustular psoriasis. According to the authors,
annual methotrexate doses could be reduced by
50% by adding PUVA to the regimen.
Treatment of Skin Disease: Comprehensive Therapeutic Strategies:
Psoriasis (PII: B978070206912300210X, ISBN: 978-0-7020-6912-3)
Meaningless answer
Surely a better answer exists?
Reader Experiments
• Results of evaluating various Reader configurations (no Retriever)
against SQuAD dataset.
• Encouraging results for reading comprehension task, i.e., when
appropriate passage is provided.
Parameters                       EM      F1
BERT-base uncased + SQuAD 1.1    75.86   82.41
BERT-base uncased + SQuAD 2.0    74.03   77.30
SciBERT + SQuAD 1.1              79.10   87.26
Human (SQuAD v2)                 86.83   89.45
MedSQuAD dataset
• SQuAD contains (question, passage, answer)
triples.
− Task is Reading Comprehension, i.e., find the
most appropriate span in the passage to return
as an answer to the question.
• Nursing content = (question, answer) pairs.
• MedSQuAD dataset
− Good answers from nursing questions + top passages, select best passage manually
− Passages in ClinicalKey + automatic question generation, select triples manually
− Approximately 300 (question, passage, answer) triples.
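A (question, passage, answer) triple can be represented roughly as below. The field names follow the SQuAD v1.1 convention ("context", "answers", "answer_start") but the record is flattened for brevity, and the sample is an illustration built from a passage quoted earlier in this deck, not an actual MedSQuAD record; the chosen answer span is an assumption.

```python
def make_triple(question, passage, answer_text):
    """Build a simplified SQuAD-style record; the answer must be a passage span."""
    start = passage.find(answer_text)
    if start < 0:
        raise ValueError("answer must appear verbatim in the passage")
    return {
        "question": question,
        "context": passage,
        "answers": [{"text": answer_text, "answer_start": start}],
    }


# Illustrative record only (answer span chosen by hand for the example).
example = make_triple(
    "What is the cause of stridor?",
    "Stridor is a term used to describe a high-pitched sound "
    "caused by partial obstruction of the airway.",
    "partial obstruction of the airway",
)
```

Storing the character offset lets the reader be trained and scored on spans rather than free text.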
Retriever Experiments
• Adding default retriever backend
− Parse the question into appropriate query (BM25 + BoW worked best)
− Rerank (RM3 worked best) and return top 50 result passages
− Reader generates answer using each of the top 50 passages
− Returns the top (k=1) answer by segment and span score
• Scores drop by 40+ points!
Reader not getting the “right” passages? Passage reranking?
Parameters                                            EM      F1
Baseline (no retriever)                               65.11   76.03
Anserini retriever (BM25+BoW+RM3, 50 results, k=1)    23.02   30.50
Passage Reranking with BERT (Nogueira and Cho, 2020)
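The retrieve-then-read loop can be sketched as follows. The BERTserini paper combines the retriever (segment) score and the reader (span) score by linear interpolation; the `retrieve` and `read` callables and the default weight `mu=0.5` are placeholders for this example, not the values or interfaces used in our pipeline.

```python
def answer_question(question, retrieve, read, num_passages=50, mu=0.5):
    """Return the best (score, answer span, passage), or None if nothing is retrieved.

    retrieve(question, n) -> iterable of (passage, retriever_score)
    read(question, passage) -> (answer_span, reader_score)
    Both callables are hypothetical stand-ins for the retriever and reader.
    """
    candidates = []
    for passage, retriever_score in retrieve(question, num_passages):
        span, reader_score = read(question, passage)
        # Linear interpolation of segment score and span score.
        combined = (1 - mu) * retriever_score + mu * reader_score
        candidates.append((combined, span, passage))
    if not candidates:
        return None
    # k=1: keep only the top-scoring answer.
    return max(candidates, key=lambda c: c[0])
```

With mu=0 the retriever alone decides; with mu=1 the reader alone decides. In practice the two scores are on different scales, so the weight needs tuning on held-out questions.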
BERT Based Reranker (Unsupervised)
• BERT-as-a-service (BaaS) wraps a BERT-base-uncased model and
returns embeddings from its last layer [-1].
• Query embeddings produced from query using BaaS.
• Passage embeddings produced for top 50 passages returned by query
using BaaS.
• Cosine similarity computed between query vector and passage vectors
and passages reranked by similarity descending.
Parameters                  EM     F1
BM25+BoW+RM3, 50 records    8.27   11.45
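The reranking step itself is small once the embeddings exist. The sketch below assumes the query and passage vectors have already been produced by the embedding service and are passed in as plain lists of floats; the toy vectors in the usage note are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_by_cosine(query_vec, passages, passage_vecs):
    """Return passages ordered by cosine similarity to the query, descending."""
    scored = [
        (cosine_similarity(query_vec, vec), passage)
        for passage, vec in zip(passages, passage_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored]
```

For example, rerank_by_cosine([1, 0], ["a", "b", "c"], [[0, 1], [1, 0], [1, 1]]) puts "b" first because its vector points the same way as the query.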
Query Sentence Relevance Reranker
• Model predicts relevance (0/1) between query and single sentence.
• Trained on TREC Microblog dataset (120,000 query sentence pairs)
• Classifier fine-tuned from BERT-base-uncased for 2 epochs, Adam
optimizer with learning rate 2e-5, and batch size 32, F1-score: 0.86.
• Passage is scored as (number of relevant sentences) / (total number of sentences) and reranked by score descending.
Parameters                   EM      F1
BM25+BoW+RM3, 100 records    13.35   19.19
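The passage scoring rule can be sketched like this. The `is_relevant` callable stands in for the fine-tuned BERT classifier, and the regex sentence splitter is a naive assumption; a real pipeline would use a proper sentence segmenter.

```python
import re


def split_sentences(passage):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", passage)
    return [p.strip() for p in parts if p.strip()]


def passage_score(query, passage, is_relevant):
    """Fraction of sentences in the passage judged relevant to the query.

    is_relevant(query, sentence) -> bool is a hypothetical stand-in for
    the 0/1 query-sentence relevance classifier.
    """
    sentences = split_sentences(passage)
    if not sentences:
        return 0.0
    relevant = sum(1 for s in sentences if is_relevant(query, s))
    return relevant / len(sentences)
```

One consequence of this ratio: a long passage with a single highly relevant sentence scores lower than a short passage with one relevant sentence.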
Passage Relevance Reranker
• Model predicts relevance (number between 0 and 1) between passage and question.
• Trained using SQuAD 1.1 (passage, question) pairs with negative sampling.
• Regression model fine-tuned from BERT-base-uncased for 2 epochs, batch size 8, Adam optimizer with learning rate 2e-5, RMSE 0.3.
• Passage is ranked by relevance score descending.
Parameters                  EM     F1
BM25+BoW+RM3, 50 records    8.99   15.69
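One plausible way to build the training pairs with negative sampling is sketched below. The slides do not describe the sampling scheme, so uniform random sampling of other passages, one negative per positive, and the 1.0 / 0.0 targets are all assumptions.

```python
import random


def build_training_pairs(question_passage_pairs, negatives_per_positive=1, seed=0):
    """Turn (question, passage) pairs into (passage, question, target) rows.

    Each true pair gets target 1.0; each sampled mismatched passage gets 0.0.
    The sampling scheme is an assumption made for illustration.
    """
    rng = random.Random(seed)
    all_passages = sorted({passage for _, passage in question_passage_pairs})
    rows = []
    for question, passage in question_passage_pairs:
        rows.append((passage, question, 1.0))
        others = [p for p in all_passages if p != passage]
        count = min(negatives_per_positive, len(others))
        for negative in rng.sample(others, count):
            rows.append((negative, question, 0.0))
    return rows
```

Random negatives are usually easy to tell apart from the true passage; harder negatives (for example, high-ranking BM25 passages that lack the answer) are a common alternative.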
Siamese BERT Reranker
• Pretrained models from https://github.com/UKPLab/sentence-transformers
to produce embeddings from question and passage text.
• Passage score is mean or maximum similarity between question vector
and all sentences in passage.
• Passages ranked by score descending.
Parameters                                                                    EM      F1
BM25+BoW+RM3, 50 records, max similarity, model: bert_base_nli_mean_tokens    16.91   24.95
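The choice between mean and maximum changes the ranking, as this small sketch shows. It assumes the per-sentence similarities to the question have already been computed from the sentence embeddings; the numbers in the usage note are invented.

```python
from statistics import mean


def aggregate_passage_score(sentence_similarities, mode="max"):
    """Collapse per-sentence similarities into one passage score."""
    if not sentence_similarities:
        return 0.0
    if mode == "max":
        return max(sentence_similarities)
    if mode == "mean":
        return mean(sentence_similarities)
    raise ValueError("mode must be 'max' or 'mean'")


def rerank_passages(similarities_by_passage, mode="max"):
    """Order passage ids by aggregated score, descending."""
    return sorted(
        similarities_by_passage,
        key=lambda pid: aggregate_passage_score(similarities_by_passage[pid], mode),
        reverse=True,
    )
```

With {"a": [0.1, 0.9], "b": [0.6, 0.6]}, max puts "a" first (one very similar sentence) while mean puts "b" first (uniformly similar sentences).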
Reranker Scores (all together)
Parameters                                                              EM      F1
From paper:
BERTserini (Paragraph, k=100)                                           38.6    46.1
BERTserini (Paragraph, k=29)                                            36.6    44.0
BERTserini (Article, k=5)                                               19.1    25.9
Our results:
Anserini (BM25+BoW+RM3, SciBERT, MedSQuAD, Paragraph, k=1) (baseline)   23.02   30.50
RM3 replaced with BERT reranker                                         8.27    11.45
RM3 replaced with Query Sentence Relevance reranker                     13.35   19.19
RM3 replaced with Passage Relevance reranker                            8.99    15.69
RM3 replaced with Siamese BERT reranker                                 16.91   24.95
Conclusions
• We couldn’t beat Anserini with any of our Passage Rerankers.
• Our results are comparable (paragraph, k=1) to those reported in the
paper.
• However, response time is unacceptable because of slow Reader
component (techniques such as model distillation might help somewhat).
• Quality of answer snippet at k=1 not always acceptable, and too risky for
production use.
• We also want to use the selected passage and the source metadata as
additional provenance for the answer
Thank you
I am @sujitpal on relevancy.slack.com if you have
questions

Transformer Mods for Document Length Inputs
 
Building Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDSBuilding Named Entity Recognition Models Efficiently using NERDS
Building Named Entity Recognition Models Efficiently using NERDS
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slides
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Evolving a Medical Image Similarity Search
Evolving a Medical Image Similarity SearchEvolving a Medical Image Similarity Search
Evolving a Medical Image Similarity Search
 

Recently uploaded

Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 

Recently uploaded (20)

Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 

Question Answering as Search - the Anserini Pipeline and Other Stories

  • 1. August 13 2020 Sujit Pal, Elsevier Labs Question Answering as Search The Anserini Pipeline and other stories THE SEARCH RELEVANCE CONFERENCE
  • 2. About Me • Work at Elsevier Labs • (Mostly self-taught) data scientist • Ex-search guy, Lucene and Solr mainly • Some NLP, traditional ML and Deep Learning, some Computer Vision • Started looking at Question Answering in 2019 • Specifically the BERTserini project from Jimmy Lin’s lab.
  • 3. Agenda • Types of QA systems • BERTserini Pipeline • Experiments and Results
  • 4. Types of QA Systems • We will cover only the subset where the objective, given a question, is to extract answer spans from passages in a text corpus.
  • 5. Types of QA systems • Traditional QA pipeline • 2 stage Retriever Reader systems • Dense Retriever and Reader • Language model based. Traditional QA pipeline: • Jurafsky and Martin, IBM Watson, YodaQA • Choose keywords from question • Predict question type (who, what, when, …) • Rank passages by answer type and question words • Extract answer based on pattern matching and question type
  • 6. Types of QA systems: 2 stage Retriever Reader systems • DrQA (2017), BERTserini (2019) • Retriever is unsupervised • Reader is a supervised Reading Comprehension model • References: Reading Wikipedia to answer Open Domain Questions (Chen, et al, 2017); End-to-end Open Domain Question Answering with BERTserini (Yang, et al, 2019)
  • 7. Types of QA systems: Dense Retriever and Reader • ORQA (2019), REALM (2020) • Train retriever and reader end-to-end using question-answer pairs. • Answer ranked by vector similarity between learned embeddings (question and answer). • References: Latent Retrieval for Weakly Supervised Open Domain Question Answering (Lee, et al, 2019); Retrieval Augmented Language Model Pre-training (Guu, et al, 2020)
  • 8. Types of QA systems: Language model based • GPT-2, GPT-3, T5 (2019 - 2020) • Fine-tuned Language Model • No corpus, LM stores world knowledge implicitly • Reference: Exploring the limits of Transfer Learning with a Unified Text-to-text Transformer (Raffel, et al, 2019)
  • 9. The BERTserini Pipeline • And how and why we adapted it for our needs
  • 11. BERTserini Pipeline • SOTA Results! • How would these results translate IRL?
  • 12. Our BERTserini Pipeline (diagram labels) • ScienceDirect and later ClinicalKey • Solr + plugin • Best results with SciBERT/SQuAD 1.1
  • 13. BERT Reader changes • Replaced the BERT-base model fine-tuned on SQuAD 1.1 with a SciBERT model fine-tuned on SQuAD 1.1. • Also tried… − Fine-tuning other BERT models: BERT-large, BioBERT. − Fine-tuning using the SQuAD 2.0 dataset − Additional pre-training of the model using ClinicalKey content
  • 14. Anserini Retriever changes • Replaced the Lucene index with a Solr-based index. • Moved batch-oriented Anserini functionality into a Solr plugin for interactive use. − Open source, available on GitHub: https://github.com/elsevierlabs-os/anserini-solr-plugin − Code could be cleaner; it was developed for use in POC code.
  • 15. anserini-solr-plugin • Input: HTTP GET request specifying query, sim, qtype, and rtype. • Similarity (sim): query likelihood (ql) and BM25 (bm25, default). • Query Rewriting (qtype) − Bag of Words (bow), Sequential Dependency Model (sdm) − Added edismax and raw • Result Reranking (rtype) − RM3 (rm3) − Axiomatic (ax) − Identity (no reranking) − Added external (delegate to external rerank service) • Output: HTTP Response • Diagram labels: Rewritten query, Reranking query • https://github.com/elsevierlabs-os/anserini-solr-plugin
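The request described on this slide could be assembled roughly as follows. This is only a sketch: the parameter names query, sim, qtype and rtype and the values ql, bm25, bow, sdm, rm3 and ax come from the slide, while the endpoint URL, the spellings `edismax`, `raw`, `identity` and `external` as parameter values, and the `qtype`/`rtype` defaults are assumptions made for illustration and may not match the plugin.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: host, port, core name and handler path are assumptions.
BASE_URL = "http://localhost:8983/solr/mycore/anserini"

# Allowed values per parameter. ql/bm25, bow/sdm and rm3/ax appear on the slide;
# the remaining spellings are assumed.
ALLOWED = {
    "sim": {"bm25", "ql"},
    "qtype": {"bow", "sdm", "edismax", "raw"},
    "rtype": {"rm3", "ax", "identity", "external"},
}


def build_request_url(query, sim="bm25", qtype="bow", rtype="rm3"):
    """Build the GET URL for one question; raises ValueError on unknown options."""
    params = {"query": query, "sim": sim, "qtype": qtype, "rtype": rtype}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"unsupported {name}: {params[name]!r}")
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_request_url("What is the cause of stridor?"))
```

The URL can then be fetched with any HTTP client; the shape of the response is not specified on the slide, so it is not parsed here.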
  • 16. Experiments and Results • Creating the MedSQuAD dataset, and replacing the Anserini reranker component with various candidates
  • 17. Initial Setup • Index paragraphs from ClinicalKey books • Use BM25 + BoW + RM3 • Scoring: − Use k=1, look at top answer only − Scoring metrics: EM (exact match) and F1 (f1-score) between label and predicted answers. • Callouts: − Paper says paragraphs and these settings work best − We hope to use the top answer for display without further post-processing − SQuAD metrics
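The two scoring metrics can be sketched as below. This follows the general shape of SQuAD-style evaluation (normalize both strings, then compare exactly or by token overlap); it is a minimal re-implementation for illustration, not the official evaluation script, and the function names are chosen here.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, label):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(label))


def f1_score(prediction, label):
    """Token-level F1 between the normalized prediction and label."""
    pred_tokens = normalize_answer(prediction).split()
    label_tokens = normalize_answer(label).split()
    overlap = sum((Counter(pred_tokens) & Counter(label_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The chromogenic bacteria.", "chromogenic bacteria"))
    print(f1_score("chromogenic bacteria", "bacteria"))
```

Dataset-level EM and F1 are then the averages of these per-question values, usually reported as percentages.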
  • 18. How well does BERTserini work on our data? • 100 questions from nursing text, classified as “Remembering” in Bloom’s taxonomy. • Run these questions against pipeline and manually inspect each answer. • ~ 60 “reasonable” answers. − Answer span correct, but… − Passage answers question • Question: What causes a condition known as black hairy tongue? • Passage: Hairy tongue is a condition in which the patient has an increased accumulation of keratin on the filiform papillae that results in a white, “hairy” appearance. This may be the result of either an increase in keratin production or a decrease in normal desquamation. Unless otherwise pigmented, the elongated filiform papillae are white (Fig. 1.58). In the condition known as black hairy tongue, the papillae are a brown-to-black color because of chromogenic bacteria (Fig. 1.59). Tobacco and certain foods may also discolor the papillae. Although the cause is unknown, hydrogen peroxide, bismuth subsalicylates for upset stomach, alcohol, or chemical rinses have been suggested to stimulate the elongation of the filiform papillae that results in the appearance of hairy tongue. • Source: Oral Pathology for the Dental Hygienist: Introduction to Preliminary Diagnosis of Oral Lesions (PII: B9780323400626000013, ISBN: 978-0-323-40062-6)
  • 19. Some more good results • Question: What is a cause of tooth mobility? • Passage: Periodontal probing is used to assess attachment levels to the tooth and is a prime indicator of health. Radiographic bone loss around a tooth does not indicate the presence of a disease state but is a reflection of past or present periodontal disease. Occlusal trauma may cause an increase in tooth mobility but does not cause marginal bone loss in the absence of periodontal disease. • Source: Contemporary Implant Dentistry: An Implant Is Not a Tooth: A Comparison of Periodontal Indices (PII: B9780323043731500484, ISBN: 978-0-323-04373-1) • Question: What is the cause of stridor? • Passage: Stridor is a term used to describe a high-pitched sound caused by partial obstruction of the airway. Stridor can have an inspiratory, expiratory, or biphasic pattern (both inspiratory and expiratory). An inspiratory pattern suggests an upper airway cause (e.g., epiglottitis). An expiratory pattern suggests a lower airway etiology (e.g., tracheomalacia). A biphasic pattern suggests a glottic or subglottic obstruction (e.g., subglottic hemangioma). Imaging evaluation of the child with stridor is commonly performed with neck and/or chest radiographs, depending on the pattern of stridor and associated clinical findings. • Source: Emergency Radiology: The Requisites: Imaging Evaluation of Common Pediatric Emergencies (PII: B9780323376402000066, ISBN: 978-0-323-37640-2)
  • 20. As well as some fails • Question: What special considerations must be observed when a patient has epiglottitis? • Passage: What special considerations related to her transplant need to be in place for this patient during critical care resuscitation? • Source: Advanced Critical Care Nursing: Bone Marrow Transplantation (PII: B9781416032199100397, ISBN: 978-1-4160-3219-9) • Question: What conditions are treated by methotrexate? • Passage: The combination of PUVA and methotrexate successfully treated five patients with erythrodermic psoriasis and two with pustular psoriasis. According to the authors, annual methotrexate doses could be reduced by 50% by adding PUVA to the regimen. • Source: Treatment of Skin Disease: Comprehensive Therapeutic Strategies: Psoriasis (PII: B978070206912300210X, ISBN: 978-0-7020-6912-3) • Callouts: − Meaningless answer − Surely a better answer exists?
  • 21. Reader Experiments • Results of evaluating various Reader configurations (no Retriever) against SQuAD dataset. • Encouraging results for reading comprehension task, i.e., when appropriate passage is provided.
      Parameters | EM | F1
      BERT-base uncased + SQuAD 1.1 | 75.86 | 82.41
      BERT-base uncased + SQuAD 2.0 | 74.03 | 77.30
      SciBERT + SQuAD 1.1 | 79.10 | 87.26
      Human (SQuAD v2) | 86.83 | 89.45
  • 22. MedSQuAD dataset • SQuAD contains (question, passage, answer) triples. − Task is Reading Comprehension, i.e., find the most appropriate span in the passage to return as an answer to the question. • Nursing content = (question, answer) pairs. • MedSQuAD dataset − Good answers from nursing questions + top passages, select best passage manually − Passages in ClinicalKey + automatic question generation, select triples manually − Approximately 300 (question, passage, answer) triples.
  • 23. Retriever Experiments • Adding default retriever backend − Parse the question into appropriate query (BM25 + BoW worked best) − Rerank (RM3 worked best) and return top 50 result passages − Reader generates answer using each of the top 50 passages − Returns the top (k=1) answer by segment and span score • Scores drop by 40+ points! • Callouts: − Reader not getting the “right” passages? − Passage reranking? • Reference: Passage Reranking with BERT (Nogueira and Cho, 2020)
      Parameters | EM | F1
      Baseline (no retriever) | 65.11 | 76.03
      Anserini retriever (BM25+BoW+RM3, 50 results, k=1) | 23.02 | 30.50
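The last step on this slide, picking the top answer by segment and span score, can be sketched as a linear interpolation of the retriever's score for the passage and the reader's score for the span, which is how the BERTserini paper describes combining them. The weight `mu=0.5` and the tuple layout are placeholders chosen for illustration, not values taken from the slides.

```python
def pick_top_answer(candidates, mu=0.5):
    """Return the answer text with the highest combined score.

    candidates: list of (answer_text, segment_score, span_score), one per
    retrieved passage. mu weights the reader's span score against the
    retriever's segment score; 0.5 is an arbitrary placeholder.
    """
    def combined(candidate):
        _, segment_score, span_score = candidate
        return (1 - mu) * segment_score + mu * span_score

    return max(candidates, key=combined)[0]


if __name__ == "__main__":
    toy = [
        ("partial obstruction of the airway", 10.0, 1.0),
        ("epiglottitis", 2.0, 8.0),
    ]
    print(pick_top_answer(toy, mu=0.5))
    print(pick_top_answer(toy, mu=0.9))
```

Because the two scores are on different scales, the choice of `mu` changes which passage wins, as the two calls in the usage example show.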
  • 24. BERT Based Reranker (Unsupervised) • BERT-as-a-service (BaaS) wraps a BERT-base-uncased model and returns embeddings from its last layer [-1]. • Query embeddings produced from query using BaaS. • Passage embeddings produced for top 50 passages returned by query using BaaS. • Cosine similarity computed between query vector and passage vectors, and passages reranked by similarity descending. • Result (BM25+BoW+RM3, 50 records): EM 8.27, F1 11.45
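The reranking step on this slide amounts to sorting passages by cosine similarity to the query vector. A minimal sketch follows, using short hand-written vectors in place of real BERT embeddings; the function names and the toy data are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_by_cosine(query_vec, passages):
    """passages: list of (passage_id, vector). Returns (id, score), best first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    passages = [("p1", [0.0, 1.0]), ("p2", [1.0, 1.0]), ("p3", [2.0, 0.0])]
    for pid, score in rerank_by_cosine(query, passages):
        print(pid, round(score, 3))
```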
  • 25. Query Sentence Relevance Reranker • Model predicts relevance (0/1) between query and single sentence. • Trained on TREC Microblog dataset (120,000 query-sentence pairs) • Classifier fine-tuned from BERT-base-uncased for 2 epochs, Adam optimizer with learning rate 2e-5, and batch size 32, F1-score: 0.86. • Passage is scored as (number of relevant sentences) / (number of sentences) and reranked by score descending. • Result (BM25+BoW+RM3, 100 records): EM 13.35, F1 19.19
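The passage score on this slide is the fraction of sentences the classifier marks as relevant. The sketch below shows that arithmetic with a pluggable relevance function; the regex sentence splitter and the keyword-overlap stand-in for the fine-tuned BERT classifier are assumptions made so the example runs on its own.

```python
import re


def split_sentences(passage):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def passage_score(query, passage, is_relevant):
    """Fraction of sentences in the passage judged relevant to the query."""
    sentences = split_sentences(passage)
    if not sentences:
        return 0.0
    hits = sum(1 for sentence in sentences if is_relevant(query, sentence))
    return hits / len(sentences)


def keyword_overlap(query, sentence):
    """Toy stand-in for the BERT classifier: any shared lowercase word."""
    query_words = set(re.findall(r"\w+", query.lower()))
    sentence_words = set(re.findall(r"\w+", sentence.lower()))
    return bool(query_words & sentence_words)


if __name__ == "__main__":
    text = "Stridor is a high-pitched sound. It can be biphasic. Imaging is common."
    print(passage_score("stridor", text, keyword_overlap))
```

Passages would then be sorted by this score in descending order, with the real classifier passed in place of `keyword_overlap`.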
  • 26. Passage Relevance Reranker • Model predicts relevance (number between 0 and 1) between passage and question. • Trained using SQuAD 1.1 (passage, question) pairs with negative sampling. • Regression model fine-tuned from BERT-base-uncased for 2 epochs, batch size 8, Adam optimizer with learning rate 2e-5, RMSE 0.3. • Passage is ranked by relevance score descending. • Result (BM25+BoW+RM3, 50 records): EM 8.99, F1 15.69
  • 27. Siamese BERT Reranker • Pretrained models from https://github.com/UKPLab/sentence-transformers to produce embeddings from question and passage text. • Passage score is mean or maximum similarity between question vector and all sentences in passage. • Passages ranked by score descending. • Result (BM25+BoW+RM3, 50 records, max similarity, model: bert_base_nli_mean_tokens): EM 16.91, F1 24.95
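The mean-or-maximum aggregation on this slide can be sketched as follows. The vectors are toy lists standing in for sentence embeddings, which in the setup described would come from a pretrained sentence-transformers model; the function names and the `mode` argument are chosen here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def passage_similarity(question_vec, sentence_vecs, mode="max"):
    """Aggregate question-to-sentence similarities into one passage score."""
    sims = [cosine(question_vec, vec) for vec in sentence_vecs]
    if not sims:
        return 0.0
    if mode == "max":
        return max(sims)
    if mode == "mean":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    question = [1.0, 0.0]
    sentences = [[1.0, 0.0], [0.0, 1.0]]
    print(passage_similarity(question, sentences, mode="max"))
    print(passage_similarity(question, sentences, mode="mean"))
```

Maximum similarity rewards a passage for having one closely matching sentence, while the mean penalizes passages that contain much unrelated text.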
  • 28. Reranker Scores (all together)
      Source | Parameters | EM | F1
      From paper | BERTserini (Paragraph, k=100) | 38.6 | 46.1
      From paper | BERTserini (Paragraph, k=29) | 36.6 | 44.0
      From paper | BERTserini (Article, k=5) | 19.1 | 25.9
      Our results | Anserini (BM25+BoW+RM3, SciBERT, MedSQuAD, Paragraph, k=1) (baseline) | 23.02 | 30.5
      Our results | RM3 replaced with BERT reranker | 8.27 | 11.45
      Our results | RM3 replaced with Query Sentence Relevance reranker | 13.35 | 19.19
      Our results | RM3 replaced with Passage Relevance reranker | 8.99 | 15.69
      Our results | RM3 replaced with Siamese BERT reranker | 16.91 | 24.95
  • 29. Conclusions • We couldn’t beat Anserini with any of our Passage Rerankers. • Our results are comparable (paragraph, k=1) to those reported in the paper. • However, response time is unacceptable because of slow Reader component (techniques such as model distillation might help somewhat). • Quality of answer snippet at k=1 not always acceptable, and too risky for production use. • We also want to use the selected passage and the source metadata as additional provenance for the answer.
  • 30. Thank you I am @sujitpal on relevancy.slack.com if you have questions