[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System for In-Document Searches
Vladimir Ageev, Lead DS @EPAM
November 2023
EPAM Proprietary & Confidential.
We’ll cover
Tabular Question Answering case-study
• Business problem
• State of the Art
• Fine-tuning
• Productionalization
TQA Feature
Product
SaaS platform designed to integrate technical publications into Engineering
workflows:
• provides access to
• enriches experience with like
smart search, comparison or entity linking
Table Question answering
Why?
• Hundreds of thousands
popular PDFs have
• Keyword-based search might not
find them
• Semantic search or general QA models
do not account for table structure
TQA: formal task
INPUT
User query:
"TruthfulQA highest % true"
Table representation:
[{
  "text": "% true",
  "row_id": 0,
  "col_id": 3
}, …]
Caption:
"Table 44: Evaluation results on …"
OUTPUT
Answer coordinates
{
  "text": "79.92",
  "operation": None,
  "cells": [{
    "col_id": 3,
    "row_id": 15
  }]
}
Assumption: table detection, parsing and retrieval from PDFs are solved
Task: given a highlight
row_id: 15
col_id: 3
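The input and output shapes above can be sketched in Python as follows. This is a minimal illustration, not the production schema: the class names `Cell` and `Answer` and the helper `cells_text` are hypothetical, and only the field names (`text`, `row_id`, `col_id`, `operation`, `cells`) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One table cell, mirroring the slide's table representation."""
    text: str
    row_id: int
    col_id: int


@dataclass
class Answer:
    """Model output: answer text, optional aggregation, highlighted cells."""
    text: Optional[str]
    operation: Optional[str]
    cells: List[Dict[str, int]] = field(default_factory=list)


def cells_text(table: List[Cell], coords: List[Dict[str, int]]) -> List[str]:
    """Look up the text of every highlighted cell by its coordinates."""
    index: Dict[Tuple[int, int], str] = {
        (c.row_id, c.col_id): c.text for c in table
    }
    return [index[(c["row_id"], c["col_id"])] for c in coords]


# Tiny example built from the values shown on the slide.
table = [Cell("% true", 0, 3), Cell("79.92", 15, 3)]
answer = Answer(text="79.92", operation=None,
                cells=[{"col_id": 3, "row_id": 15}])
highlighted = cells_text(table, answer.cells)
```

For an extractive answer the highlighted cell text and the answer text coincide, which is what makes cell highlighting checkable against the table.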
TQA Models
State of TQA
Which model to choose?
• SOTA: Dater, with OpenAI GPT-3 under the hood, insecure (for us)
• TaBERT: CC-BY-NC 4.0 licensed
• OmniTab: Seq2Seq model, generative, no cell highlighting
• TAPEX: BART-based model, generative, no cell highlighting
• TAPAS: BERT-based model, aggregations and highlighting, MIT License
Scores on WikiTableQuestions
*https://paperswithcode.com/sota/semantic-parsing-on-wikitablequestions
TAPAS: how it works
BERT-based transformer encoder
Two classification heads: cell selection and aggregation operator
Additional positional embeddings:
• Column ID
• Row ID
• Segment: query / table
• Rank: non-comparable or order
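To make the extra embeddings concrete, here is a simplified sketch of how the per-token IDs listed above could be derived for a flattened table. It is an illustration under assumptions, not the TAPAS tokenizer: it splits on whitespace instead of using WordPiece, the function name `table_token_types` is hypothetical, and rank 0 stands for "non-comparable".

```python
from typing import Dict, List, Tuple


def table_token_types(query_tokens: List[str],
                      table: List[List[str]]) -> List[Dict[str, object]]:
    """Assign segment / column / row / rank IDs to query and table tokens.

    `table` is a list of rows; the first row is the header.
    Query tokens get segment 0 and zeros everywhere else.
    """
    feats = [{"token": t, "segment": 0, "col": 0, "row": 0, "rank": 0}
             for t in query_tokens]

    # Rank numeric columns: 1 = smallest value; text columns keep rank 0.
    ranks: Dict[Tuple[int, int], int] = {}
    for c in range(len(table[0])):
        try:
            values = [float(row[c]) for row in table[1:]]
        except ValueError:
            continue  # non-comparable column
        order = sorted(set(values))
        for r, value in enumerate(values, start=1):
            ranks[(r, c)] = order.index(value) + 1

    for r, row in enumerate(table):          # row 0 is the header
        for c, cell in enumerate(row):
            for tok in cell.split():
                feats.append({"token": tok, "segment": 1,
                              "col": c + 1,   # column IDs start at 1
                              "row": r,
                              "rank": ranks.get((r, c), 0)})
    return feats


demo = table_token_types(
    ["highest", "%", "true"],
    [["model", "% true"], ["A", "50.1"], ["B", "79.92"]],
)
```

Each ID would index its own embedding table, and the resulting vectors are summed with the usual token and position embeddings before entering the encoder.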
Fine-tuning
Evaluation & fine-tuning
Question types:
• Extractive: Answer, Cells
  Q: Max truthfulQA %
  A: 79.92
  Highlighted cell
• Generative: Answer, Cells, Aggregation
  Q: Average %info for Llama 2
  A: 46.17
  Operation: AVG
  Highlighted cells
• Unanswerable: None
  Q: Mistral % true on TruthfulQA
  A: None
  Operation: None
  No cells
Dataset size:
~3K tables
~10 QA pairs per table
Annotation:
• ~3 months
• ~2-5 annotators, 2 rounds:
  • separate tables: more diverse
  • several tables per document: retrieval tests
Evaluation & fine-tuning
How to evaluate?
• F1 at cell-sets level
• F1 at answer tokens level
• Micro / macro averaging over
question types / tables / docs
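The metrics above can be sketched as below. This is a minimal version under assumptions: whitespace tokenization, an empty prediction against an empty gold set scored as 1.0 (so that a correctly rejected unanswerable question counts as a hit), and hypothetical function names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def set_f1(pred: Iterable[Tuple[int, int]],
           gold: Iterable[Tuple[int, int]]) -> float:
    """F1 between predicted and gold sets of (row_id, col_id) cells."""
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # unanswerable question, correctly left empty
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_set), tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def token_f1(pred_text: str, gold_text: str) -> float:
    """F1 over answer tokens, counting repeated tokens."""
    pred, gold = pred_text.lower().split(), gold_text.lower().split()
    if not pred and not gold:
        return 1.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_average(scores_by_group: Dict[str, List[float]]) -> float:
    """Mean of per-group means (groups: question types, tables or docs)."""
    means = [sum(v) / len(v) for v in scores_by_group.values() if v]
    return sum(means) / len(means)


def micro_average(scores_by_group: Dict[str, List[float]]) -> float:
    """Mean over all examples, ignoring group boundaries."""
    flat = [s for v in scores_by_group.values() for s in v]
    return sum(flat) / len(flat)


scores = {"extractive": [1.0, 0.0], "unanswerable": [1.0]}
macro = macro_average(scores)
micro = micro_average(scores)
```

Macro averaging weights every group equally, so a small but important question type is not drowned out by a large one; micro averaging weights every example equally.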
Is 80% F1 enough? 50%? Run an impression test!
• Retrieval quality
• Correct response rate
• Overall impression (good to go?)
“ according to the following:
Very poor - The service doesn’t meet expectations.
…
Very good - The service provides great experience.”
“ Is relevant ?
Were cells highlighted?
Are cells ?”
Model/training parameters
Resources: 1x Nvidia A100 80GB
What worked for us
Training speed-up
• Gradient checkpointing
• Tensor Cores: torch.set_float32_matmul_precision('high')
Optimization
• LR scheduling: cyclic warmup + cosine decay
Data:
• Decrease “unanswerable” type in the batch
• Dropout
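One way to read the LR scheduling item above is a linear warmup followed by cosine decay, restarted every cycle. The sketch below is an assumption about the shape of the schedule, not the exact one used in training; `lr_at_step` and all of its parameters are hypothetical names.

```python
import math


def lr_at_step(step: int, base_lr: float, warmup_steps: int,
               cycle_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate with linear warmup and cosine decay, repeated per cycle."""
    pos = step % cycle_steps  # position inside the current cycle
    if pos < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (pos + 1) / warmup_steps
    # Cosine decay from base_lr down towards min_lr over the rest of the cycle.
    progress = (pos - warmup_steps) / max(cycle_steps - warmup_steps, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Example: 10 warmup steps inside 100-step cycles.
schedule = [lr_at_step(s, 1.0, 10, 100) for s in range(200)]
```

With `base_lr=1.0` the function returns a multiplicative factor, which is the form expected by PyTorch's `torch.optim.lr_scheduler.LambdaLR`.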
Performance
Model                   Macro           Extractive      Generative      Unanswerable
                        tok-F1 cell-F1  tok-F1 cell-F1  tok-F1 cell-F1  tok-F1 cell-F1
TAPAS-Large Baseline    25.6   33.4     42.9   45.3     11.6   32.0     15.0   16.8
TAPAS-Large Finetuned   45.2   63.21    57.8   59.8      1.3   57.2     76.0   76.1
TAPAS-Base Baseline     23.6   29.0     26.9   38.3     10.7   26.9     17.6   18.0
Best model is Large
Fine-tuned models are up to 2x better (incl. impression tests)
Generative token-level F1 degrades strongly after fine-tuning
*Most of the tests conducted by Vadzim Piatrou
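The fine-tuned versus baseline comparison can be worked out directly from the TAPAS-Large rows of the table above. The snippet only divides the reported scores; the dictionary keys are hypothetical labels for the table columns.

```python
# TAPAS-Large scores copied from the performance table.
baseline = {
    "macro_tok": 25.6, "macro_cell": 33.4,
    "extractive_tok": 42.9, "extractive_cell": 45.3,
    "generative_tok": 11.6, "generative_cell": 32.0,
    "unanswerable_tok": 15.0, "unanswerable_cell": 16.8,
}
finetuned = {
    "macro_tok": 45.2, "macro_cell": 63.21,
    "extractive_tok": 57.8, "extractive_cell": 59.8,
    "generative_tok": 1.3, "generative_cell": 57.2,
    "unanswerable_tok": 76.0, "unanswerable_cell": 76.1,
}

# Ratio > 1 means fine-tuning helped, < 1 means it hurt.
ratios = {name: round(finetuned[name] / baseline[name], 2) for name in baseline}

for name, ratio in ratios.items():
    print(f"{name:18s} x{ratio}")
```

The macro scores improve by a factor close to two, while the generative token-level ratio falls well below one, matching the degradation noted on the slide.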
Can we use LLMs for it?
Good prompt is all you need, right?
Issues we faced:
• Hallucination – model comes up with facts outside of the table context
• Difficulties with understanding cell coordinates
• Providing structured output
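The issues above suggest validating whatever an LLM returns before using it. Below is a minimal sketch of such a check, assuming the model is asked to reply with JSON in the output format shown earlier; the function `validate_llm_answer` and the `(row_id, col_id) -> text` table layout are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def validate_llm_answer(raw: str,
                        table: Dict[Tuple[int, int], str]) -> Optional[dict]:
    """Return the parsed answer, or None if it is malformed or not grounded."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not structured output at all
    if not isinstance(data, dict) or not isinstance(data.get("cells"), list):
        return None

    coords = []
    for cell in data["cells"]:
        if not isinstance(cell, dict):
            return None
        row, col = cell.get("row_id"), cell.get("col_id")
        if not (isinstance(row, int) and isinstance(col, int)):
            return None
        if (row, col) not in table:
            return None  # coordinates outside the table
        coords.append((row, col))

    # Without an aggregation, the answer text must come from a highlighted cell.
    if data.get("operation") is None and coords:
        if data.get("text") not in {table[k] for k in coords}:
            return None  # likely hallucinated value
    return data


table = {(0, 3): "% true", (15, 3): "79.92"}
good = '{"text": "79.92", "operation": null, "cells": [{"row_id": 15, "col_id": 3}]}'
made_up = '{"text": "81.0", "operation": null, "cells": [{"row_id": 15, "col_id": 3}]}'
```

A rejected response can be retried or treated as "no answer", which keeps values that are not in the table out of the highlighted result.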
Llama vs TAPAS
Model                   Macro           Extractive      Generative      Unanswerable
                        tok-F1 cell-F1  tok-F1 cell-F1  tok-F1 cell-F1  tok-F1 cell-F1
TAPAS-Base Baseline     23.6   29.0     26.9   38.3     10.7   26.9     17.6   18.0
TAPAS-Base Finetuned    43.7   60.0     55.5   56.7      1.2   52.7     73.9   74.0
- - - -
Llama 2 is comparable to the baseline, but better on generative questions
Performance
TAPAS-Base on CPU is faster than Llama 2 4bit on GPU
6.5K test QA pairs take
• TAPAS-Base: ~45 mins, ~2 sec/pair, CPU
• Llama 2: 40 hours, ~10-20 sec/pair, Nvidia A100 80GB GPU
*Most of the tests conducted by Vadzim Piatrou
Question-table classifier
Good table retrieval is 80% of success
• If we are not sure about the answer, let’s still highlight the table
• Note that TAPAS confidence is either 0 or 1
60% in F1 is the best we’ve got
Let’s build a classifier to decide!
[Diagram labels: Query, Candidate Table, TAPAS Answer, Features, LGBM Classifier, Simple, Hard]
It achieves about 80% in F1
Selected “simple” answers have F1 > 80%
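The idea of a question-table classifier can be sketched as a feature builder plus a routing rule. Everything here is an assumption for illustration: the slides do not list the actual features, so the ones below are hypothetical, and the classifier itself (LightGBM on the slide) is represented only by the probability it would output.

```python
from typing import Dict, List


def qt_features(query: str, table: List[dict], answer: dict) -> Dict[str, float]:
    """Build hypothetical features from the query, the table and the TAPAS answer."""
    q_tokens = set(query.lower().split())
    header_tokens = {
        tok
        for cell in table if cell["row_id"] == 0      # header row only
        for tok in cell["text"].lower().split()
    }
    n_cells = len(answer.get("cells", []))
    return {
        "query_len": float(len(q_tokens)),
        "header_overlap": len(q_tokens & header_tokens) / max(len(q_tokens), 1),
        "n_answer_cells": float(n_cells),
        "has_aggregation": float(answer.get("operation") is not None),
        "empty_answer": float(n_cells == 0),
    }


def route(p_simple: float, threshold: float = 0.5) -> str:
    """Show the cells for 'simple' pairs, otherwise fall back to the whole table."""
    return "highlight_cells" if p_simple >= threshold else "highlight_table"


table = [{"text": "% true", "row_id": 0, "col_id": 3},
         {"text": "79.92", "row_id": 15, "col_id": 3}]
answer = {"text": "79.92", "operation": None,
          "cells": [{"row_id": 15, "col_id": 3}]}
features = qt_features("TruthfulQA highest % true", table, answer)
```

In such a setup the feature dictionary would be turned into a row for a gradient-boosted classifier trained on whether the TAPAS answer was correct, and `route` would consume its predicted probability.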
Productionalization
Service schema
Approx. architecture
Let’s cover it step by step
[Architecture diagram labels: client, Query, Custom Document Decomposition, Documents, Storage, Search index, Search service, Tables, TAPAS, QT classifier, Answers]
Custom engine (Tesseract-based + custom models) responsible for:
• Document decomposition, such as
  - layout recognition (paragraph, title, section, etc.)
  - table/figure detection
+ custom Go-based service
• Manages search indices
• Provides API for other services (like our TQA) for
  • collection management
+ their features
deployed as a REST service
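Putting the pieces of the schema together, the request flow could look roughly like the sketch below. It is an assumption about how the components interact, not the actual service code: `answer_query` and the three injected callables are hypothetical stand-ins for the search service, the TAPAS model and the QT classifier.

```python
from typing import Callable, List, Optional


def answer_query(
    query: str,
    search_tables: Callable[[str], List[dict]],
    run_tqa: Callable[[str, dict], dict],
    is_simple: Callable[[str, dict, dict], bool],
) -> List[dict]:
    """Retrieve candidate tables, answer each, and keep only trusted answers."""
    results = []
    for table in search_tables(query):            # search service + index
        answer = run_tqa(query, table)            # TAPAS
        trusted = is_simple(query, table, answer)  # QT classifier
        results.append({
            "table_id": table["id"],
            # When the answer is not trusted, only the table is highlighted.
            "answer": answer if trusted else None,
        })
    return results


# Stub components, just to show the call shape.
def fake_search(query: str) -> List[dict]:
    return [{"id": "t1"}, {"id": "t2"}]


def fake_tqa(query: str, table: dict) -> dict:
    return {"text": "79.92", "operation": None, "cells": []}


def fake_classifier(query: str, table: dict, answer: dict) -> bool:
    return table["id"] == "t1"


responses = answer_query("TruthfulQA highest % true",
                         fake_search, fake_tqa, fake_classifier)
```

Passing the components in as callables keeps the flow testable without a running search index or a loaded model.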
Summary
TQA is not “solved” yet!
- models are at 60% accuracy on open datasets
- zero-shot on open-source LLMs is not enough
Annotation for TQA takes a long time
- you need a dedicated team (SMEs in a perfect world)
- for a small team it might take months!
- it is worth the wait: increase in metrics could be up to 2x
Use both offline and online metrics:
- token / cell level F1
- measure impression
- modest accuracy might still be enough for business
Vladimir Ageev Vadzim Piatrou, Ph.D.

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 

Recently uploaded (20)

Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System for In-Document Searches

  • 1. Vladimir Ageev, Lead DS @EPAM. November 2023. EPAM Proprietary & Confidential.
  • 2. We’ll cover: Tabular Question Answering case study
      • Business problem
      • State of the Art
      • Fine-tuning
      • Productionalization
  • 3. TQA Feature
  • 4. Product
      SaaS platform designed to integrate technical publications into Engineering workflows:
      • provides access to
      • enriches experience with like smart search, comparison or entity linking
  • 5. Table Question Answering: why?
      • Hundreds of thousands popular PDFs have
      • Keyword-based search might not find them
      • Semantic search or general QA models do not account for table structure
  • 6. TQA: formal task
      INPUT
      User query: "TruthfulQA highest % true"
      Table representation: [{"text": "% true", "row_id": 0, "col_id": 3}, …]
      Caption: "Table 44: Evaluation results on …"
      OUTPUT
      Answer coordinates: {"text": "79.92", "operation": None, "cells": [{"col_id": 3, "row_id": 15}]}
      Assumption: table detection, parsing and retrieval from PDFs are solved
      Task: given a highlight (row_id: 15, col_id: 3)
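
The input and output shapes on the formal-task slide can be sketched as plain data structures. This is a minimal illustration only: the class and function names (`Cell`, `Answer`, `build_grid`, `highlighted_texts`) are assumptions introduced here, and the two-cell table is a toy stand-in for the real one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One parsed table cell, mirroring the slide's table representation."""
    text: str
    row_id: int
    col_id: int


@dataclass
class Answer:
    """Model output: answer text, optional aggregation, and cell coordinates."""
    text: Optional[str]
    operation: Optional[str]  # None for purely extractive answers
    cells: List[Dict[str, int]] = field(default_factory=list)


def build_grid(cells: List[Cell]) -> Dict[Tuple[int, int], str]:
    """Index cells by (row_id, col_id) so coordinates can be resolved quickly."""
    return {(c.row_id, c.col_id): c.text for c in cells}


def highlighted_texts(grid: Dict[Tuple[int, int], str], answer: Answer) -> List[str]:
    """Return the text of every cell the answer asks the UI to highlight."""
    return [grid[(c["row_id"], c["col_id"])] for c in answer.cells]


# Toy table: only the header cell and the answer cell from the slide.
table = [Cell("% true", 0, 3), Cell("79.92", 15, 3)]
answer = Answer(text="79.92", operation=None, cells=[{"col_id": 3, "row_id": 15}])
texts = highlighted_texts(build_grid(table), answer)
```

An unanswerable question would simply carry `text=None`, `operation=None` and an empty `cells` list.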
  • 7. TQA Models
  • 8. State of TQA: what model to choose?
      • SOTA: Dater – has OpenAI GPT-3 under the hood, insecure (for us)
      • TaBERT – CC-BY-NC 4.0 licensed
      Scores on WikiTableQuestions: https://paperswithcode.com/sota/semantic-parsing-on-wikitablequestions
  • 9. State of TQA: what model to choose?
      • SOTA: Dater – has OpenAI GPT-3 under the hood, insecure (for us)
      • TaBERT – CC-BY-NC 4.0 licensed
      • OmniTab – Seq2Seq model, generative, no cell highlighting
      • TAPEX – BART-based model, generative, no cell highlighting
      Scores on WikiTableQuestions: https://paperswithcode.com/sota/semantic-parsing-on-wikitablequestions
  • 10. State of TQA: what model to choose?
      • SOTA: Dater – has OpenAI GPT-3 under the hood, insecure (for us)
      • TaBERT – CC-BY-NC 4.0 licensed
      • OmniTab – Seq2Seq model, generative, no cell highlighting
      • TAPEX – BART-based model, generative, no cell highlighting
      • TAPAS – BERT-based model, aggregations and highlighting, MIT License
      Scores on WikiTableQuestions: https://paperswithcode.com/sota/semantic-parsing-on-wikitablequestions
  • 11. TAPAS: how it works
      BERT-based transformer encoder
      Two classification heads:
      Additional positional embeddings:
      • Column ID
      • Row ID
      • Segment: query / table
      • Rank: non-comparable or order
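
The extra positional signals listed on the TAPAS slide can be illustrated at cell level. This is a simplified sketch, not the TAPAS tokenizer: it works on whole cells rather than word pieces, the function names are assumptions, and giving tied values the same rank is a simplification chosen here. It follows the common convention that query and header positions get index 0 while data rows and columns are 1-based.

```python
from typing import Dict, List, Union

IdRow = Dict[str, Union[str, int]]


def column_ranks(values: List[str]) -> List[int]:
    """Rank IDs for one column: 0 if values are not comparable as numbers,
    otherwise 1 for the smallest value, 2 for the next, and so on."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return [0] * len(values)
    rank_of = {n: i + 1 for i, n in enumerate(sorted(set(numbers)))}
    return [rank_of[n] for n in numbers]


def cell_level_ids(query_tokens: List[str], header: List[str],
                   rows: List[List[str]]) -> List[IdRow]:
    """Flatten query + table and attach segment / column / row / rank IDs."""
    ids: List[IdRow] = [
        {"token": t, "segment": 0, "column": 0, "row": 0, "rank": 0}
        for t in query_tokens
    ]
    ranks = [column_ranks([r[c] for r in rows]) for c in range(len(header))]
    for c, name in enumerate(header):
        ids.append({"token": name, "segment": 1, "column": c + 1, "row": 0, "rank": 0})
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            ids.append({"token": text, "segment": 1, "column": c + 1,
                        "row": r + 1, "rank": ranks[c][r]})
    return ids


example = cell_level_ids(
    ["highest", "% true"],
    ["Model", "% true"],
    [["A", "50.18"], ["B", "79.92"]],
)
```

Each ID is looked up in its own embedding table and added to the usual token embedding, which is how the encoder learns which tokens share a row or a column.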
  • 12. Fine-tuning
  • 13. Evaluation & fine-tuning. Question types:
      • Extractive
      Q: Max truthfulQA %
      A: 79.92
      Highlighted cell
  • 14. Evaluation & fine-tuning. Question types:
      • Extractive: Answer, Cells
      • Generative
      Q: Average %info for Llama 2
      A: 46,17
      Operation: AVG
      Highlighted cells
  • 15. Evaluation & fine-tuning. Question types:
      • Extractive: Answer, Cells
      • Generative: Answer, Cells, Aggregation
      • Unanswerable
      Q: Mistral % true on TruthfulQA
      A: None
      Operation: None
      No cells
  • 16. Evaluation & fine-tuning. Question types:
      • Extractive: Answer, Cells
      • Generative: Answer, Cells, Aggregation
      • Unanswerable: None
      Dataset size: ~3K tables, ~10 QA pairs per table
      Annotation:
      • ~3 months
      • ~2-5 annotators, 2 rounds:
          • separate tables – more diverse
          • several tables per document – retrieval tests
  • 17. Evaluation & fine-tuning. How to evaluate?
      • F1 at cell-sets level
      • F1 at answer tokens level
      • Micro / macro averaging over question types / tables / docs
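
The metrics named on the evaluation slide can be sketched in a few lines. The details below are assumptions rather than the authors' exact scoring rules: whitespace tokenization, lower-casing, and scoring an empty prediction against an empty gold answer as 1.0 (so that correctly declining an unanswerable question is rewarded).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def _f1(true_positives: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall, with empty-vs-empty scored as 1."""
    if n_pred == 0 and n_gold == 0:
        return 1.0
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_pred
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


def cell_f1(pred: Iterable[Tuple[int, int]], gold: Iterable[Tuple[int, int]]) -> float:
    """F1 between predicted and gold sets of (row_id, col_id) coordinates."""
    pred_set, gold_set = set(pred), set(gold)
    return _f1(len(pred_set & gold_set), len(pred_set), len(gold_set))


def token_f1(pred: str, gold: str) -> float:
    """F1 over whitespace tokens of the answer text (multiset overlap)."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    return _f1(overlap, len(p), len(g))


def macro_average(scores_by_group: Dict[str, List[float]]) -> float:
    """Average the per-group means, so every group counts equally."""
    means = [sum(s) / len(s) for s in scores_by_group.values() if s]
    return sum(means) / len(means)


def micro_average(scores_by_group: Dict[str, List[float]]) -> float:
    """Average over all examples, so large groups dominate."""
    flat = [x for s in scores_by_group.values() for x in s]
    return sum(flat) / len(flat)


scores = {"extractive": [1.0, 0.0], "unanswerable": [1.0]}
```

The grouping key can be the question type, the table or the document, which gives the different averaging views listed on the slide.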
  • 18. Evaluation & fine-tuning. How to evaluate?
      • F1 at cell-sets level
      • F1 at answer tokens level
      • Micro / macro averaging over question types / tables / docs
      Is 80% F1 enough? 50%? Run an impression test!
      • Retrieval quality
      • Correct response rate
      • Overall impression (good to go?)
      “ according to the following: Very poor - The service doesn’t meet expectations. … Very good - The service provides great experience.”
      “ Is relevant ? Were cells highlighted? Are cells ?”
  • 19. Model/training parameters
      Resources: 1x Nvidia A100 80GB
      What worked for us
      Training speed-up:
      • Gradient checkpointing
      • Tensor Cores: torch.set_float32_matmul_precision('high')
      Optimization:
      • LR scheduling – cycling warmup + cosine decay
      Data:
      • Decrease “unanswerable” type in the batch
      • Dropout
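
The learning-rate schedule mentioned on the training slide can be sketched without any framework. This shows one plausible reading only: a single linear warmup followed by cosine decay. The "cycling" part is not reproduced, and the default values (`base_lr`, `warmup_steps`, `min_lr`) are illustrative assumptions, not the settings used in the talk.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 5e-5,
          warmup_steps: int = 100, min_lr: float = 0.0) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup from ~0 to base_lr over warmup_steps, then cosine decay
    from base_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


schedule = [lr_at(s, total_steps=1000) for s in range(1001)]
```

In PyTorch, a function like this divided by `base_lr` could serve as the multiplier passed to `torch.optim.lr_scheduler.LambdaLR`.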
  • 20. Performance
      Model                  Macro    Macro    Extractive  Extractive  Generative  Generative  Unanswerable  Unanswerable
                             tok-F1   cell-F1  tok-F1      cell-F1     tok-F1      cell-F1     tok-F1        cell-F1
      TAPAS-Large Baseline   25.6     33.4     42.9        45.3        11.6        32.0        15.0          16.8
      TAPAS-Large Finetuned  45.2     63.21    57.8        59.8        1.3         57.2        76.0          76.1
      TAPAS-Base Baseline    23.6     29.0     26.9        38.3        10.7        26.9        17.6          18.0
      • Best model is Large
      • Finetuned models up to 2x better (incl. impression tests)
      • Generation capability degrades strongly
      *Most of the tests conducted by Vadzim Piatrou
  • 21. Can we use LLMs for it? Good prompt is all you need, right?
      Issues we faced:
      • Hallucination – model comes up with facts outside of the table context
      • Difficulties with understanding cell coordinates
      • Providing structured output
  • 22. Llama vs TAPAS
      Model                  Macro    Macro    Extractive  Extractive  Generative  Generative  Unanswerable  Unanswerable
                             tok-F1   cell-F1  tok-F1      cell-F1     tok-F1      cell-F1     tok-F1        cell-F1
      TAPAS-Base Baseline    23.6     29.0     26.9        38.3        10.7        26.9        17.6          18.0
      TAPAS-Base Finetuned   43.7     60.0     55.5        56.7        1.2         52.7        73.9          74.0
      - - - -
      is comparable to the baseline, generatives are better
      *Most of the tests conducted by Vadzim Piatrou
  • 23. Llama vs TAPAS
      Model                  Macro    Macro    Extractive  Extractive  Generative  Generative  Unanswerable  Unanswerable
                             tok-F1   cell-F1  tok-F1      cell-F1     tok-F1      cell-F1     tok-F1        cell-F1
      TAPAS-Base Baseline    23.6     29.0     26.9        38.3        10.7        26.9        17.6          18.0
      TAPAS-Base Finetuned   43.7     60.0     55.5        56.7        1.2         52.7        73.9          74.0
      - - - -
      is comparable to the baseline, generatives are better
      Performance: TAPAS-Base on CPU is than Llama 2 4bit on GPU
      6.5K test QA pairs take:
      • TAPAS-Base: ~45 mins, ~2 sec/pair, CPU
      • Llama 2: 40 hours, ~10-20 sec/pair, Nvidia A100 80 GB GPU
      *Most of the tests conducted by Vadzim Piatrou
  • 24. Question-table classifier
      Good table retrieval is 80% of success
      • If we are not sure about the answer, let’s still highlight the table
      • Note that TAPAS confidence is either 0 or 1
      60% in F1 is the best we’ve got
  • 25. Question-table classifier
      Good table retrieval is 80% of success
      • If we are not sure about the answer, let’s still highlight the table
      • Note that TAPAS confidence is either 0 or 1
      60% in F1 is the best we’ve got
      Let’s build a classifier to decide!
      Diagram labels: Query, Candidate Table, TAPAS, Answer, Features, LGBM Classifier, Simple / Hard
      • It achieves about 80% in F1
      • Selected “simple” answers have F1 > 80%
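
The routing idea on the question-table classifier slide can be sketched as follows. Everything named here is hypothetical: the feature names, the 0.5 threshold and the stub scorer are placeholders, and the slide's LGBM classifier is represented by any callable that returns the probability of a "simple" question-table pair (with LightGBM, a wrapper around `LGBMClassifier.predict_proba` could play that role).

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
CellRef = Dict[str, int]


def route_answer(features: Features,
                 simple_proba: Callable[[Features], float],
                 tapas_cells: List[CellRef],
                 threshold: float = 0.5) -> Dict[str, object]:
    """Decide what to highlight for one (query, candidate table) pair.

    "Simple" pairs: trust the TAPAS answer and highlight its cells.
    "Hard" pairs: fall back to highlighting the whole table.
    """
    if simple_proba(features) >= threshold:
        return {"highlight": "cells", "cells": tapas_cells}
    return {"highlight": "table", "cells": []}


def stub_scorer(features: Features) -> float:
    """Stand-in for a trained classifier; hypothetical hand-written rule."""
    return 0.9 if features["n_answer_cells"] == 1 else 0.1


easy = route_answer({"n_answer_cells": 1.0}, stub_scorer, [{"row_id": 15, "col_id": 3}])
hard = route_answer({"n_answer_cells": 7.0}, stub_scorer, [{"row_id": 1, "col_id": 1}])
```

Keeping the scorer behind a plain callable lets the fallback rule be tested without a trained model.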
  • 26. Productionalization
  • 27. Service schema (approx. architecture). Let’s cover it step by step.
      Diagram components: client, Query, Custom Document Decomposition, Documents Storage, Search index, Search service, Tables, TAPAS, QT classifier, Answers
  • 28. Service schema (approx. architecture)
      Custom engine (Tesseract-based + custom models) responsible for:
      • Document decomposition like
          - layout recognition (paragraph, title, section, etc.)
          - table/figure detection
  • 29. Service schema (approx. architecture)
      + custom Go-based service
      • Manages search indices
      • Provides API for other services for
      • collection management (like our TQA) + their features
  • 30. Service schema (approx. architecture)
      deployed as a REST service
  • 31. Summary
      TQA is not “solved” yet!
      - models are at 60% accuracy on open datasets
      - zero-shot on open source LLMs is not enough
  • 32. Summary
      TQA is not “solved” yet!
      - models are at 60% accuracy on open datasets
      - zero-shot on open source LLMs is not enough
      Annotation for TQA is quite long
      - you need a dedicated team (SMEs in a perfect world)
      - for a small team it might take months!
      - it is worth the wait: increase in metrics could be up to 2x
  • 33. Summary
      TQA is not “solved” yet!
      - models are at 60% accuracy on open datasets
      - zero-shot on open source LLMs is not enough
      Annotation for TQA is quite long
      - you need a dedicated team (SMEs in a perfect world)
      - for a small team it might take months!
      - it is worth the wait: increase in metrics could be up to 2x
      Use both offline and online metrics:
      - token / cell level F1
      - measure impression
      - small accuracy still might be enough for business
  • 34. Vladimir Ageev; Vadzim Piatrou, Ph.D.