SlideShare a Scribd company logo
1 of 14
Introducing the Open-Source
Library for Testing NLP Models
David Talby
CTO, John Snow Labs
Responsible AI is Not Optional
There are lots of Responsible AI Frameworks
But There’s a Big Gap in Implementation
Beyond Accuracy: Behavioral Testing of NLP
models with CheckList
Ribiero et. al., 2020
Sentiment analysis services of the top three cloud providers fail:
• 9-16% of the time when replacing neutral words
• 7-20% of the time when changing neutral named entities
• 36-42% of the time on some temporal tests
• Almost 100% of the time on some negation tests.
BBQ: A Hand-Built Bias Benchmark for
Question Answering
Parrish et. al., 2022
Biases around race, gender, physical appearance,
disability, and religion are ingrained in state-of-the-art
question answering models – sometimes changing the
likely answer more than 80% of the time.
Information Leakage in Embedding
Models
Song and Raghunathan, 2020
Data leakage of 50-70% of personal information
into popular word & sentence embeddings.
What Do You See in this Patient?
Behavioral Testing of Clinical NLP Models
van Aken et. al., 2022
Adding any mention of ethnicity to a patient note reduces their
predicted risk of mortality – with the most accurate model
producing the largest error.
Responsible AI Best Practices
1. Test Your Models!
Why would you expect untested software to work?
2. Don’t Reuse Academic Models in Production
Publishing research ≠ Building reliable systems
3. Test Beyond Accuracy
Robustness, Bias, Fairness, Toxicity, Efficiency, Safety, …
6
Simple
O’Reilly Media
Comprehensive
Test all aspects of
model quality before
going to production
Open Source
Open under the Apache
2.0 license and designed
for easy extension
Papers with Code
Generate & run
50+ test types on
popular NLP tasks
Introducing the NLP Test Library
NLP Test Automates 3 Steps in Your AI Workflow
NLP Test In 3 Lines of Code
from nlptest import Harness
h = Harness(model='dslim/bert-base-NER', hub='huggingface')
h.generate().run().report()
Generate a set of test cases
given a task, model & dataset
Run the test suite, generating
a data frame of test results
Generate a summary report
stating which tests have passed
Write Once, Test Everywhere
from nlptest import Harness
h = Harness(model='ner_dl_bert', hub='johnsnowlabs')
h = Harness(model='dslim/bert-base-NER', hub='huggingface')
h = Harness(model='en_core_web_sm', hub='spacy')
Adding a new library or API?
All test types will generate & run.
Adding a new test type?
It will run on all supported libraries.
1. Auto-Generate Tests
2. Run Tests
Test type Test case Expected result
add_typos Wang Li is a ductor. Wang Li: Person
add_context Wang Li is a doctor. #careers Wang Li: Person
replace_to_hispanic_name Juan Moreno is a doctor. Juan Moreno: Person
min_gender_representation Female 30
min_gender_f1_score Female 0.85
From a test suite created with generate(), manually, or with load():
Category Pass Rate Minimum Pass Rate Pass?
Robustness 50% 75% 
Bias 85% 85% 
Representation 100% 100% 
Fairness 66% 100% 
Calling run() and then report() produces a summary:
3. Improve Models With Data Augmentation
h.augment(input_path='training_dataset', output_path='augmented_dataset')
new_model = nlp.load('model').fit('augmented_dataset')
Harness.load(save_dir='testcases', model=new_model, hub='johnsnowlabs').run()
Generate new augmented
labeled data for the model’s
training (not test!) dataset.
Train a new model using your
favorite framework using the
augmented training dataset.
Run a regression test: Create a
new test harness with the new
model and the old test suite.
Integrate Testing Into CI/CD or MLOps
class DataScienceWorkFlow(FlowSpec):
@step
def train(self):
...
@step
def run_tests(self):
harness = Harness.load(model=self.model, save_dir=“testsuite")
self.report = harness.run().report()
@step
def deploy(self):
if self.report["score"] > self.test_threshold:
...
Train a new version of a model
Run a regression test
Only deploy if the test passed
Getting Started with NLP Test
TUTORIALS AND EXAMPLES:
CONTRIBUTING:
https://github.com/johnsnowlabs/nlptest
COMMUNITY CHAT:
https://spark-nlp.slack.com @ #nlp-test
https://nlptest.org
Expect Rapid Releases & Long-Term Support from John Snow Labs.

More Related Content

Similar to Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP Summit, April 2023.pptx

[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language ModelsDataScienceConferenc1
 
Deep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyDeep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyJason Tsai
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsTaesu Kim
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Lviv Startup Club
 
HyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologiesHyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologiesMichel Dumontier
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?BalaBit
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine LearningSri Ambati
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2Viral Gupta
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
PPT SLIDES
PPT SLIDESPPT SLIDES
PPT SLIDESbutest
 
PPT SLIDES
PPT SLIDESPPT SLIDES
PPT SLIDESbutest
 
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
Annotated Bibliography  .Guidelines Annotated Bibliograph.docxAnnotated Bibliography  .Guidelines Annotated Bibliograph.docx
Annotated Bibliography .Guidelines Annotated Bibliograph.docxjustine1simpson78276
 
Automated Test Case Generation and Execution from Models
Automated Test Case Generation and Execution from ModelsAutomated Test Case Generation and Execution from Models
Automated Test Case Generation and Execution from ModelsDharmalingam Ganesan
 

Similar to Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP Summit, April 2023.pptx (20)

[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
 
Deep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical MethodologyDeep Learning: Chapter 11 Practical Methodology
Deep Learning: Chapter 11 Practical Methodology
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applications
 
Analyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et WekaAnalyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et Weka
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)Roman Kyslyi: Використання та побудова LLM агентів (UA)
Roman Kyslyi: Використання та побудова LLM агентів (UA)
 
HyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologiesHyQue: Evaluating scientific Hypotheses using semantic web technologies
HyQue: Evaluating scientific Hypotheses using semantic web technologies
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
Test design techniques
Test design techniquesTest design techniques
Test design techniques
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
SEppt
SEpptSEppt
SEppt
 
PPT SLIDES
PPT SLIDESPPT SLIDES
PPT SLIDES
 
PPT SLIDES
PPT SLIDESPPT SLIDES
PPT SLIDES
 
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
Annotated Bibliography  .Guidelines Annotated Bibliograph.docxAnnotated Bibliography  .Guidelines Annotated Bibliograph.docx
Annotated Bibliography .Guidelines Annotated Bibliograph.docx
 
Automated Test Case Generation and Execution from Models
Automated Test Case Generation and Execution from ModelsAutomated Test Case Generation and Execution from Models
Automated Test Case Generation and Execution from Models
 

More from David Talby

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...David Talby
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldDavid Talby
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsDavid Talby
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022David Talby
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021David Talby
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in HealthcareDavid Talby
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platformDavid Talby
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...David Talby
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection SystemDavid Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

More from David Talby (13)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP Summit, April 2023.pptx

  • 1. Introducing the Open-Source Library for Testing NLP Models David Talby CTO, John Snow Labs
  • 2. Responsible AI is Not Optional
  • 3. There are lots of Responsible AI Frameworks
  • 4. But There’s a Big Gap in Implementation Beyond Accuracy: Behavioral Testing of NLP models with CheckList Ribiero et. al., 2020 Sentiment analysis services of the top three cloud providers fail: • 9-16% of the time when replacing neutral words • 7-20% of the time when changing neutral named entities • 36-42% of the time on some temporal tests • Almost 100% of the time on some negation tests. BBQ: A Hand-Built Bias Benchmark for Question Answering Parrish et. al., 2022 Biases around race, gender, physical appearance, disability, and religion are ingrained in state-of-the-art question answering models – sometimes changing the likely answer more than 80% of the time. Information Leakage in Embedding Models Song and Raghunathan, 2020 Data leakage of 50-70% of personal information into popular word & sentence embeddings. What Do You See in this Patient? Behavioral Testing of Clinical NLP Models van Aken et. al., 2022 Adding any mention of ethnicity to a patient note reduces their predicted risk of mortality – with the most accurate model producing the largest error.
  • 5. Responsible AI Best Practices 1. Test Your Models! Why would you expect untested software to work? 2. Don’t Reuse Academic Models in Production Publishing research ≠ Building reliable systems 3. Test Beyond Accuracy Robustness, Bias, Fairness, Toxicity, Efficiency, Safety, …
  • 6. 6 Simple O’Reilly Media Comprehensive Test all aspects of model quality before going to production Open Source Open under the Apache 2.0 license and designed for easy extension Papers with Code Generate & run 50+ test types on popular NLP tasks Introducing the NLP Test Library
  • 7. NLP Test Automates 3 Steps in Your AI Workflow
  • 8. NLP Test In 3 Lines of Code from nlptest import Harness h = Harness(model='dslim/bert-base-NER', hub='huggingface') h.generate().run().report() Generate a set of test cases given a task, model & dataset Run the test suite, generating a data frame of test results Generate a summary report stating which tests have passed
  • 9. Write Once, Test Everywhere from nlptest import Harness h = Harness(model='ner_dl_bert', hub='johnsnowlabs') h = Harness(model='dslim/bert-base-NER', hub='huggingface') h = Harness(model='en_core_web_sm', hub='spacy') Adding a new library or API? All test types will generate & run. Adding a new test type? It will run on all supported libraries.
  • 11. 2. Run Tests Test type Test case Expected result add_typos Wang Li is a ductor. Wang Li: Person add_context Wang Li is a doctor. #careers Wang Li: Person replace_to_hispanic_name Juan Moreno is a doctor. Juan Moreno: Person min_gender_representation Female 30 min_gender_f1_score Female 0.85 From a test suite created with generate(), manually, or with load(): Category Pass Rate Minimum Pass Rate Pass? Robustness 50% 75%  Bias 85% 85%  Representation 100% 100%  Fairness 66% 100%  Calling run() and then report() produces a summary:
  • 12. 3. Improve Models With Data Augmentation h.augment(input_path='training_dataset', output_path='augmented_dataset') new_model = nlp.load('model').fit('augmented_dataset') Harness.load(save_dir='testcases', model=new_model, hub='johnsnowlabs').run() Generate new augmented labeled data for the model’s training (not test!) dataset. Train a new model using your favorite framework using the augmented training dataset. Run a regression test: Create a new test harness with the new model and the old test suite.
  • 13. Integrate Testing Into CI/CD or MLOps class DataScienceWorkFlow(FlowSpec): @step def train(self): ... @step def run_tests(self): harness = Harness.load(model=self.model, save_dir=“testsuite") self.report = harness.run().report() @step def deploy(self): if self.report["score"] > self.test_threshold: ... Train a new version of a model Run a regression test Only deploy if the test passed
  • 14. Getting Started with NLP Test TUTORIALS AND EXAMPLES: CONTRIBUTING: https://github.com/johnsnowlabs/nlptest COMMUNITY CHAT: https://spark-nlp.slack.com @ #nlp-test https://nlptest.org Expect Rapid Releases & Long-Term Support from John Snow Labs.