Question Answer Pair Auto Generation:
Reliability and Consistency Assessment for Large Language Models Applications
- Jyotirmoy Sundi
Evaluation/Testing of LLM Apps is a Pain
● Manual Testing
○ Done by a small set of users, or outsourced
○ Developers update prompts, extraction, LLM chaining, reasoning, etc. in response to bugs found in production
○ Time-consuming and error-prone, leading to inaccurate results
● Coverage
○ 100% coverage is hard on a large corpus; it would take a large army of manual testers
● High Variability in Output
○ Output varies with user inputs, prompts, and providers such as OpenAI, Cohere, Claude, Google PaLM, etc.
● Reliability & Hallucinations
○ Handling different chat contexts and intents
○ Handling turnkey questions: when a user suddenly asks about a new concept, the topics/intents of
previous chat messages become useless
● Privacy
○ Redact, anonymize, or mask PII before sending data to an LLM provider endpoint
○ Adhere to updated GDPR/CCPA/EU government policies
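The privacy point above (masking PII before text reaches a provider endpoint) can be sketched as a small pre-processing step. This is a minimal illustration, not part of Datacraft: the pattern set and the `mask_pii` name are assumptions, and regexes alone will miss many kinds of PII.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs far broader coverage than three regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each matched span with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    message = "Contact jane@example.com, SSN 123-45-6789."
    # Only the masked text would be sent on to the LLM provider.
    print(mask_pii(message))
```

Masking with a label rather than deleting the span keeps the sentence readable for the model while removing the sensitive value.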
Advantages of Datacraft
● Automated question/answer data generation
○ Reduced manual testing
● Improved coverage
○ Not 100%, but can be much higher depending on your budget
● Reduced bias
○ A consistent set of ground truths to rank LLM endpoints
● Enhanced testing efficiency
○ With a ground-truth dataset, evaluate quickly and at scale
● Consistency
○ Test consistency across changes in chaining, prompts, RAG, and provider updates
● Ranking RAG responses systematically
○ Test multiple LLM apps across prompts, providers, and RAG techniques to choose a winner before rolling out to customers
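Ranking endpoints against a fixed ground-truth set could look roughly like the sketch below. The slides do not say which metric Datacraft uses; token-overlap F1 is swapped in here as a simple stand-in, and `token_f1` and `rank_endpoints` are hypothetical names.

```python
from collections import Counter


def token_f1(predicted: str, truth: str) -> float:
    """Token-overlap F1 between a predicted answer and the ground truth."""
    pred, gold = predicted.lower().split(), truth.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def rank_endpoints(ground_truth, answers_by_endpoint):
    """Return (endpoint, mean F1) pairs, best first.

    ground_truth: list of (question, answer) pairs.
    answers_by_endpoint: endpoint name -> answers in the same order.
    """
    scores = {}
    for name, answers in answers_by_endpoint.items():
        total = sum(token_f1(a, gt[1]) for a, gt in zip(answers, ground_truth))
        scores[name] = total / len(ground_truth)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    truth = [("What is the capital of France?", "Paris is the capital")]
    candidates = {
        "endpoint_a": ["Paris is the capital"],
        "endpoint_b": ["London"],
    }
    for name, score in rank_endpoints(truth, candidates):
        print(name, round(score, 2))
```

Because every endpoint is scored against the same ground truths, rerunning this after a prompt, chaining, or provider change gives a like-for-like comparison.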
Overview of Datacraft
● Stratified sampling
○ Method for selecting samples from a diverse population by dividing it into subgroups or strata based on specific
characteristics
○ Ensures that our QA dataset accurately represents various types of questions with high coverage across the
corpus
● Verified QA prompts for various scenarios like blogs, readme, text files, and catalogs.
○ Curated prompts are verified and tested to ensure they are effective in generating QA datasets
○ Adding more prompts is easy
● Generation with context injection of sampled data & selected prompts from each stratum
○ Using Language Models, we create questions and answers based on the sampled data.
○ Context Injection: We inject relevant context from the sampled documents into the generated QA pairs
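The sampling and context-injection steps above can be sketched as follows. This is an illustration under stated assumptions, not Datacraft's actual code: documents are assumed to be dicts with hypothetical `kind` and `text` fields, the prompt templates are invented for the example, and the call to a language model is left out.

```python
import random
from collections import defaultdict

# Hypothetical prompt templates, one per document type (stratum).
PROMPTS = {
    "blog": "Write one question a blog reader might ask, with its answer.\nContext:\n{context}",
    "readme": "Write one question a new user of this project might ask, with its answer.\nContext:\n{context}",
}
DEFAULT_PROMPT = "Write one question and answer based on the text.\nContext:\n{context}"


def stratified_sample(docs, key, per_stratum, seed=0):
    """Group docs into strata by key(doc), then draw up to per_stratum from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc in docs:
        strata[key(doc)].append(doc)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def build_prompt(doc):
    """Inject the sampled document text into the template chosen for its stratum."""
    template = PROMPTS.get(doc["kind"], DEFAULT_PROMPT)
    return template.format(context=doc["text"])


if __name__ == "__main__":
    corpus = [
        {"kind": "blog", "text": "Post about vector search."},
        {"kind": "blog", "text": "Post about prompt design."},
        {"kind": "readme", "text": "Install with pip, then run the CLI."},
    ]
    for doc in stratified_sample(corpus, key=lambda d: d["kind"], per_stratum=1):
        print(build_prompt(doc))  # this prompt would then be sent to a language model
```

Sampling per stratum rather than from the whole corpus keeps small document types, such as a single README, from being crowded out by large ones.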
Use Cases
● Question answering on any data source
○ PDFs / CSV / JSON / Text files
○ README.md
○ Online Blogs
■ Imagine reading a blog with a seamlessly integrated, personalized, curated Q&A section,
designed to ease navigation and comprehension of the content; this might lead to increased
inquiry, engagement, conversions, signups, etc.
○ API/SDK Docs
○ Databases
○ Commerce Catalogs
● Synthetic dataset generation for any custom model development
○ AI/ML training on tabular or text data
○ Generate synthetic data for named entity recognition (NER) models, to train on custom entities or common
ones such as name, credit card number, and SSN, or any private entities of a company
■ Helps with data redaction/anonymization
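One simple way to produce labeled NER training data is to fill sentence templates and record where each entity lands. The sketch below uses that template-filling approach as a stand-in; it is not the LLM-based generation the slides describe, and the templates, values, and `make_example` name are all hypothetical.

```python
import random
import re

# Hypothetical templates and placeholder values, for illustration only.
TEMPLATES = [
    "My name is {NAME} and my SSN is {SSN}.",
    "Please charge card {CARD} for {NAME}.",
]
VALUES = {
    "NAME": ["Ada Lovelace", "Alan Turing"],
    "SSN": ["000-12-3456"],
    "CARD": ["4111 1111 1111 1111"],
}


def make_example(template, rng):
    """Fill a template and return (text, spans); spans are (start, end, label)."""
    text, spans, pos = "", [], 0
    for match in re.finditer(r"\{(\w+)\}", template):
        text += template[pos:match.start()]
        label = match.group(1)
        value = rng.choice(VALUES[label])
        spans.append((len(text), len(text) + len(value), label))
        text += value
        pos = match.end()
    text += template[pos:]
    return text, spans


if __name__ == "__main__":
    rng = random.Random(0)
    for template in TEMPLATES:
        text, spans = make_example(template, rng)
        print(text, spans)
```

The character spans follow the (start, end, label) convention that many NER training formats expect, so the same labels can also drive redaction of the matching text.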
Demo - https://github.com/sundi133/llm-datacraft
