1. Question Answer Pair Auto Generation:
Reliability and Consistency Assessment for Large Language Models Applications
- Jyotirmoy Sundi
2. Evaluation/Testing of LLM Apps is a Pain
● Manual Testing
○ Done by a small set of internal users, or outsourced to external testers
○ Driven by bugs found in prod: developers update prompts, extraction, LLM chaining, reasoning, etc.
○ Time-consuming and error-prone, leading to inaccurate results.
● Coverage
○ 100% coverage of a large corpus is hard; it would require an army of manual testers
● High Variability in Output
○ Output varies with user inputs, prompts, and providers like OpenAI, Cohere, Claude, Google PaLM, etc.
● Reliability & Hallucinations
○ Handling different chat contexts and intents
○ Handling abrupt topic switches: when the user suddenly asks about a new concept, the topics/intents from previous chat messages become useless
● Privacy
○ Redaction/anonymization/masking of PII data before sending it to an LLM provider endpoint
○ Adherence to updated GDPR/CCPA/EU government policies
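The PII point above can be sketched in a few lines. This is a minimal, hypothetical masking pass (the patterns and placeholder labels are illustrative assumptions, not Datacraft's actual implementation, and regex alone is not production-grade PII detection):

```python
import re

# Hypothetical PII patterns; real systems combine many detectors (NER, checksums, etc.)
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Mask each PII match with a typed placeholder like [EMAIL] before the
    text is sent to an external LLM provider endpoint."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
# → Contact [EMAIL], SSN [SSN].
```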
3. Advantages of Datacraft
● Automated question/answer data generation
○ Reduced manual testing
● Improved coverage
○ Not 100%, but much higher coverage is achievable depending on your budget
● Reduced bias
○ Consistent set of ground truths to rank LLM endpoints
● Enhanced testing efficiency
○ With a ground truth dataset evaluate at scale and quickly
● Consistency
○ Test consistency across any changes in chaining, prompts, RAG, or provider updates
● Ranking RAG responses systematically
○ Test multiple LLM apps based on prompts/providers/rag techniques to choose a winner before rolling out to customers
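One simple way to rank candidate LLM apps against a shared ground-truth QA set is token-overlap F1 (SQuAD-style). The function names and the scoring choice below are illustrative assumptions, not Datacraft's actual ranking method:

```python
from collections import Counter

def f1_overlap(pred: str, truth: str) -> float:
    """Token-overlap F1 between a predicted and a ground-truth answer."""
    p, t = pred.lower().split(), truth.lower().split()
    common = sum((Counter(p) & Counter(t)).values())  # multiset intersection
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(t)
    return 2 * precision * recall / (precision + recall)

def rank_apps(ground_truth: dict, app_answers: dict) -> list:
    """Average F1 per app over the shared QA set; highest score first."""
    scores = {
        app: sum(f1_overlap(ans, ground_truth[q]) for q, ans in answers.items()) / len(answers)
        for app, answers in app_answers.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In practice you would swap F1 for an embedding-based or LLM-judged similarity, but the ranking loop stays the same: fixed ground truth, varying prompts/providers/RAG setups.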
4. Overview of Datacraft
● Stratified sampling
○ A method for selecting samples from a diverse population by dividing it into subgroups (strata) based on specific characteristics
○ Ensures that our QA dataset accurately represents various types of questions, with high coverage across the corpus
● Verified QA prompts for various scenarios like blogs, READMEs, text files, and catalogs
○ Curated Prompts are verified and tested to ensure they are effective in generating QA datasets
○ Addition of more prompts is easy
● Generation with context injection of sampled data & selected prompts from each strata
○ Using Language Models, we create questions and answers based on the sampled data.
○ Context Injection: We inject relevant context from the sampled documents into the generated QA pairs
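The slide's pipeline (stratify, sample, pick a verified prompt per stratum, inject context) can be sketched as follows. All names, prompt templates, and the per-stratum sample budget are hypothetical stand-ins, not Datacraft's actual API:

```python
import random
from collections import defaultdict

def stratified_sample(docs, strata_key, per_stratum, seed=0):
    """Group documents into strata by a key (e.g. document type) and sample
    each group, so every subgroup of the corpus is represented."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc in docs:
        strata[strata_key(doc)].append(doc)
    return {
        name: rng.sample(group, min(per_stratum, len(group)))
        for name, group in strata.items()
    }

# Hypothetical verified prompt templates, one per stratum.
PROMPTS = {
    "blog": "Write a question and answer grounded only in this blog excerpt:\n{context}",
    "readme": "Write a question and answer about this README section:\n{context}",
}

def build_generation_requests(samples):
    """Context injection: pair each sampled document's text with its
    stratum's prompt, ready to send to an LLM for QA generation."""
    return [
        PROMPTS.get(name, PROMPTS["blog"]).format(context=doc["text"])
        for name, group in samples.items()
        for doc in group
    ]
```

The seed keeps sampling reproducible, so the same ground-truth QA set can be regenerated when re-ranking providers later.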
5. Use Cases
● Question answering on any data source
○ PDFs / CSV / JSON / Text files
○ README.md
○ Online Blogs
■ Imagine reading a blog with a seamlessly integrated, personalized Q&A section, thoughtfully designed for easy navigation and comprehension of the content; this could increase inquiry, engagement, conversions, signups, etc.
○ API/SDK Docs
○ Databases
○ Commerce Catalogs
● Synthetic Dataset generation for any custom model development
○ AI/ML training of tabular or text data
○ Generate synthetic data for named entity recognition (NER) models, whether for a company's custom/private entities or for common ones like names, credit card numbers, and SSNs
■ Help in data redaction/anonymization
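A common way to produce such synthetic NER data is template filling: slot fake entity values into sentence templates and record their character spans as labels. The templates and value lists below are illustrative assumptions (a real setup would use a much larger pool, e.g. via a faker library):

```python
import random
import re

# Hypothetical templates and fake values; spans double as redaction targets.
TEMPLATES = ["My name is {NAME} and my SSN is {SSN}."]
FAKE_VALUES = {
    "NAME": ["Alice Chen", "Ravi Patel"],
    "SSN": ["123-45-6789", "987-65-4321"],
}

def generate_example(rng: random.Random):
    """Return (sentence, [(start, end, label)]) for NER training."""
    template = rng.choice(TEMPLATES)
    text, spans = "", []
    # Split keeps the {LABEL} slots as separate tokens via the capture group.
    for part in re.split(r"(\{[A-Z]+\})", template):
        if part.startswith("{"):
            label = part[1:-1]
            value = rng.choice(FAKE_VALUES[label])
            spans.append((len(text), len(text) + len(value), label))
            text += value
        else:
            text += part
    return text, spans
```

Because each generated span is known exactly, the same data can train the redaction/anonymization models mentioned above.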