Enterprise Grade Data Labeling
Design Your Ground Truth to Scale in Production
Jai Natarajan
jai@imerit.net
Obsession + Craft
AI Production Pipeline
● Data collection
● Data annotation
● Model training
● Deployment
● Feedback loop
Software 2.0
“A large portion of programmers of tomorrow … collect, clean,
manipulate, label, analyze and visualize data that feeds neural
networks.”
Andrej Karpathy, Tesla
Algorithm Training is Algorithm Design
● The data is an intrinsic part of the algorithm
● Outcome depends as much on the data as on the code
TLDR: There are ways to be as mindful about your data
strategy as you are about your algorithm strategy
The Data Situation
Data Annotation Takes Time
● Figure Eight estimates that 80% of development time is spent on data preparation and labeling
● Cognilytica estimates that 25% of time is spent on data labeling
Data Annotation Needs Are Substantial
Automotive Customer
● 250 k – 500 k frames per month
● Average 10 objects/frame for object detection
● Average 45 mins per frame for full segmentation
● Multiple judgements (3-5) on each data piece
Medical Image Customer
● 200 k endoscopic scans
● Average 2 anomalies per scan
● Multiple judgements (3-5) on each data piece
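To put these volumes in perspective, the sketch below turns the automotive figures into annotator effort. It assumes every frame receives full segmentation at the 45-minute average and that one annotator works 160 hours a month; both are illustrative assumptions rather than figures from the talk, and `annotator_months` is a hypothetical helper.

```python
# Back-of-the-envelope annotation workload from the automotive figures above.
# ASSUMPTION: 160 working hours per annotator per month (not from the slides).
# ASSUMPTION: every frame gets full segmentation at the 45-minute average.

def annotator_months(frames, minutes_per_frame, judgements, hours_per_month=160):
    """Return the annotator-months needed to label `frames` items."""
    total_hours = frames * minutes_per_frame * judgements / 60
    return total_hours / hours_per_month

# Low end: 250k frames with 3 judgements each; high end: 500k frames with 5.
low = annotator_months(250_000, 45, 3)
high = annotator_months(500_000, 45, 5)
print(f"{low:,.0f} to {high:,.0f} annotator-months per month of data")
```

Under these assumptions the range works out to roughly 3,500 to 11,700 annotator-months for each month of incoming frames, which is why time and budget planning comes up again in the summary.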
Data Annotation Is Increasingly Complex
● Bounding boxes: simple boxes (+secs/task)
● Polygons: precise boundaries on some objects (+mins/task)
● Segmentation: all objects precisely marked (+30 mins/task)
● Panoptic segmentation: all objects precisely marked and grouped by type (+45 mins/task)
● Tracking: objects marked and tracked across frames of video (+30 mins/task)
● LiDAR: thousands of points grouped into objects (+90 mins/task)
● Multi-sensor fusion: LiDAR and images combined from multiple angles (+90 mins/task)
Data Annotation Involves Domains
Domain (skill tiers: General, Skilled, Specialized, Expert)
● Complex subject matter: healthcare, finance, law
● Jargon-rich domain: image editing, e-commerce (brand jargon)
● Specific world knowledge: current events, fashion
● General knowledge: travel AI assistant
Labeling
● Diagnosis & treatment: clinical history, epidemiology, contextual analysis
● Classification: pathophysiology, multiple-dependency decision tree
● Identification: anatomy & physiology, pattern recognition, ontological understanding
● Segmentation: navigation & tool familiarity
The Case For Enterprise Annotation
● Data security and audit trail
● Quality and consistency
● Custom tooling and insights
● Domain knowledge & targeted skilling
● Retained learnings across iterations
Enterprise Annotation @ iMerit
We are iMerit
iMerit is a tech-enabled data services company that leverages human intelligence in
data, content, and machine learning.
We deliver high-quality managed services while effecting
positive social and economic change.
Our data experts work full-time onsite at our secure delivery facilities.
● 24x7 operations
● < 5% attrition
● 9 centers
● 200 M+ data points delivered
● 130+ clients
● SOC 2 certified
● 2,600 employees
Annotation Specialties
Helping the Chicago Cubs Win the World Series
● Capture video during game
● Mark joint positions of pitcher
● Build 3D skeleton for analytics
● Expand to multiple teams
● Extend to batters, fielders
Experience and Expertise
• Street scenes for autonomous vehicles (images + LiDAR)
• Named entities and salience in financial documents
• Aerial imagery of healthy and diseased crops
• Peril assessment for property insurance
• Identification of tumors and lesions in medical scans
• Risk assessment of power assets
Annotation Framework
Collaborative Framework
● Expert consultation
● Training
● Workflow customization
● Feedback cycle
● Evaluation
1. Expert Consultation
● ML engineer
● Subject-matter expert
● Trainer
● Use case
● Edge case discovery
● Task design (granularity, cognitive load of task)
2. Guidelines & Training
● For generalists
● Narrow and deep
● Example-rich; requires time to train, practice, and iterate
3. Workflow Customization
● Data and QC pipeline
● UI optimizations
● Crawl (calibration)
● Walk (soft production, rapid feedback)
● Run (production, internal QA)
● Supports scale, ensures quality
4. Feedback Cycle
● Collaboration: SMEs, PM, engineer, generalists
● Insights into unanticipated deviations
● No penalty for challenging assumptions
● Improve model by identifying biases
● Ensure reliability of annotations
5. Evaluation
● Key metrics & thresholds
● Share responsibility
● Test against gold set
● Measure inter-rater reliability
● Increase rigor over project life
● Minimize rework iterations
● Ensures quality
● Validates assumptions
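Two of the evaluation items, testing against a gold set and measuring inter-rater reliability, can be sketched in a few lines. The example below is a minimal illustration, not the speaker's tooling: it uses Cohen's kappa as one common reliability measure for two raters, and the function names and sample labels are made up for the example.

```python
from collections import Counter

def gold_accuracy(labels, gold):
    """Fraction of items where an annotator's label matches the gold label."""
    matches = sum(1 for a, g in zip(labels, gold) if a == g)
    return matches / len(gold)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same class at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for six items (illustrative data only).
gold    = ["person", "vehicle", "person", "vehicle", "person", "vehicle"]
rater_a = ["person", "vehicle", "person", "person", "person", "vehicle"]
rater_b = ["person", "vehicle", "vehicle", "person", "person", "vehicle"]

print(f"rater A vs gold: {gold_accuracy(rater_a, gold):.2f}")
print(f"rater B vs gold: {gold_accuracy(rater_b, gold):.2f}")
print(f"kappa A vs B:    {cohens_kappa(rater_a, rater_b):.2f}")
```

Note that the two raters can agree with each other on an item where both disagree with the gold set, so the two measures answer different questions and are worth tracking separately.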
Good Annotation Design
Good Annotation Design: Context Matters
Person or Vehicle?
Are you trying to avoid hitting people or are you counting vehicles?
Good Annotation Design: UI Matters
"I want bounding boxes no smaller than 1.5 cm in any dimension."
"Go for it!"
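One way to read this exchange is that a size rule stated in centimeters only becomes enforceable once the annotation tool translates it into pixels for a particular display. The sketch below shows that translation under stated assumptions: the pixel density values and both function names are hypothetical, and the slide does not say how the rule was actually implemented.

```python
CM_PER_INCH = 2.54

def min_box_pixels(min_cm, pixels_per_inch):
    """Convert a physical minimum size into on-screen pixels."""
    return min_cm / CM_PER_INCH * pixels_per_inch

def box_is_large_enough(width_px, height_px, min_cm=1.5, pixels_per_inch=96):
    """Check a box against the minimum size in both dimensions.

    ASSUMPTION: pixels_per_inch describes the annotator's display; 96 is
    only a placeholder default, not a value from the talk.
    """
    limit = min_box_pixels(min_cm, pixels_per_inch)
    return width_px >= limit and height_px >= limit

# The same 60 x 60 pixel box passes on one display and fails on a denser one.
print(box_is_large_enough(60, 60, pixels_per_inch=96))
print(box_is_large_enough(60, 60, pixels_per_inch=220))
```

Building a check like this into the UI lets the tool enforce the rule as boxes are drawn, instead of leaving annotators to judge physical size by eye.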
Good Annotation Design: Domain Specific
iMerit Solution Architect + Customer Expert:
● Unpack the jargon
● Create a deep and narrow training curriculum (docs, videos, video conferences)
● Retain learnings across time
Good Annotation Design: Allow Open Feedback
● Conversation around quality
  Are some errors more important than others?
  How will you sample quality?
● Safe space to iterate without penalty
● Small discovery and calibration pilots
● Ask your labeling force to question edge cases
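The two quality questions above, which errors matter most and how to sample, can be made concrete with a small sketch. Everything specific here is an assumption for illustration: the error categories, their weights, the 10% review fraction, and the helper names are hypothetical and would come out of the conversation with the annotation team.

```python
import random

# ASSUMPTION: illustrative error types and severity weights, not from the slides.
SEVERITY = {"missed_person": 5.0, "wrong_class": 2.0, "loose_box": 0.5}

def sample_for_review(task_ids, fraction, seed=0):
    """Pick a reproducible random subset of tasks for quality review."""
    rng = random.Random(seed)  # fixed seed so the audit sample can be re-created
    k = max(1, round(len(task_ids) * fraction))
    return sorted(rng.sample(task_ids, k))

def weighted_error_score(errors):
    """Sum severity weights so that serious errors count for more."""
    return sum(SEVERITY[e] for e in errors)

task_ids = list(range(100))
review_batch = sample_for_review(task_ids, fraction=0.1)
print(len(review_batch), "tasks selected for review")
print(weighted_error_score(["missed_person", "loose_box"]))
```

A uniform random sample is the simplest choice; sampling more heavily from new annotators or from known edge cases is an equally reasonable design once a pilot has shown where errors cluster.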
Summary – Mindful Data Annotation
Data strategy as mindful as your algorithm strategy
● Ask the right questions
● Plan time and budget
● Plan for increased skill needs
● Partner with your annotation team
● Create an environment where insight is possible
● Build a long-term, secure, scalable pipeline
Thank You!
jai@imerit.net
