Framework for semi-automated labeling for ground truth residing in text allowing interactive expert feedback
Tapan Shah
April 28, 2018
Outline
1. Introduction
2. Problem Motivation
3. Formulation and First-cut solution
4. Feedback types
5. Metrics and discussion
Introduction
1. Lead Scientist, GE Global Research
2. Interests: Time Series Analytics for PHM, massive imbalance in
large class problems, perception metrics, human feedback for model
improvement
3. PhD in signal processing from Tata Institute of Fundamental
Research
4. Thesis: Signal Processing for Low Precision quantization
5. B.Tech in Electronics from NIT Surat
Application
Figure: Schematic of workflow
Problem Motivation
1. We formulate automated troubleshooting as a supervised multi-class classification problem
2. For part replacements, extracting the label is easy.
3. For non-part actions (reseating, adjustments, removals, etc.), the label is not obvious.
4. Initially, we used expert-guided rules, which had several drawbacks:
4.1 Not scalable in terms of time and resources
4.2 No structure to the conversation with experts
4.3 Myopic view when assigning labels
Example
Service Request # | Repair Action
1-198428217869 | adjusted the linkage in the electrical dock Contacted customer and confirmed the system is down due to electronic dock failure. Paged FE for follow up.
1-186410196607 | Assisted FE Pigg remove dock and repairing broken wire going to dock motor. found the switch connector inside the table dock disconnected. reconnected the dock switch.
Table: Examples of two Service Requests with similar "issues"
Our formulation
1. We formulate label generation and assignment as a topic modeling problem
2. The topic model:
2.1 Assigns each text document to a topic
2.2 Gives a set of important keywords characterizing each topic
2.3 Provides an initial seed point and a structure for communicating with experts
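To make roles 2.1 and 2.2 concrete, here is a minimal Python sketch assuming the non-negative factors of A ≈ WH are already available, with A oriented as terms × documents: each column of W then scores vocabulary terms for one topic, and each column of H scores topics for one document. The vocabulary, the matrix values, and the helper names `top_keywords` and `assign_topics` are hypothetical, and assigning a document to its highest-weight topic is one common convention rather than a rule stated in these slides.

```python
import numpy as np

def top_keywords(W, vocab, n_top=3):
    """Return the n_top highest-weight terms for each topic (column of W)."""
    keywords = []
    for j in range(W.shape[1]):
        order = np.argsort(W[:, j])[::-1][:n_top]  # term indices, largest weight first
        keywords.append([vocab[i] for i in order])
    return keywords

def assign_topics(H):
    """Assign each document (column of H) to its highest-weight topic."""
    return np.argmax(H, axis=0)

# Hypothetical toy factors: 5 terms x 2 topics, and 2 topics x 4 documents.
vocab = ["dock", "motor", "switch", "reseat", "board"]
W = np.array([[0.9, 0.0],
              [0.7, 0.1],
              [0.5, 0.0],
              [0.0, 0.8],
              [0.1, 0.6]])
H = np.array([[1.2, 0.9, 0.1, 0.0],
              [0.0, 0.2, 1.1, 0.7]])

print(top_keywords(W, vocab))  # [['dock', 'motor', 'switch'], ['reseat', 'board', 'motor']]
print(assign_topics(H))        # [0 0 1 1]
```

The keyword lists are what an expert would see as the "label" of each topic, which is the structure the feedback mechanisms later operate on.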
Non-negative Matrix Factorization
1. Determine two non-negative matrices $W \in \mathbb{R}^{n \times k}$ and $H \in \mathbb{R}^{k \times m}$ such that $A \approx WH$.
2. These matrices are found by solving the following optimization problem¹:
$$\min_{W \ge 0,\; H \ge 0} \|A - WH\|_F$$
3. The problem is solved by alternating non-negative least squares (NNLS), updating one factor while the other is held fixed:
$$W \leftarrow \arg\min_{W \ge 0} \|A - WH\|_F, \qquad H \leftarrow \arg\min_{H \ge 0} \|A - WH\|_F \tag{1}$$
¹ ℓ1/ℓ2 regularization terms can be added to control sparsity.
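The alternating scheme in (1) can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the implementation used in this work: A is taken to be an n × m term-document matrix (rows are terms, consistent with the feedback table, where rows of W correspond to keywords), and each NNLS subproblem is solved approximately by a few sweeps of exact coordinate descent instead of a dedicated NNLS solver. The helper names `nnls_rows` and `nmf_anls` are hypothetical.

```python
import numpy as np

def nnls_rows(G, C, X, sweeps=10):
    """Approximately solve min_{X >= 0} 0.5*tr(X^T G X) - tr(C^T X).

    With G = B^T B and C = B^T Y this is the NNLS problem
    min_{X >= 0} ||Y - B X||_F^2. Each sweep updates one row of X at a time
    in closed form and clips it at zero, so the objective never increases.
    """
    for _ in range(sweeps):
        for r in range(G.shape[0]):
            if G[r, r] <= 0:  # dead component: leave the row unchanged
                continue
            X[r] = np.maximum(0.0, X[r] + (C[r] - G[r] @ X) / G[r, r])
    return X

def nmf_anls(A, k, iters=200, seed=0):
    """Alternating NNLS for A ~= W H with W >= 0 (n x k) and H >= 0 (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        # H-step: min_{H >= 0} ||A - W H||_F  (B = W, Y = A)
        H = nnls_rows(W.T @ W, W.T @ A, H)
        # W-step, solved on the transpose: min_{W^T >= 0} ||A^T - H^T W^T||_F
        W = nnls_rows(H @ H.T, H @ A.T, W.T.copy()).T
    return W, H

# Usage on a synthetic non-negative rank-3 matrix.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf_anls(A, k=3)
rel_err = np.linalg.norm(A - W @ H, "fro") / np.linalg.norm(A, "fro")
print(f"relative error: {rel_err:.4f}")
```

Solving each subproblem exactly (for example with a Lawson-Hanson NNLS routine per column) would match (1) more literally; the inexact inner solve is chosen here only to keep the sketch dependency-free beyond NumPy.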
NMF for topic modeling
Figure: Toy example showing use of NMF for topic modeling
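As a toy illustration of the input to NMF, the sketch below builds a term-document count matrix A from a few short repair-action texts, with one row per term and one column per document (the orientation assumed for these notes, since the feedback mechanisms edit rows of W by keyword). The texts, the tokenizer, and the stop-word list are hypothetical; a weighting such as TF-IDF could be applied before factorization.

```python
import re
import numpy as np

STOP_WORDS = {"the", "and", "to", "in", "of", "a", "for"}  # hypothetical, minimal list

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (A, vocab) where A[i, j] counts term vocab[i] in document j."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokens):
        for t in doc:
            A[index[t], j] += 1
    return A, vocab

docs = [
    "adjusted the linkage in the dock",
    "reconnected the dock switch",
    "reseated the board and the cable",
]
A, vocab = term_document_matrix(docs)
print(A.shape)  # (8, 3)
```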
Identifying number of topics
1. There are topic-coherence-based metrics [NLGB10], [AGH+13] for measuring the quality of a topic model.
2. We found those metrics unsuitable for our application, so we define a new metric, reconK:
$$\text{reconK} = \|A - WH\|_F^2 + \lambda \log k, \qquad \lambda = \sigma_{\min}/\sigma_{\max}$$
3. We plot reconK as a function of k and choose the k at the knee point.
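The knee-point rule for choosing k can be sketched as follows. These are assumptions, not details from the slides: σ_min and σ_max are read as the smallest and largest singular values of A, and the knee is located with a simple chord heuristic (the point of the normalized curve farthest from the line joining its end points). The factors W and H for each candidate k are assumed to come from an NMF solver such as the alternating NNLS scheme in (1); the function names are hypothetical.

```python
import numpy as np

def singular_value_ratio(A):
    """lambda = sigma_min / sigma_max, assuming sigma are the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # returned in descending order
    return s[-1] / s[0]

def recon_k(A, W, H, lam):
    """reconK = ||A - W H||_F^2 + lam * log(k), with k = number of columns of W."""
    return np.linalg.norm(A - W @ H, "fro") ** 2 + lam * np.log(W.shape[1])

def knee_point(ks, values):
    """Return the k whose normalized point lies farthest from the chord
    joining the first and last points of the curve."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(values, dtype=float)
    x = (x - x[0]) / (x[-1] - x[0])
    y = (y - y.min()) / (y.max() - y.min())
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance of each point to the chord (2-D cross product).
    dist = np.abs(dx * (y - y[0]) - dy * (x - x[0])) / np.hypot(dx, dy)
    return ks[int(np.argmax(dist))]

# Usage with hypothetical reconK values for k = 1..6.
ks = [1, 2, 3, 4, 5, 6]
values = [10.0, 4.0, 1.2, 1.0, 0.9, 0.85]
print(knee_point(ks, values))  # 3
```

The log k term grows slowly while the residual shrinks quickly at first, so the curve flattens once extra topics stop explaining much of A; the chord heuristic is only one way to locate that bend.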
Incorporating expert feedback: Prior Work
1. The authors of [YPL+14] allow two to three types of user input for LDA-based topic modeling.
2. The work closest to ours is [CLRP13]. The authors define five types of user input and optimize an objective that keeps the updated W and H matrices close to the user's feedback.
Incorporating expert feedback
Feedback type | Description | Mathematical update
Addition | The expert is asked whether all the major "issues" (topics) are covered. If not, the expert provides the missing ones along with their associated keywords. | Add a new column to W with 1s at the locations of the corresponding keywords.
Deletion | The expert is asked whether any label is unimportant or unnecessary. | Delete the corresponding column of W.
Rename | The primitive labels are renamed to make them intuitive and informative. | Create a simple mapping function from each primitive label to its expert label.
Keyword modification | For each label, the expert can re-weight the importance of its associated keywords. The re-weighting can be binary or a soft score. | In the corresponding column of W, the weight of each re-weighted keyword is modified.
Merging | The expert is asked whether two or more labels are similar and can be merged into a single label. | The columns of W to be merged are removed and replaced by a single column that is a weighted sum of the removed columns.
Splitting | The expert can indicate that a label is too generic and should be split into multiple labels. If so, the new labels and the keywords associated with each are recorded. | The corresponding column is removed and replaced by new columns, one per split label, each with 1s at the locations of that label's keywords.
Table: Formal mechanism for expert feedback
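The column operations in the table can be sketched on a NumPy matrix W of shape terms × topics. This is a hypothetical illustration: the function names, appending new columns at the end, equal default weights for merging, and reading keyword re-weighting as multiplying one entry of W by the expert's score are all assumptions rather than details given in the table. Renaming needs no matrix change; a dictionary from primitive label to expert label suffices.

```python
import numpy as np

def add_topic(W, keyword_rows):
    """Addition: append a column with 1s at the expert's keyword rows."""
    col = np.zeros((W.shape[0], 1))
    col[keyword_rows, 0] = 1.0
    return np.hstack([W, col])

def delete_topic(W, j):
    """Deletion: drop column j."""
    return np.delete(W, j, axis=1)

def reweight_keyword(W, j, row, score):
    """Keyword modification: scale W[row, j] by a binary (0/1) or soft score."""
    W = W.copy()
    W[row, j] *= score
    return W

def merge_topics(W, cols, weights=None):
    """Merging: replace the given columns by their weighted sum (appended last)."""
    if weights is None:
        weights = np.full(len(cols), 1.0 / len(cols))  # assumed default: plain average
    merged = W[:, cols] @ np.asarray(weights, dtype=float)
    return np.hstack([np.delete(W, cols, axis=1), merged[:, None]])

def split_topic(W, j, keyword_groups):
    """Splitting: replace column j by one 0/1 column per group of keyword rows."""
    W = np.delete(W, j, axis=1)
    for rows in keyword_groups:
        W = add_topic(W, rows)
    return W

rename = {0: "dock adjustment", 1: "board reseat"}  # Rename: hypothetical label mapping only

# Usage on a hypothetical 5-term x 3-topic matrix.
W = np.arange(15, dtype=float).reshape(5, 3)
print(merge_topics(W, [0, 1]).shape)           # (5, 2)
print(split_topic(W, 2, [[0, 1], [3]]).shape)  # (5, 4)
```

Each function returns a new matrix, so the pre-feedback W stays available as an anchor for the modified objective.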
Modiļ¬ed objective function to incorporate feedback
1. In order to maintain continuity and coherence, we do not want the
original matrix W to change a lot (This is very important from a
practical standpoint)
min
W>0,H>0
||Aāˆ’WH||2
F +Ī²1||W āˆ’Wfeedback ||2
F +Ī²2||Wreduced āˆ’Wold ||2
F ,
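A minimal sketch of how this penalized objective might be optimized, using alternating updates with exact coordinate descent on each block. The slides do not define W_reduced and W_old, so the following is an assumption: W_old holds the pre-feedback topic columns aligned with their positions in the current W, and W_reduced is the subset of current columns listed in a hypothetical index set `keep` (topics that survived feedback unchanged). Only W is penalized, so the H-step is an ordinary NNLS problem; in the W-step the two penalties simply add β1·I and β2·D to the Gram matrix. Function names are hypothetical.

```python
import numpy as np

def objective(A, W, H, W_fb, W_old, keep, beta1, beta2):
    """||A - WH||_F^2 + beta1 ||W - W_fb||_F^2 + beta2 ||W[:, keep] - W_old[:, keep]||_F^2."""
    return (np.linalg.norm(A - W @ H) ** 2
            + beta1 * np.linalg.norm(W - W_fb) ** 2
            + beta2 * np.linalg.norm(W[:, keep] - W_old[:, keep]) ** 2)

def cd_sweeps(G, C, X, sweeps=10):
    """Coordinate descent for min_{X >= 0} 0.5 tr(X^T G X) - tr(C^T X), row by row."""
    for _ in range(sweeps):
        for r in range(G.shape[0]):
            if G[r, r] > 0:
                X[r] = np.maximum(0.0, X[r] + (C[r] - G[r] @ X) / G[r, r])
    return X

def feedback_nmf(A, W_fb, W_old, keep, beta1=1.0, beta2=1.0, iters=100, seed=0):
    """Alternate H- and W-steps, starting from the expert-edited matrix W_fb."""
    rng = np.random.default_rng(seed)
    k = W_fb.shape[1]
    W = W_fb.copy()
    H = rng.random((k, A.shape[1]))
    D = np.zeros((k, k))
    D[keep, keep] = 1.0  # selects the columns anchored to W_old
    for _ in range(iters):
        H = cd_sweeps(W.T @ W, W.T @ A, H)  # penalties do not involve H
        G = H @ H.T + beta1 * np.eye(k) + beta2 * D
        C = H @ A.T + beta1 * W_fb.T + beta2 * D @ W_old.T
        W = cd_sweeps(G, C, W.T.copy()).T  # solved on W^T, one topic per row
    return W, H

# Usage with hypothetical data: 6 terms, 8 documents, 2 topics; topic 0 is kept.
rng = np.random.default_rng(2)
A = rng.random((6, 8))
W_old = rng.random((6, 2))
W_fb = W_old.copy()
W_fb[:, 1] = [1, 1, 0, 0, 0, 0]  # topic 1 replaced via expert keywords
W, H = feedback_nmf(A, W_fb, W_old, keep=[0], beta1=0.5, beta2=0.5)
print(objective(A, W, H, W_fb, W_old, [0], 0.5, 0.5))
```

Larger β1 and β2 keep the topics closer to what the expert approved, at the cost of a larger reconstruction error.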
Metric
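1. We share a spreadsheet containing each text with its associated keywords and label (the label is the most important noun and verb in the keyword set) with more than two experts.
2. Each expert is asked to provide feedback in the structure defined above.
3. We compute the disagreement index between every pair of annotators, treating the machine-generated labels as one more annotator:
$$\text{disagreement}(j, k) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}\big(\text{expert}_j(i) \ne \text{expert}_k(i)\big)$$
where n is the number of texts and expert_j(i) is the label annotator j assigns to text i.
4. We compare $\max_k \text{disagreement}(\text{machine}, k)$ with $\min_{j \ne k} \text{disagreement}(j, k)$ over pairs of human experts.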
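The disagreement index and the comparison in step 4 can be sketched in plain Python. The labels below are hypothetical, and the index is computed as the fraction of the n texts on which two annotators assign different labels.

```python
from itertools import combinations

def disagreement(labels_j, labels_k):
    """Fraction of items on which two annotators assign different labels."""
    if len(labels_j) != len(labels_k):
        raise ValueError("annotators must label the same items")
    return sum(a != b for a, b in zip(labels_j, labels_k)) / len(labels_j)

def compare_machine_to_experts(machine, experts):
    """Return (max machine-vs-expert disagreement, min expert-vs-expert disagreement)."""
    max_machine = max(disagreement(machine, e) for e in experts)
    min_expert = min(disagreement(a, b) for a, b in combinations(experts, 2))
    return max_machine, min_expert

# Hypothetical labels for n = 4 service requests from the machine and three experts.
machine = ["dock", "dock", "board", "cable"]
experts = [
    ["dock", "dock", "board", "board"],
    ["dock", "switch", "board", "cable"],
    ["dock", "dock", "cable", "cable"],
]
print(compare_machine_to_experts(machine, experts))  # (0.25, 0.5)
```

A machine maximum at or below the expert minimum means the machine disagrees with any expert no more than the two most consistent experts disagree with each other.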
Discussion
1. Before feedback, the maximum machine disagreement was 0.4, compared with a minimum inter-expert disagreement of 0.25.
2. After feedback, the machine disagreement was reduced to 0.3.
References
[AGH+13] Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees. In International Conference on Machine Learning, pages 280–288, 2013.
[CLRP13] Jaegul Choo, Changhyun Lee, Chandan K. Reddy, and Haesun Park. UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Transactions on Visualization and Computer Graphics, 19(12):1992–2001, 2013.
[NLGB10] David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 100–108. Association for Computational Linguistics, 2010.
[YPL+14] Yi Yang, Shimei Pan, Jie Lu, Mercan Topkara, and Doug Downey. Incorporating user input with topic modeling. In CIKM 2014 Workshop on Interactive Mining for Big Data (ImBig'14), 2014.