Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
Outline: Inconsistency Robustness is a multidisciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning: What is inconsistency? Can inconsistency be useful? Measuring inconsistency.
Inconsistency-Outlier: An inconsistency (outlier) is data that does not agree with the model.
Outlier Types: Spatial Outlier (unlabeled data); Our Focus: Model Outlier (labeled data).
Causes of Outliers: Faulty data (entry error, malfunction, etc.); Chance/deviation; Incorrect model (our focus). Image source: http://www.dkimages.com/discover/previews/852/20223083.JPG
Typical Treatment of Outliers: Assume that the learned model is correct and discard points that don't agree with the model.
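As an illustration of the typical treatment, here is a minimal sketch that trusts a fitted model and drops the points that disagree with it. The slides do not name a model or a threshold rule, so the straight-line fit, the median-based residual scale, and the names `fit_line`, `discard_outliers`, and `k` are all assumptions made for this example.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (assumed model for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def discard_outliers(xs, ys, k=3.0):
    """Treat the fitted line as correct and drop points with large residuals.

    k is a hypothetical cutoff: a point is dropped when its absolute residual
    exceeds k times the median absolute residual.
    """
    a, b = fit_line(xs, ys)
    residuals = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]
    # Small floor so a perfect fit does not produce a zero threshold.
    threshold = k * max(median(residuals), 1e-12)
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals) if r <= threshold]
    dropped = [(x, y) for x, y, r in zip(xs, ys, residuals) if r > threshold]
    return kept, dropped


# Points on y = 2x + 1, except for one corrupted label at x = 5.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = 40

kept, dropped = discard_outliers(xs, ys)
print("dropped:", dropped)
```

Note that the fit itself is pulled toward the corrupted point before anything is discarded, which is one reason this treatment can mislead when the model, rather than the data, is at fault.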
Atypical Treatment of Outliers: Assume that the data is right and that the model is wrong (our focus).
Rubens et al., AJS 2011
If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
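The following sketch illustrates this claim under stated assumptions; it is not the procedure from the talk. It fits polynomials of degree 0 to 5 to noisy samples of a straight line. Scoring the candidates on the training data itself (no inconsistency between training and testing data) favors the highest degree, because each higher degree can only fit that data better. Scoring on a second sample with fresh noise may favor a simpler model. The data, the noise level, and the helper names `fit_poly` and `mse` are hypothetical.

```python
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upward.
    """
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the basis 1, x, ..., x^degree.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial on (xs, ys)."""
    preds = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


rng = random.Random(0)
xs = [-1 + 2 * i / 11 for i in range(12)]
truth = [2 * x + 1 for x in xs]                      # assumed true model: a line
train_ys = [t + rng.gauss(0, 0.3) for t in truth]    # training sample
test_ys = [t + rng.gauss(0, 0.3) for t in truth]     # same inputs, fresh noise

degrees = range(6)
models = {d: fit_poly(xs, train_ys, d) for d in degrees}
train_err = {d: mse(models[d], xs, train_ys) for d in degrees}
test_err = {d: mse(models[d], xs, test_ys) for d in degrees}

best_on_train = min(degrees, key=train_err.get)   # test data identical to training data
best_on_test = min(degrees, key=test_err.get)     # test data inconsistent with training data
print("selected on training data:", best_on_train)
print("selected on held-out data:", best_on_test)
```

The degree selected on the held-out sample depends on the particular noise draw, so the sketch prints it rather than promising a value.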
Change Detection / Model Correction: Is the inconsistency caused by noise (or minor factors), or by changes in the underlying model? Applications: medical diagnostics, intrusion detection, network analysis, finance. Image sources: http://www.skyboximaging.com/solutions/application/change-detection http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg http://www.lucieer.net/research/heard.html http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg
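One simple way to pose the noise-versus-change question is sketched below, as an assumption rather than the method used in the talk. It compares the mean residual of a new window of data with the noise level seen in a reference window and flags a change when the gap is too large to attribute to noise. The function names, the z-score rule, and the threshold of 3 are hypothetical, and the standard error deliberately ignores the uncertainty in the reference mean.

```python
from math import sqrt
from statistics import mean, stdev


def change_score(reference_residuals, new_residuals):
    """z-score of the new window's mean residual against the reference noise level."""
    noise_sd = stdev(reference_residuals)
    standard_error = noise_sd / sqrt(len(new_residuals))
    return (mean(new_residuals) - mean(reference_residuals)) / standard_error


def looks_like_model_change(reference_residuals, new_residuals, z_threshold=3.0):
    """True when the shift in residuals is larger than noise alone would suggest."""
    return abs(change_score(reference_residuals, new_residuals)) > z_threshold


# Residuals (observed minus predicted) from a period where the model fit well.
reference = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15]

noise_only = [0.1, -0.1, 0.05, -0.05]   # scatter comparable to the reference
shifted = [1.1, 0.9, 1.05, 0.95]        # systematic offset from the model

print(looks_like_model_change(reference, noise_only))
print(looks_like_model_change(reference, shifted))
```

A flagged window is a candidate for model correction rather than for discarding, in line with the atypical treatment of outliers described earlier.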
Conclusion: Inconsistency could be useful for: hypothesis learning, model selection, model correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org

Inconsistent Outliers

  • 1. Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
  • 2. Outline. Inconsistency Robustness is a multi-disciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning: What is Inconsistency; Can Inconsistency be Useful; Measuring Inconsistency
  • 3. Inconsistency-Outlier. Inconsistency/outlier: data that does not agree with the model.
  • 4. Outlier Types. Spatial Outlier: unlabeled data. Model Outlier: labeled data (Our Focus).
  • 5. Causes of Outliers. Faulty data: entry error, malfunction, etc. Chance/Deviation. Incorrect Model (Our Focus). http://www.dkimages.com/discover/previews/852/20223083.JPG
  • 6. Typical Treatment of Outliers. Assume that the learned model is correct and discard points that don’t agree with the model.
  • 7. Atypical Treatment of Outliers. Assume that the data is right and that the model is wrong (Our Focus).
  • 8.
  • 9.
  • 10.
  • 11. Rubens et al, AJS 2011
  • 12.
  • 13. If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
  • 14. Change Detection / Model Correction. Is inconsistency caused by noise (or minor factors) or by changes in the underlying model? http://www.skyboximaging.com/solutions/application/change-detection Applications: medical diagnostics, intrusion detection, network analysis, finance. http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg http://www.lucieer.net/research/heard.html http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg
  • 15. Conclusion. Inconsistency could be useful for: Hypothesis Learning, Model Selection, Model Correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org
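
The distinction on slide 4 between spatial and model outliers can be illustrated with a small sketch. This is not taken from the talk: the one-dimensional data, the two-standard-deviation cutoff, and the names `spatial_outliers`, `model_outliers` and `threshold_model` are all assumptions made for illustration. A spatial outlier is found from the positions alone; a model outlier needs labels and a model to disagree with.

```python
from statistics import mean, stdev


def spatial_outliers(xs, z_threshold=2.0):
    """Indices of points far from the rest; labels are not needed.

    The z-score cutoff is an illustrative assumption, not the talk's method.
    """
    mu, sigma = mean(xs), stdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - mu) > z_threshold * sigma]


def model_outliers(xs, labels, model):
    """Indices of labeled points whose label disagrees with the model."""
    return [i for i, (x, y) in enumerate(zip(xs, labels)) if model(x) != y]


def threshold_model(x):
    """Hypothetical current model: points at or above 5.0 are orange."""
    return "orange" if x >= 5.0 else "blue"


# Toy data: the last point is far from the others; the third point is
# labeled orange although the model expects blue.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 30.0]
labels = ["blue", "blue", "orange", "blue",
          "orange", "orange", "orange", "orange"]

spatial = spatial_outliers(xs)
model = model_outliers(xs, labels, threshold_model)
print("spatial outliers:", spatial)
print("model outliers:", model)
```

In this toy data the distant point is a spatial outlier but agrees with the model, while the mislabeled-looking point sits among the others yet contradicts the model; the two notions pick out different points.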

Editor's Notes

  1. Hello. First of all, I would like to apologize for not being here in person, but I hope to join the discussions about Inconsistency Robustness through online means. In my presentation I would like to talk about the relation between Inconsistency and Outliers.
  2. As can be seen from the symposium’s program, the issue of Inconsistency Robustness is rather multi-disciplinary. Let me discuss some of its aspects from the Machine Learning perspective. More specifically, I would like to express my views on what inconsistency is, whether it could be useful, and how it could be measured.
  3. In Machine Learning we typically refer to inconsistent points as outliers. Typically, we try to construct a model that fits the data we have well. The points that do not fit the model are typically considered to be outliers. I think this cartoon captures the essence of outliers very well. The outlier point says that our model or theory is not correct. On the other hand, we consider outliers to be erroneous or atypical data and tend to discard them.
  4. We can separate outliers into two classes. In the case of a Spatial Outlier, a point is considered to be an outlier if it is distant from the other points. In the case of a Model Outlier, an outlier is a point whose label differs from the model’s expectations. In this talk we will focus on model outliers.
  5. Outliers can occur due to a variety of causes. An outlier could be faulty data caused by a data entry error or a measurement malfunction. Then there are outliers that occur by chance due to some natural deviation. Finally, outliers may be due to incorrect assumptions that we make about the underlying model.
  6. When encountering an outlier, it is often assumed that the current hypothesis/model is reasonably accurate for most of the points and is inaccurate for just a few outliers. Using outliers is therefore considered to lead the learning process astray, tuning the model for some incorrect or uncommon cases and making it less accurate for the majority of the points. So outliers are typically discarded. We often get attached to our models/theories and tend to downplay or disregard data that does not agree with them.
  7. But we must also consider the other possibility: that the data is right and the model is wrong. In that case the model needs to be changed and corrected.
  8. Let us discuss a setting in which outlier points could be very useful for learning. Consider that we have many points and we want to learn which points are orange and which are blue. This could be the problem of predicting which movies you like, whether a webpage is relevant to your query, which treatment should be prescribed, etc. The typical approach is simply to get a lot of data and then learn from it. However, in many settings obtaining data can be costly; e.g. if we want to discover an effective treatment for a disease we may have to try out many compounds, and that costs a lot of money and effort. If I want to learn about your preferences for movies, I would need to ask you which movies you like and which ones you don’t; but that takes time and effort, and many people are able to provide only a few ratings. So since data is costly, we want to obtain the data that is most informative and useful.
  9. So to learn the underlying coloring we can obtain a few samples; that is, we select the points that we are interested in and their color is revealed. Let’s say we have obtained a couple of points already. There could be a number of hypotheses/decision boundaries (shown by dashed lines) that are consistent with these points; i.e. points on one side of the line are blue and on the other side are orange. Then, when predicting the color of the points, we have to select one of the hypotheses and hope that it is the correct one.
  10. Let’s consider that we are now allowed to get another sample. We can choose a sample that is consistent with all of the hypotheses; i.e. all of the hypotheses assign the same color to it. Not surprisingly, when the color of the point is revealed it is blue. This might seem like a good thing, but unfortunately it does not allow us to reduce the number of hypotheses so that we can find the correct one. On the other hand, we can choose an inconsistent point, to which some of the hypotheses assign blue and the others orange. After the color of the point is revealed we can get rid of the hypotheses that got it wrong and get closer to finding the right hypothesis.
  11. I would like to make another argument in support of outliers being informative. There is a very interesting phrase by Gregory Bateson that defines information as a difference that makes a difference. Outliers fit this view of information very well. Outliers are different from the rest of the points by definition. And including outliers in the learning process will make a difference to the model’s predictions. The intuition behind this principle is that the only way the model’s predictions will improve is if they change. However, not all changes are good; so the tricky part is to determine when the change is for the better and when it is not.
  12. Let me briefly mention the relation between inconsistency and model complexity. As the number of training points increases, more complex models tend to fit the data better. E.g. when we have just two points a linear model fits the data very well; if we add another point a linear model may no longer be complex enough to fit the data, so we may need to use a polynomial model of order 2; and then as we add more points increasingly complex models may be needed. An important implication is that as we learn more and more, the underlying model is likely to change and to become increasingly complex.
  13. The problem with simply increasing the model’s complexity is that a model that is too complex may start overfitting the data, e.g. learning the noise and not the signal. So allowing for some inconsistency could be good; models that do exceptionally well on some data may actually start to memorize it instead of learning from it. So having some inconsistency between training and testing data could actually prevent us from making the model more complex than necessary.
  14. The initially learned model could be accurate, but as time progresses the underlying process may start to change; e.g. we saw some drastic changes in the stock pricing models these past two weeks. So when we encounter inconsistent data we should not discard it as noise, but try to see whether it could indicate that our current model is incorrect and, if possible, try to correct it.
  15. In conclusion, I hope that I was able to show that sometimes inconsistency could actually be rather useful for such things as Hypothesis Learning, Model Selection and Model Correction. Thank you.
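
The argument in notes 9 and 10 can be sketched in a few lines. This is a toy illustration under assumed conditions, not the method from the talk or the cited paper: hypotheses are one-dimensional thresholds, the candidate thresholds, the pool of unlabeled points and the hidden `TRUE_THRESHOLD` are invented, and `predict`, `consistent`, `disagreement` and `oracle` are hypothetical helper names.

```python
def predict(threshold, x):
    """A hypothesis: points at or above the threshold are orange."""
    return "orange" if x >= threshold else "blue"


def consistent(thresholds, labeled):
    """Keep only the hypotheses that agree with every labeled point."""
    return [t for t in thresholds
            if all(predict(t, x) == y for x, y in labeled)]


def disagreement(thresholds, x):
    """Size of the minority vote on x; 0 means all hypotheses agree."""
    orange = sum(1 for t in thresholds if predict(t, x) == "orange")
    return min(orange, len(thresholds) - orange)


TRUE_THRESHOLD = 6  # hidden from the learner, used only by the oracle


def oracle(x):
    """Reveals the true color of a queried point."""
    return predict(TRUE_THRESHOLD, x)


# Two points are already labeled; seven hypotheses remain consistent.
labeled = [(1.0, "blue"), (9.0, "orange")]
hypotheses = consistent([2, 3, 4, 5, 6, 7, 8], labeled)

pool = [0.5, 5.5, 9.5]

# Querying a point on which all hypotheses agree eliminates nothing.
after_consistent = consistent(hypotheses, labeled + [(0.5, oracle(0.5))])

# Querying the point with the most disagreement eliminates the
# hypotheses that predicted the wrong color.
query = max(pool, key=lambda x: disagreement(hypotheses, x))
after_inconsistent = consistent(hypotheses, labeled + [(query, oracle(query))])

print("query:", query)
print("after consistent query:", after_consistent)
print("after inconsistent query:", after_inconsistent)
```

The consistent query leaves all seven hypotheses in place, while the inconsistent one removes every hypothesis that predicted the wrong color, which is the sense in which the inconsistent point is the more informative sample.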