This presentation is about a lecture I gave within the "Software systems and services" immigration course at the Gran Sasso Science Institute, L'Aquila (Italy): http://cs.gssi.it/.
http://www.ivanomalavolta.com
Instance Space Analysis for Search Based Software Engineering - Aldeida Aleti
Search-Based Software Engineering is now a mature area with numerous techniques developed to tackle some of the most challenging software engineering problems, from requirements to design, testing, fault localisation, and automated program repair. SBSE techniques have shown promising results, giving us hope that one day it will be possible for the tedious and labour-intensive parts of software development to be completely automated, or at least semi-automated. In this talk, I will focus on the problem of objective performance evaluation of SBSE techniques. To this end, I will introduce Instance Space Analysis (ISA), an approach for identifying features of SBSE problems that explain why a particular instance is difficult for an SBSE technique. ISA can be used to examine the diversity and quality of the benchmark datasets used by most researchers, and to analyse the strengths and weaknesses of existing SBSE techniques. The instance space is constructed to reveal areas of hard and easy problems, and enables the strengths and weaknesses of the different SBSE techniques to be identified. I will present how ISA enabled us to identify the strengths and weaknesses of SBSE techniques in two areas: Search-Based Software Testing and Automated Program Repair. Finally, I will end my talk with future directions for the objective assessment of SBSE techniques.
Empirical Software Engineering for Software Environments - University of Cali... - Marco Aurelio Gerosa
Second class of the Software Environment course. In this class, we discuss how to use Empirical Software Engineering techniques to support the construction and evaluation of software tools.
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR... - csandit
Software testing is a primary phase of software development, carried out by executing a sequence of test inputs and comparing the results against expected outputs. The Harmony Search (HS) algorithm is based on the improvisation process of music. In comparison to other algorithms, HS has gained popularity in the field of evolutionary computation. When musicians compose a harmony through different possible combinations of pitches, the pitches are stored in the harmony memory, and optimization is done by adjusting the input pitches to generate the perfect harmony. The test case generation process is used to identify test cases along with the resources they require, and also identifies critical domain requirements. In this paper, the role of the Harmony Search meta-heuristic is analyzed in generating random test data and optimizing that test data. Test data are generated and optimized through Harmony Search in a case study, a withdrawal task in a bank ATM. It is observed that the algorithm generates suitable test cases as well as test data; the paper also gives brief details about the Harmony Search method as used for test data generation and optimization.
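For illustration, the improvisation loop the abstract describes (harmony memory, memory consideration, pitch adjustment) can be sketched as a generic minimizer. This is a toy sketch, not the paper's implementation; the parameter names and values (harmony memory size, HMCR, PAR, bandwidth) are typical textbook defaults, assumed here:

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.1, iters=2000):
    """Minimize f over box `bounds` with a basic Harmony Search."""
    dim = len(bounds)
    # Harmony memory: a pool of candidate solutions ("harmonies").
    hm = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(h) for h in hm]
    for _ in range(iters):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if random.random() < hmcr:            # memory consideration
                x = random.choice(hm)[d]
                if random.random() < par:         # pitch adjustment
                    x += random.uniform(-bw, bw)
            else:                                 # random selection
                x = random.uniform(lo, hi)
            new.append(min(max(x, lo), hi))       # clamp to bounds
        worst = max(range(hms), key=lambda i: scores[i])
        s = f(new)
        if s < scores[worst]:                     # replace worst harmony
            hm[worst], scores[worst] = new, s
    best = min(range(hms), key=lambda i: scores[i])
    return hm[best], scores[best]
```

For test data generation, `f` would be replaced by a fitness function scoring how well a candidate input exercises the target (e.g. the ATM withdrawal task), and `bounds` by the valid input ranges.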
Abstractions Conference 2016 - Machine Learning in Healthcare – ML for the Re... - Mohinder Dick, PMP
This talk was given at Abstractions Pittsburgh on August 18, 2016. It is an introduction to the basics of machine learning and how they can be applied to healthcare.
Algorithm evaluation using item response theory - CSIRO
Item Response Theory (IRT) is a paradigm within the field of Educational Psychometrics that is used to assess student ability as well as test question difficulty and discrimination power. IRT has recently been applied to evaluate machine learning algorithm performance on a classification dataset. Here, we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while eliciting a suite of richer characteristics, such as stability, effectiveness, and anomalousness, that describe different aspects of algorithm performance.
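In the two-parameter logistic (2PL) IRT model that such frameworks typically build on, the probability of a correct response depends on the respondent's ability and on the item's difficulty and discrimination; in the algorithm-evaluation analogy, the "respondent" is an algorithm and the "item" is a dataset. A minimal sketch (illustrative, not the CSIRO framework itself):

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability that a respondent
    with ability `theta` answers an item with discrimination `a` and
    difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# In the algorithm-evaluation analogy: theta = algorithm "ability",
# b = dataset difficulty. A harder dataset (larger b) lowers the
# probability of success for the same algorithm.
```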
Top 10 Data Science Practitioner Pitfalls - Sri Ambati
Top 10 Data Science Practitioner Pitfalls Meetup with Erin LeDell and Mark Landry on 09.09.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Qualitative Studies in Software Engineering - Interviews, Observation, Ground... - alessio_ferrari
This lecture covers qualitative data collection methods and qualitative data analysis in software engineering. Topics covered are:
1. Sampling
2. Interviews
3. Observation and Participant Observation
4. Archival Data Collection
5. Grounded theory, Coding, Thematic Analysis
6. Threats to validity in qualitative studies
Find the videos at: https://www.youtube.com/playlist?list=PLSKM4VZcJjV-P3fFJYMu2OhlTjEr9Bjl0
Module 4: Model Selection and Evaluation - Sara Hooker
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
Machine Learning Approaches and its Challenges - ijcnes
Real-world data sets are often not in a proper form; they may have incomplete or missing values, and identifying missing attributes is a challenging task. To impute the missing data, data preprocessing has to be done: data preprocessing is a data mining step that cleanses the data. Handling missing data is a crucial part of any data mining technique. Major industries and many real-time applications are rightly worried about their data, because loss of data hurts company growth. For example, the health care industry has much data about patient details; to diagnose a particular patient we need exact data, and if attribute values are missing it is very difficult to recover the records. Considering the drawback of missing values in the data mining process, many techniques and algorithms have been implemented, and many of them are not very efficient. This paper elaborates the various techniques and machine learning approaches for handling missing attribute values and makes a comparative analysis to identify the most efficient method.
Total Survey Error & Institutional Research: A case study of the University E... - Sonia Whiteley
Total Survey Error (TSE) is a component of Total Survey Quality (TSQ) that supports the assessment of the extent to which a survey is ‘fit-for-purpose’. While TSQ looks at a number of dimensions, such as relevance, credibility and accessibility, TSE has a more operational focus on accuracy and minimising errors.
Mitigating survey error involves finding a balance between achieving a survey with minimal error and a survey that is affordable. It is also often the case that addressing one source of error can inadvertently increase another source of error.
TSE provides a conceptual framework for evaluating the design of the University Experience Survey (UES) and offers a structured approach to making decisions about changing and enhancing the UES to support continuous improvement. The implications of TSE for institutional research are discussed using the UES as a case study.
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ... - ijsc
Artificial Intelligence and Machine Learning have been around for a long time. In recent years, there has been a surge in popularity of applications integrating AI and ML technology. As with traditional development, software testing is a critical component of a successful AI/ML application. The development methodology used in AI/ML contrasts significantly with traditional development, and in light of these distinctions, various software testing challenges arise. The emphasis of this paper is on the challenge of effectively splitting the data into training and testing data sets. By applying a k-Means clustering strategy to the data set, followed by a decision tree, we can significantly increase the likelihood that the training data set represents the domain of the full dataset, and thus avoid training a model that is likely to fail because it has only learned a subset of the full data domain.
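The core idea of a cluster-aware split can be sketched as follows. This is a simplified illustration under assumed details: a toy k-means plus per-cluster proportional sampling, rather than the paper's actual k-Means/decision-tree pipeline:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Toy Lloyd's k-means over tuples; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(col) for col in zip(*members))
    return labels

def cluster_split(points, k=3, test_frac=0.25, seed=0):
    """Split indices so each cluster is represented in both train and test."""
    labels = kmeans(points, k, seed=seed)
    rng = random.Random(seed)
    train, test = [], []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        rng.shuffle(idx)
        cut = max(1, int(len(idx) * test_frac))
        test += idx[:cut]
        train += idx[cut:]
    return train, test
```

Sampling within clusters (rather than uniformly at random) makes it far less likely that an entire region of the data domain ends up only in the test set.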
Technology-based assessments-special education
New technologies remain competitive in driving efforts to make learning more efficient, and technology-based assessment in special education has made considerable advancement (Goldsmith & LeBlanc, 2004). The first applications of computer technology to assessment were for scoring students' test forms. Current features incorporate self-administration, software control of presentation, algorithm-based response evaluation, prescription based on expert knowledge, and direct links between assessment and changes in instruction. Technology-based assessment uses electronic and software systems to evaluate individual children in an educational setting, whereas traditional assessments merely adapt conventional approaches to the computer.
Video-based, computer-assisted tests enabled language learning for students while automatically increasing the validity of measurements. Video segments incorporated movie elements of moral dilemmas in problem-solving tests, and students viewing the video segments respond by simply touching the screen. Such innovative approaches have created relevance in testing procedures. Misplaced students obtain poor results and are prompted to drop out; teachers who are not well trained contribute to the misplacement through poor management of certain behaviors and learning differences. To be effective, teachers must be able to analyze the data produced by the assessment and develop a due course of action.
In addressing students with physical limitations, the use of voice recognition, handwriting interpreters, stylus tools, and touchscreens enables communication without the use of keys (Gierach, 2009). New software features allow students to play video segments at a comfortable pace with preferred language options. Computers linked to videodiscs enable students to learn according to individual needs and skills. The latest technological features concern evaluation: technological advancements assess social competence among students, with the evaluator viewing students in a variety of contexts. Limitations in technology infrastructure are seen as the key barrier to this sort of assessment; many district schools lack the adequate high-speed broadband access necessary for this evaluation. Moreover, obsolescence erodes the capacity to provide quality services, as technology-based systems have a relatively short functional life.
Holistic assessments are the best form of technology-based assessments. They incorporate software control of presentation, conceptual models or algorithms, rule-based decision-making, and expert knowledge (Redecker & Johannessen, 2013). The proliferation of technology helps students through the inclusion of speech recognition, electronic communication, personal computers, robotics, and artificial intelligence. Trends in technology-based assessments have impacted the lives of students with a disability: they help achieve school improvement goals as well as track student growth and progress. Current assessment norms have embedded current stan ...
An overview on data mining designed for imbalanced datasets - eSAT Journals
Abstract: In imbalanced datasets, the classifying categories are not approximately equally represented. The imbalanced dataset problem occurs in classification when the number of instances of one class is much smaller than the number of instances of the other classes. Recent years have brought increased attention to applying machine learning methods to complex real-world tasks that are characterized by imbalanced data. Imbalanced datasets have become a critical problem in machine learning and are commonly found in many applications, such as detection of fraudulent calls, and in the bio-medical, engineering, remote-sensing, computer, and manufacturing industries. Several approaches have been proposed to overcome these problems. This paper presents a study of the imbalanced dataset problem, examines various sampling methods used in evaluating such datasets, and discusses which interpretation methods are most suitable for imbalanced dataset mining. Keywords: Imbalance Problems, Imbalanced datasets, sampling strategies, Machine Learning.
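Among the sampling strategies a survey like this typically covers, random oversampling is the simplest: minority-class examples are duplicated until class counts match. A minimal sketch (illustrative only, not code from the paper):

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples (chosen at random, with
    replacement) until every class matches the majority class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xr, yr = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            Xr.append(X[i])
            yr.append(label)
    return Xr, yr
```

More sophisticated strategies (undersampling the majority class, or synthesizing new minority examples as SMOTE does) follow the same interface: they rebalance the class counts before a classifier is trained.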
Machine learning is the medium through which we adopt intelligence into our systems and services today. Despite the spread of successful machine learning applications, there are still serious challenges faced when one decides to embrace this technology. In this webinar, we will learn about the fundamentals of building a successful machine learning project. You will be able to understand the important aspects of developing functioning and sustainable intelligence.
Top 10 Data Science Practitioner Pitfalls - Mark Landry - Sri Ambati
Over-fitting, misread data, NAs, collinear column elimination, and other common issues play havoc in the day of a practicing data scientist. In this talk, we review the top 10 common pitfalls and steps to avoid them. #h2ony
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Solving the AL Chicken-and-Egg Corpus and Model Problem - Neil Rubens
paper: http://www.lrec-conf.org/proceedings/lrec2016/pdf/28_Paper.pdf
tool: https://github.com/move-tool/gephi-plugins
Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts, but has the limitation that it is carried out in a closed-loop manner, selecting points that will improve an existing model. When there is no model, or the task(s) is even under-defined (such as studying corpora-less phenomena), use of traditional AL is inapplicable. To remedy this, we propose a novel method for model-free AL that focuses on utilising phenomena as desirable characteristics. We introduce a tool, MOVE, that helps iteratively visualise and refine these characteristics. We show its potential on a real world case-study of a corpus we are developing.
Recommender Systems and Active Learning (for Startups) - Neil Rubens
This presentation presents a high level overview of recommender systems and active learning, including from the viewpoint of startups vs. established companies, the cold-start problem, etc.
Presentation given by Neil Rubens at the Centre for Database and Information Systems (Prof. Ricci), Free University of Bozen-Bolzano
For more information see http://activeintelligence.org/research/al-rs/
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of these features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Inconsistent Outliers
1. Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
2. Outline Inconsistency Robustness is a multi-disciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning: what inconsistency is, whether inconsistency can be useful, and how inconsistency can be measured.
5. Causes of Outliers: faulty data (entry errors, measurement malfunctions, etc.), chance or natural deviation, and an incorrect model (our focus).
6. Typical Treatment of Outliers: assume that the learned model is correct and discard the points that do not agree with the model.
7. Atypical Treatment of Outliers: assume that the data is right and that the model is wrong (our focus).
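As a concrete sketch of the "typical treatment" described above (my own illustration, not from the slides): fit a model, then discard the points whose residuals the model cannot explain. A Theil-Sen style robust line fit is used here so that the gross outlier does not skew the fit itself; the data, names, and threshold are arbitrary choices for the example.

```python
from statistics import median

def theil_sen(pts):
    # Robust line fit: median of all pairwise slopes, then the median intercept.
    slopes = [(y2 - y1) / (x2 - x1)
              for i, (x1, y1) in enumerate(pts)
              for x2, y2 in pts[i + 1:]
              if x2 != x1]
    a = median(slopes)
    b = median([y - a * x for x, y in pts])
    return a, b

def split_by_residual(pts, threshold=2.0):
    # "Typical treatment": keep points the fitted model explains,
    # discard those whose residual exceeds the threshold.
    a, b = theil_sen(pts)
    inliers = [(x, y) for x, y in pts if abs(y - (a * x + b)) <= threshold]
    outliers = [(x, y) for x, y in pts if abs(y - (a * x + b)) > threshold]
    return inliers, outliers

# Eight points on the line y = x, plus one gross outlier.
data = [(float(i), float(i)) for i in range(8)] + [(8.0, 20.0)]
inliers, outliers = split_by_residual(data)
print(outliers)  # only the point (8.0, 20.0) is flagged
```

The "atypical treatment" on the next slide would instead take the flagged points as evidence that the line model itself should be revised.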
13. If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
14. Change Detection / Model Correction: is inconsistency caused by noise (or minor factors), or by changes in the underlying model? Applications: medical diagnostics, intrusion detection, network analysis, finance.
15. Conclusion: inconsistency could be useful for hypothesis learning, model selection, and model correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org
Editor's Notes
Hello. First of all, I would like to apologize for not being here in person, but I hope to join the discussions about Inconsistency Robustness through online means. In my presentation I would like to talk about the relations between Inconsistency and Outliers.
As can be seen from the symposium's program, the issue of Inconsistency Robustness is rather multi-disciplinary. Let me discuss some of its aspects from the Machine Learning perspective. More specifically, I would like to express my views on what inconsistency is, whether it can be useful, and how it can be measured.
In Machine Learning we typically refer to inconsistent points as outliers. Typically, we try to construct a model that fits the data we have well. The points that do not fit the model are typically considered to be outliers. I think this cartoon captures the essence of outliers very well. The outlier point says that our model or theory is not correct. On the other hand, we consider outliers to be erroneous or atypical data and tend to discard them.
We can separate outliers into two classes. In the case of a spatial outlier, a point is considered an outlier if it is distant from the other points. In the case of a model outlier, an outlier is a point whose label differs from the model's expectations. In this talk we will focus on model outliers.
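The two classes can be sketched in code. This is an illustrative sketch I am adding (the function names, thresholds, and data are assumptions, not from the deck): a spatial outlier is far from its neighbours regardless of labels, while a model outlier carries a label the model does not predict.

```python
def spatial_outliers(points, k=3, factor=2.0):
    # Spatial outlier: a point whose mean distance to its k nearest
    # neighbours is much larger than the dataset-wide average of that score.
    def knn_dist(i):
        dists = sorted(abs(points[i] - points[j])
                       for j in range(len(points)) if j != i)
        return sum(dists[:k]) / k
    scores = [knn_dist(i) for i in range(len(points))]
    avg = sum(scores) / len(scores)
    return [p for p, s in zip(points, scores) if s > factor * avg]

def model_outliers(samples, predict):
    # Model outlier: a labelled point whose observed label disagrees
    # with what the model predicts for it.
    return [(x, y) for x, y in samples if predict(x) != y]

print(spatial_outliers([0.0, 0.5, 1.0, 1.5, 2.0, 10.0]))  # 10.0 is isolated
labelled = [(0.0, "blue"), (1.0, "blue"), (2.0, "orange"),
            (3.0, "orange"), (1.8, "blue")]
print(model_outliers(labelled, lambda x: "blue" if x < 1.5 else "orange"))
```

Note that the model-outlier notion is relative: the point (1.8, "blue") is only an outlier with respect to the chosen threshold model, which is exactly why the talk asks whether the data or the model is wrong.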
Outliers can occur due to a variety of causes. An outlier could be faulty data caused by a data entry error or a measurement malfunction. Then there are outliers that occur by chance, due to some natural deviation. Finally, outliers may be due to incorrect assumptions that we make about the underlying model.
When encountering an outlier, it is often assumed that the current hypothesis/model is reasonably accurate for most of the points and inaccurate for just a few outliers. Therefore, using outliers is considered to lead the learning process astray, tuning the model for some incorrect or uncommon cases and making it less accurate for the majority of the points. So outliers are typically discarded. We often get attached to our models and theories and tend to downplay or disregard data that does not agree with them.
But we must also consider the other possibility: that the data is right and the model is wrong, in which case the model needs to be changed and corrected.
Let us discuss a setting in which outlier points could be very useful for learning. Consider that we have many points and we want to learn which points are orange and which points are blue. This could be the problem of predicting which movie you like, whether a webpage is relevant to your query, which treatment should be prescribed, etc. The typical approach is simply to get a lot of data and then to learn from it. However, in many settings obtaining data can be costly. For example, if we want to discover an effective treatment for a disease, we may have to try out many compounds, and that costs a lot of money and effort. If I want to learn about your preferences for movies, I would need to ask you which movies you like and which ones you don't; but that takes time and effort, and many people are able to provide only a few ratings. So, since data is costly, we want to obtain the data that is most informative and useful.
So, to learn the underlying coloring, we can obtain a few samples; that is, we select the points we are interested in and their color is revealed. Let's say we have obtained a couple of points already. There could be a number of hypotheses (shown by dashed decision lines) that are consistent with these points, i.e. points on one side of the line are blue and on the other side are orange. Then, when predicting the color of the points, we have to select one of the hypotheses and hope that it is the correct one.
Let's consider that we are now allowed to get another sample. We could choose a sample that is consistent with all of the hypotheses, i.e. all of the hypotheses assign the same color to it. Not surprisingly, when the color of the point is revealed it is blue. This might seem like a good thing, but unfortunately it does not allow us to reduce the number of hypotheses so that we can find the correct one. On the other hand, we can choose an inconsistent point, for which some of the hypotheses assign blue and the others orange. After the color of the point is revealed, we can get rid of the hypotheses that got it wrong, and get closer to finding the right hypothesis.
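The sampling strategy just described can be sketched as a tiny query-by-committee loop. This is my own minimal illustration of the idea, not code from the talk: the hypotheses are 1-D thresholds, the most contested point is queried, and hypotheses that mislabel it are discarded.

```python
# Committee of hypotheses: 1-D thresholds t; each predicts "blue" for
# points left of t and "orange" for points at or right of t.
def predict(t, x):
    return "blue" if x < t else "orange"

def disagreement(hypotheses, x):
    # Number of hypothesis pairs that label x differently.
    blue = sum(1 for t in hypotheses if predict(t, x) == "blue")
    return blue * (len(hypotheses) - blue)

def query(hypotheses, pool):
    # Active learning step: ask for the label of the most contested point.
    return max(pool, key=lambda x: disagreement(hypotheses, x))

def update(hypotheses, x, label):
    # Keep only the hypotheses consistent with the revealed label.
    return [t for t in hypotheses if predict(t, x) == label]

hypotheses = [0.5, 1.5, 2.5, 3.5]
pool = [0.0, 2.0, 5.0]
x = query(hypotheses, pool)  # 2.0 splits the committee 2-2; 0.0 and 5.0 are unanimous
hypotheses = update(hypotheses, x, "orange")
print(hypotheses)  # only the thresholds below 2.0 survive
```

Querying the unanimous points 0.0 or 5.0 would have eliminated no hypotheses at all, which is the talk's point: the inconsistent point is the informative one.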
I would like to make another argument in support of outliers being informative. There is a very interesting phrase by Gregory Bateson that defines information as a difference that makes a difference. Outliers fit this view of information very well. Outliers are different from the rest of the points by definition, and including outliers in the learning process will make a difference in the model's predictions. The intuition behind this principle is that the only way the model's predictions will improve is if they change. However, not all changes are good; so the tricky part is to determine when a change is for the better and when it is not.
Let me briefly mention the relation between inconsistency and model complexity. As the number of training points increases, more complex models tend to fit the data better. For example, when we have just two points, a linear model fits the data very well; if we add another point, a linear model may no longer be complex enough to fit the data, so we may need a polynomial model of order 2; and as we add more points, increasingly complex models may be needed. An important implication is that as we learn more and more, the underlying model is likely to change and to become increasingly complex.
The problem with simply increasing the model's complexity is that a model that is too complex may start overfitting to the data, e.g. learning the noise and not the signal. So allowing for some inconsistency could be good; models that do exceptionally well on some data may actually start to memorize it instead of learning it. Having some inconsistency between the training and testing data could actually prevent us from making the model more complex than necessary.
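A tiny sketch of that trade-off (my own illustration, with an assumed data-generating process, seed, and model choices): a model that memorizes the training data is perfectly consistent with it, yet a simpler model that tolerates some inconsistency predicts held-out data better.

```python
import random

random.seed(0)
# Underlying process: y = 2x plus Gaussian noise.
train = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(10)]
test = [(x + 0.5, 2 * (x + 0.5) + random.gauss(0, 0.5)) for x in range(10)]

def memorizer(train):
    # Maximally complex model: return the y of the nearest training x.
    def f(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return f

def line(train):
    # Simple model: ordinary least-squares line y = a*x + b.
    n = len(train)
    sx = sum(x for x, _ in train)
    sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def mse(f, data):
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

mem, lin = memorizer(train), line(train)
# The memorizer has zero training error (it is perfectly consistent),
# but the simpler line generalizes better to the held-out points.
print(mse(mem, train), mse(mem, test), mse(lin, test))
```

Selecting the model by its training-set consistency alone would pick the memorizer; the held-out inconsistency is what exposes it.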
The initially learned model could be accurate, but as time progresses the underlying process may start to change; e.g. we saw some drastic changes in stock pricing models these past two weeks. So when we encounter inconsistent data we should not discard it as noise, but try to see whether it could be indicative of our current model being incorrect and, if possible, try to correct it.
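The idea of watching for a change in the underlying process can be sketched as a simple sliding-window detector. This is a minimal illustration I am adding, with an assumed window size and threshold (none of this comes from the talk): an index is flagged when the mean of the next window of points deviates from the mean of the previous window by more than a few standard deviations.

```python
from statistics import mean, stdev

def change_points(stream, window=5, threshold=3.0):
    # Flag index i when the mean of the next `window` points deviates from
    # the mean of the previous `window` points by more than `threshold`
    # standard deviations of the past window.
    flagged = []
    for i in range(window, len(stream) - window + 1):
        past = stream[i - window:i]
        future = stream[i:i + window]
        s = stdev(past) or 1e-9  # guard against a perfectly flat past window
        if abs(mean(future) - mean(past)) > threshold * s:
            flagged.append(i)
    return flagged

# A stable series whose level shifts at index 10.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0,
          5.0, 5.1, 4.9, 5.0, 5.05, 4.95, 5.0, 5.1, 4.9, 5.0]
print(change_points(series))  # indices clustered around the shift at 10
```

Under this view, the "outliers" right after index 10 are not noise to discard but the first evidence that the model of the series needs correcting.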
In conclusion, I hope that I was able to show that sometimes inconsistency could actually be rather useful for such things as hypothesis learning, model selection, and model correction. Thank you.