Copyright © 2012, SAS Institute Inc. All rights reserved.
EXPLORING VARIABLE CLUSTERING
AND IMPORTANCE IN JMP
CHRIS GOTWALT AND RYAN PARKER
VARIABLE CLUSTERING
INTRODUCTION
• Variable clustering is a method that performs dimension reduction on the
number of input variables to be used in a predictive model.
• Reduces inputs by finding groups of similar variables so that a single variable
can represent each group.
• Helps reduce effects of collinearity on the input variables.
• Developed by SAS/STAT Development Director Warren Sarle.
VARIABLE CLUSTERING
AN ITERATIVE ALGORITHM
• Iteratively splits and assigns variables to clusters.
• Sample iterations for variables in Wine Quality data set:
  Iteration 1: {Alcohol, Citric Acid, pH, Sugar, Sulfur Dioxide}
  Iteration 2: {Alcohol, Citric Acid, Sulfur Dioxide}, {pH, Sugar}
  Iteration 3: {Alcohol, Sugar}, {pH, Sulfur Dioxide}, {Citric Acid}
VARIABLE CLUSTERING
ALGORITHM DETAILS
• At each iteration the cluster with the largest second eigenvalue is split.
• Variables within this cluster are assigned to two new clusters based on each
variable’s correlation with the first two orthoblique rotated principal
components.
• After the split, variables from other clusters are reassigned to one of the new
clusters if they have a higher correlation with the new cluster.
• The algorithm ends when the second eigenvalue of every cluster is less than one.
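The split-selection and stopping rules above can be sketched in a few lines. This is a minimal illustration, not JMP's implementation: it assumes the eigenvalues are taken from each cluster's correlation matrix, it omits the orthoblique rotation and the reassignment step, and the function names and the simulated data are hypothetical.

```python
import numpy as np

def second_eigenvalue(data, columns):
    """Second-largest eigenvalue of the correlation matrix of the given columns."""
    if len(columns) < 2:
        return 0.0  # a one-variable cluster has no second component
    corr = np.corrcoef(data[:, columns], rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return float(eigenvalues[-2])

def cluster_to_split(data, clusters, threshold=1.0):
    """Index of the cluster with the largest second eigenvalue,
    or None once every second eigenvalue is below the threshold."""
    seconds = [second_eigenvalue(data, cols) for cols in clusters]
    best = int(np.argmax(seconds))
    return best if seconds[best] >= threshold else None

# Simulated data (assumption): columns 0-1 share one latent factor,
# columns 2-3 share another.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
data = np.column_stack([
    a + 0.3 * rng.normal(size=n),
    a + 0.3 * rng.normal(size=n),
    b + 0.3 * rng.normal(size=n),
    b + 0.3 * rng.normal(size=n),
])

print(cluster_to_split(data, [[0, 1, 2, 3]]))    # single cluster holding two factors
print(cluster_to_split(data, [[0, 1], [2, 3]]))  # after splitting by factor
```

With one cluster holding both factors, the second eigenvalue is well above one, so that cluster is selected for splitting. Once the variables are separated by factor, both second eigenvalues fall below one and the sketch reports that no further split is needed.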
VARIABLE CLUSTERING
REDUCING EACH CLUSTER TO A SINGLE VARIABLE
[Figure: example clusters with the variables pH, Sugar, and Citric Acid]
• Each cluster can be reduced to a single
variable for modeling.
• There are two ways to do this:
1. We can use the most representative
variable from each cluster.
2. Alternatively, the cluster component from
each cluster can be used.
VARIABLE CLUSTERING
MOST REPRESENTATIVE VARIABLES
• These are variables that best represent each cluster.
• Each one has the highest correlation with the variables in its cluster.
• Most representative variables provide a clear interpretation when used.
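One way to make "highest correlation with the variables in its cluster" concrete is to score each variable by its average squared correlation with the other members of the cluster. The slides do not state the exact criterion JMP uses, so the scoring rule, the function name, and the simulated data below are assumptions for illustration only.

```python
import numpy as np

def most_representative(data, columns):
    """Column whose average squared correlation with the other
    cluster members is highest (hypothetical criterion)."""
    if len(columns) == 1:
        return columns[0]
    corr = np.corrcoef(data[:, columns], rowvar=False)
    r2 = corr ** 2
    np.fill_diagonal(r2, 0.0)  # ignore each variable's correlation with itself
    mean_r2 = r2.sum(axis=1) / (len(columns) - 1)
    return columns[int(np.argmax(mean_r2))]

# Simulated cluster (assumption): column 0 tracks the latent signal closely,
# columns 1 and 2 are noisier versions of the same signal.
rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)
data = np.column_stack([
    z + 0.1 * rng.normal(size=n),
    z + 1.0 * rng.normal(size=n),
    z + 1.0 * rng.normal(size=n),
])

print(most_representative(data, [0, 1, 2]))
```

Here the least noisy column is picked because it correlates most strongly with the rest of the cluster, which is what makes it a natural stand-in for the group.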
VARIABLE CLUSTERING
CLUSTER COMPONENTS
• New variables created using the first principal component of each cluster.
• Provide a way to combine variables in each cluster into a single variable.
• Similar to traditional principal components analysis (PCA) except that each
cluster component only uses variables from that cluster.
• Interpretation not as clear when compared to most representative variables.
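A cluster component can be sketched as the first principal component computed from only the variables in one cluster. The code below assumes the component is based on the correlation matrix (standardized variables). The slides do not say how JMP scales or signs the component, so those choices, the function name, and the simulated data are assumptions.

```python
import numpy as np

def cluster_component(data, columns):
    """Scores on the first principal component of the standardized
    variables in one cluster."""
    x = data[:, columns]
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each variable
    corr = np.atleast_2d(np.corrcoef(x, rowvar=False))
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending eigenvalues
    loading = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue
    if loading.sum() < 0:
        loading = -loading  # sign convention chosen for readability (assumption)
    return z @ loading

# Simulated cluster of three correlated variables (assumption).
rng = np.random.default_rng(2)
n = 400
latent = rng.normal(size=n)
data = np.column_stack([
    latent + 0.5 * rng.normal(size=n),
    latent + 0.5 * rng.normal(size=n),
    latent + 0.5 * rng.normal(size=n),
])

component = cluster_component(data, [0, 1, 2])
print(component.shape, np.var(component, ddof=1))
```

The variance of the component equals the largest eigenvalue of the cluster's correlation matrix, which is the usual principal components property. The difference from ordinary PCA is only that the input is restricted to one cluster's variables.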
VARIABLE CLUSTERING
DEMO: IMPORTANT TERMS
• RSquare with Own Cluster
  • The RSquare a variable has with variables in its cluster.
• RSquare with Next Closest
  • The RSquare a variable has with variables in the next most similar cluster.
• 1-RSquare Ratio
  • Relative similarity between a variable’s own cluster and the next closest cluster.
  • Values should be less than 1 for a well-assigned variable.
  • Values greater than 1 indicate that the variable should be moved to the next closest cluster.
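The slide names the 1-RSquare Ratio without spelling it out. The sketch below uses the definition documented for SAS PROC VARCLUS, (1 − RSquare with Own Cluster) / (1 − RSquare with Next Closest), and assumes JMP reports the same quantity. The function names and the input values are made up for illustration.

```python
def one_minus_rsquare_ratio(r2_own, r2_next):
    """(1 - R^2 with own cluster) / (1 - R^2 with next closest cluster)."""
    if r2_next >= 1.0:
        raise ValueError("RSquare with Next Closest must be below 1")
    return (1.0 - r2_own) / (1.0 - r2_next)

def should_move(r2_own, r2_next):
    """True when the ratio exceeds 1, i.e. the variable fits the
    next closest cluster better than its own."""
    return one_minus_rsquare_ratio(r2_own, r2_next) > 1.0

# Illustrative values (assumption), not taken from the demo.
print(one_minus_rsquare_ratio(0.80, 0.20), should_move(0.80, 0.20))
print(one_minus_rsquare_ratio(0.30, 0.65), should_move(0.30, 0.65))
```

A variable that is well explained by its own cluster (0.80) and poorly by the next one (0.20) gets a ratio of about 0.25. A variable with 0.30 against 0.65 gets a ratio of about 2, which flags it as a candidate to move.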
VARIABLE IMPORTANCE
INTRODUCTION
• Provides a general way to assess the importance of variables for predictive
models in JMP.
• Insight is in terms of practical significance of input variables.
• Based on functional decomposition ideas of I. M. Sobol.
VARIABLE IMPORTANCE
FUNCTIONAL DECOMPOSITION
• I. M. Sobol showed that we can decompose a function $f(X_1, \ldots, X_p)$ into a sum of functions of lower-dimensional subsets of the inputs:
• $f(X_1, \ldots, X_p) = f_0 + f_1(X_1) + \cdots + f_p(X_p) + f_{12}(X_1, X_2) + \cdots$
• The decomposition has a function for each $X_i$, each pair $(X_i, X_j)$, etc.
• The variability of these lower-dimensional functions assesses the importance of the input variables.
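A small worked case may help fix ideas. Assume, purely for illustration, two independent inputs that are uniform on $[0, 1]$ and the function $f(X_1, X_2) = X_1 X_2$; neither the function nor the input distribution comes from the slides.

```latex
\begin{align*}
f_0 &= \mathbb{E}[f] = \tfrac{1}{4}, \\
f_1(X_1) &= \mathbb{E}[f \mid X_1] - f_0 = \tfrac{1}{2}\left(X_1 - \tfrac{1}{2}\right), \\
f_2(X_2) &= \mathbb{E}[f \mid X_2] - f_0 = \tfrac{1}{2}\left(X_2 - \tfrac{1}{2}\right), \\
f_{12}(X_1, X_2) &= f - f_0 - f_1 - f_2
  = \left(X_1 - \tfrac{1}{2}\right)\left(X_2 - \tfrac{1}{2}\right), \\[4pt]
\operatorname{Var}(f_1) &= \operatorname{Var}(f_2) = \tfrac{1}{48}, \qquad
\operatorname{Var}(f_{12}) = \tfrac{1}{144}, \\
\operatorname{Var}(f) &= \tfrac{1}{48} + \tfrac{1}{48} + \tfrac{1}{144} = \tfrac{7}{144}.
\end{align*}
```

The component variances add up to the variance of $f$, which is what lets each component's share be read as a measure of importance.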
VARIABLE IMPORTANCE
IMPORTANCE EFFECTS
• Assessment of variable importance is in terms of effect indices.
• These indices are numbers between 0 and 1 indicating relative importance.
• Main effect indices measure variability of predictions due to a single input.
  • They do not account for interaction effects.
• Total effect indices measure the total variability of predictions due to the input.
  • They combine all main and higher-order interaction effects.
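For reference, the usual Sobol definitions of the two indices can be written as follows. This assumes independent inputs and uses $f_u$ for the decomposition term belonging to a subset $u$ of inputs; the slides do not give JMP's exact formulas, so treat this as the textbook form rather than a description of the software.

```latex
\begin{align*}
S_i &= \frac{\operatorname{Var}\!\left[f_i(X_i)\right]}{\operatorname{Var}\!\left[f(X_1, \ldots, X_p)\right]}
  && \text{(main effect index)} \\[4pt]
T_i &= \frac{\sum_{u \ni i} \operatorname{Var}\!\left[f_u(X_u)\right]}{\operatorname{Var}\!\left[f(X_1, \ldots, X_p)\right]}
  && \text{(total effect index)}
\end{align*}
% Illustrative additive case (assumption): f = X_1 + 2 X_2 with independent
% inputs uniform on [0, 1]. There are no interaction terms, so
%   Var(f) = 1/12 + 4/12 = 5/12,
%   S_1 = T_1 = 1/5,   S_2 = T_2 = 4/5.
```

When a model has no interactions, the main and total indices coincide. A gap between $T_i$ and $S_i$ signals that the input acts partly through interactions.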
VARIABLE IMPORTANCE
DISTRIBUTION OF INPUT VARIABLES
• Variability in predictions is due to the distribution of input variables.
• JMP 11 provides three input variable distribution options:
1. Independent Uniform
2. Independent Resampled
3. Dependent Resampled
• Monte Carlo estimation procedure used for independent cases.
• K-nearest neighbors estimation used for dependent case.
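For the Independent Uniform case, a Monte Carlo estimate of the main and total effect indices can be sketched as below. The slides do not describe JMP's estimator. This sketch uses the widely published Saltelli (main effect) and Jansen (total effect) estimators, assumes inputs uniform on [0, 1], and uses a hypothetical test function $f(X_1, X_2) = X_1 X_2$ whose exact indices are 3/7 and 4/7.

```python
import numpy as np

def sobol_indices(f, p, n=200_000, seed=0):
    """Monte Carlo main and total effect indices for independent
    uniform inputs on [0, 1] (Saltelli / Jansen estimators)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n, p))  # first independent sample
    b = rng.uniform(size=(n, p))  # second independent sample
    fa, fb = f(a), f(b)
    variance = np.var(np.concatenate([fa, fb]))
    main = np.empty(p)
    total = np.empty(p)
    for i in range(p):
        ab = a.copy()
        ab[:, i] = b[:, i]  # matrix A with column i taken from B
        fab = f(ab)
        main[i] = np.mean(fb * (fab - fa)) / variance
        total[i] = 0.5 * np.mean((fa - fab) ** 2) / variance
    return main, total

def product(x):
    """Hypothetical prediction function with a pure interaction structure."""
    return x[:, 0] * x[:, 1]

main, total = sobol_indices(product, p=2)
print("main effects: ", main)   # exact value for each input is 3/7
print("total effects:", total)  # exact value for each input is 4/7
```

The estimates should land close to the exact values, with the gap between total and main effects reflecting the interaction between the two inputs. The resampled options would replace the uniform draws with draws from the observed data, which this sketch does not attempt.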
VARIABLE IMPORTANCE
USE RESAMPLED INPUTS?
[Figure: two example panels, one labeled "Uniform Acceptable" and the other "Resampled Needed"]
VARIABLE IMPORTANCE
MARGINAL INFERENCE
[Figure: Main Effects, with values 0.16 and 0.03]
VARIABLE IMPORTANCE
DEMO

More Related Content

Viewers also liked

Advanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProAdvanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProJMP software from SAS
 
Vicios del lenguaje
Vicios del lenguajeVicios del lenguaje
Vicios del lenguajeblft123
 
Tips mengadakan majlis perkahwinan ros
Tips mengadakan majlis perkahwinan rosTips mengadakan majlis perkahwinan ros
Tips mengadakan majlis perkahwinan rosRose Katering
 
впн в россии
впн в россиивпн в россии
впн в россии19nature
 
Webquest on output_devices[1]
Webquest on output_devices[1]Webquest on output_devices[1]
Webquest on output_devices[1]edtechfacey
 
Perk acties a6
Perk acties a6Perk acties a6
Perk acties a6Jaap Kemp
 
Photobooooooooth
PhotoboooooooothPhotobooooooooth
Photoboooooooothnadim1020
 
Jeopardy (output devices)
Jeopardy (output devices)Jeopardy (output devices)
Jeopardy (output devices)edtechfacey
 
Washington presentation 3.1
Washington presentation 3.1Washington presentation 3.1
Washington presentation 3.1jbuyonje
 
Localization with Mozilla
Localization with MozillaLocalization with Mozilla
Localization with MozillaRaiyad Raad
 
Angloingles
AngloinglesAngloingles
Angloinglesblft123
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsJMP software from SAS
 
Washington, d.c. presentation
Washington, d.c. presentationWashington, d.c. presentation
Washington, d.c. presentationjbuyonje
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignJMP software from SAS
 
Random Quiz Maker in C Language Project Slide
Random Quiz Maker in C Language Project SlideRandom Quiz Maker in C Language Project Slide
Random Quiz Maker in C Language Project SlideRaiyad Raad
 
Lighting the-way: ESAB hybrid-laser-welding
Lighting the-way: ESAB hybrid-laser-weldingLighting the-way: ESAB hybrid-laser-welding
Lighting the-way: ESAB hybrid-laser-weldingJaap Kemp
 

Viewers also liked (16)

Advanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProAdvanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP Pro
 
Vicios del lenguaje
Vicios del lenguajeVicios del lenguaje
Vicios del lenguaje
 
Tips mengadakan majlis perkahwinan ros
Tips mengadakan majlis perkahwinan rosTips mengadakan majlis perkahwinan ros
Tips mengadakan majlis perkahwinan ros
 
впн в россии
впн в россиивпн в россии
впн в россии
 
Webquest on output_devices[1]
Webquest on output_devices[1]Webquest on output_devices[1]
Webquest on output_devices[1]
 
Perk acties a6
Perk acties a6Perk acties a6
Perk acties a6
 
Photobooooooooth
PhotoboooooooothPhotobooooooooth
Photobooooooooth
 
Jeopardy (output devices)
Jeopardy (output devices)Jeopardy (output devices)
Jeopardy (output devices)
 
Washington presentation 3.1
Washington presentation 3.1Washington presentation 3.1
Washington presentation 3.1
 
Localization with Mozilla
Localization with MozillaLocalization with Mozilla
Localization with Mozilla
 
Angloingles
AngloinglesAngloingles
Angloingles
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of Experiments
 
Washington, d.c. presentation
Washington, d.c. presentationWashington, d.c. presentation
Washington, d.c. presentation
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal Design
 
Random Quiz Maker in C Language Project Slide
Random Quiz Maker in C Language Project SlideRandom Quiz Maker in C Language Project Slide
Random Quiz Maker in C Language Project Slide
 
Lighting the-way: ESAB hybrid-laser-welding
Lighting the-way: ESAB hybrid-laser-weldingLighting the-way: ESAB hybrid-laser-welding
Lighting the-way: ESAB hybrid-laser-welding
 

Similar to Exploring Variable Clustering and Importance in JMP

The Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingThe Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingJMP software from SAS
 
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and OptimizersBenjamin Reese
 
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimizationBenjamin Reese
 
Design of experiments
Design of experimentsDesign of experiments
Design of experimentsUpendra K
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptxMoinPasha12
 
need to realize in r studio (regression).pptx
need to realize in r studio (regression).pptxneed to realize in r studio (regression).pptx
need to realize in r studio (regression).pptxSmarajitPaulChoudhur
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformJMP software from SAS
 
Computer programming - variables constants operators expressions and statements
Computer programming - variables constants operators expressions and statementsComputer programming - variables constants operators expressions and statements
Computer programming - variables constants operators expressions and statementsJohn Paul Espino
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation TestingWodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation TestingPablo Gómez Abajo
 
The Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of ExperimentsThe Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of ExperimentsJMP software from SAS
 
166 - ISBSG variables most frequently used for software effort estimation: A ...
166 - ISBSG variables most frequently used for software effort estimation: A ...166 - ISBSG variables most frequently used for software effort estimation: A ...
166 - ISBSG variables most frequently used for software effort estimation: A ...ESEM 2014
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)Ryan Herzog
 
Design of experiments formulation development exploring the best practices ...
Design of  experiments  formulation development exploring the best practices ...Design of  experiments  formulation development exploring the best practices ...
Design of experiments formulation development exploring the best practices ...Maher Al absi
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSOsama Yousaf
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)Ryan Herzog
 
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptEdu4Sure
 
Transaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxTransaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxRoshni814224
 

Similar to Exploring Variable Clustering and Importance in JMP (20)

The Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingThe Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for Resampling
 
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers
3DCS Advanced Tolerance Analysis - 5 Additional Analyzers and Optimizers
 
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization
3DCS Advanced Analyzers (AAO) for large assemblies and fast optimization
 
Design of experiments
Design of experimentsDesign of experiments
Design of experiments
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
 
Guide to Java.pptx
Guide to Java.pptxGuide to Java.pptx
Guide to Java.pptx
 
need to realize in r studio (regression).pptx
need to realize in r studio (regression).pptxneed to realize in r studio (regression).pptx
need to realize in r studio (regression).pptx
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE Platform
 
Computer programming - variables constants operators expressions and statements
Computer programming - variables constants operators expressions and statementsComputer programming - variables constants operators expressions and statements
Computer programming - variables constants operators expressions and statements
 
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation TestingWodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
Wodel-Test: A Model-Based Framework for Language-Independent Mutation Testing
 
The Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of ExperimentsThe Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of Experiments
 
166 - ISBSG variables most frequently used for software effort estimation: A ...
166 - ISBSG variables most frequently used for software effort estimation: A ...166 - ISBSG variables most frequently used for software effort estimation: A ...
166 - ISBSG variables most frequently used for software effort estimation: A ...
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)
 
Design of experiments formulation development exploring the best practices ...
Design of  experiments  formulation development exploring the best practices ...Design of  experiments  formulation development exploring the best practices ...
Design of experiments formulation development exploring the best practices ...
 
1015 track2 abbott
1015 track2 abbott1015 track2 abbott
1015 track2 abbott
 
1030 track2 abbott
1030 track2 abbott1030 track2 abbott
1030 track2 abbott
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSS
 
Topic 5 (multiple regression)
Topic 5 (multiple regression)Topic 5 (multiple regression)
Topic 5 (multiple regression)
 
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.pptMarket Research using SPSS _ Edu4Sure Sept 2023.ppt
Market Research using SPSS _ Edu4Sure Sept 2023.ppt
 
Transaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptxTransaction Management, Recovery and Query Processing.pptx
Transaction Management, Recovery and Query Processing.pptx
 

More from JMP software from SAS

Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...JMP software from SAS
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsJMP software from SAS
 
Statistical and Predictive Modelling
Statistical and Predictive ModellingStatistical and Predictive Modelling
Statistical and Predictive ModellingJMP software from SAS
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCJMP software from SAS
 
Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsJMP software from SAS
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...JMP software from SAS
 
New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11JMP software from SAS
 

More from JMP software from SAS (12)

A Primer in Statistical Discovery
A Primer in Statistical DiscoveryA Primer in Statistical Discovery
A Primer in Statistical Discovery
 
Grafische Analyse Ihrer Excel Daten
Grafische Analyse  Ihrer Excel DatenGrafische Analyse  Ihrer Excel Daten
Grafische Analyse Ihrer Excel Daten
 
Building Better Models
Building Better ModelsBuilding Better Models
Building Better Models
 
JMP for Ethanol Producers
JMP for Ethanol ProducersJMP for Ethanol Producers
JMP for Ethanol Producers
 
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of Experiments
 
Statistical and Predictive Modelling
Statistical and Predictive ModellingStatistical and Predictive Modelling
Statistical and Predictive Modelling
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPC
 
Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening Designs
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
 
Introduction to Modeling
Introduction to ModelingIntroduction to Modeling
Introduction to Modeling
 
New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11
 

Recently uploaded

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Exploring Variable Clustering and Importance in JMP

  • 1. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. EXPLORING VARIABLE CLUSTERING AND IMPORTANCE IN JMP CHRIS GOTWALT AND RYAN PARKER
  • 2. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. VARIABLE CLUSTERING INTRODUCTION • Variable clustering is a method that performs dimension reduction on the number of input variables to be used in a predictive model. • Reduces inputs by finding groups of similar variables so that a single variable can represent each group. • Helps reduce effects of collinearity on the input variables. • Developed by SAS/STAT Development Director Warren Sarle.
  • 3. VARIABLE CLUSTERING: AN ITERATIVE ALGORITHM
      • Iteratively splits and assigns variables to clusters.
      • Sample iterations for variables in the Wine Quality data set:
      • [Diagram: clusters at Iterations 1, 2, and 3. Cluster labels in extraction order: "Alcohol, Citric Acid, pH, Sugar, Sulfur Dioxide"; "Alcohol, Citric Acid, Sulfur Dioxide"; "Alcohol, Sugar"; "pH, Sulfur Dioxide"; "pH, Sugar"; "Citric Acid".]
  • 4. VARIABLE CLUSTERING: ALGORITHM DETAILS
      • At each iteration, the cluster with the largest second eigenvalue is split.
      • Variables within this cluster are assigned to two new clusters based on each variable’s correlation with the first two orthoblique-rotated principal components.
      • After the split, variables from other clusters are reassigned to one of the new clusters if they have a higher correlation with that new cluster.
      • The algorithm ends when the second eigenvalue of every cluster is less than one.
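The split-selection and stopping rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not JMP's implementation: the function names and the synthetic data are hypothetical, and the orthoblique rotation and reassignment steps are left out.

```python
import numpy as np

def second_eigenvalue(X, cols):
    """Second-largest eigenvalue of the correlation matrix of X[:, cols]."""
    if len(cols) < 2:
        return 0.0  # a single-variable cluster has no second component
    corr = np.corrcoef(X[:, cols], rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # returned in ascending order
    return float(eigvals[-2])

def choose_cluster_to_split(X, clusters, threshold=1.0):
    """Index of the cluster with the largest second eigenvalue,
    or None when every second eigenvalue is below the threshold."""
    seconds = [second_eigenvalue(X, cols) for cols in clusters]
    best = int(np.argmax(seconds))
    return best if seconds[best] >= threshold else None

# Synthetic data (assumption): columns 0-1 share one latent factor,
# columns 2-3 share another, independent factor.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 500))
X = np.column_stack([a, a, b, b]) + 0.3 * rng.normal(size=(500, 4))

print(choose_cluster_to_split(X, [[0, 1, 2, 3]]))    # one big cluster: gets split
print(choose_cluster_to_split(X, [[0, 1], [2, 3]]))  # nothing left to split
```

With all four columns in one cluster, the second eigenvalue is well above one, so that cluster is selected. Once the columns are separated by latent factor, every second eigenvalue falls below one and the procedure stops.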
  • 5. VARIABLE CLUSTERING: REDUCING EACH CLUSTER TO A SINGLE VARIABLE
      • [Diagram labels: pH, Sugar, pH, Citric Acid]
      • Each cluster can be reduced to a single variable for modeling.
      • There are two ways to do this:
          1. Use the most representative variable from each cluster.
          2. Alternatively, use the cluster component from each cluster.
  • 6. VARIABLE CLUSTERING: MOST REPRESENTATIVE VARIABLES
      • These are the variables that best represent each cluster.
      • Each has the highest correlation with the variables in its cluster.
      • Most representative variables provide a clear interpretation when used.
  • 7. VARIABLE CLUSTERING: CLUSTER COMPONENTS
      • New variables created from the first principal component of each cluster.
      • Provide a way to combine the variables in each cluster into a single variable.
      • Similar to traditional principal components analysis (PCA), except that each cluster component uses only variables from that cluster.
      • Interpretation is not as clear as with most representative variables.
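As a rough illustration of a cluster component, the sketch below computes the first principal component of the standardized variables in a single cluster. The function name and the synthetic data are assumptions made for this example. JMP's own scaling and sign conventions may differ.

```python
import numpy as np

def cluster_component(X, cols):
    """First principal component score of the standardized columns in one cluster."""
    Z = X[:, cols]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)  # standardize each variable
    corr = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    w = eigvecs[:, -1]                                # loadings of the largest eigenvalue
    if w.sum() < 0:                                   # fix the arbitrary sign of the eigenvector
        w = -w
    return Z @ w

# Synthetic data (assumption): two noisy copies of one latent variable.
rng = np.random.default_rng(1)
latent = rng.normal(size=200)
X = np.column_stack([latent, latent]) + 0.3 * rng.normal(size=(200, 2))

component = cluster_component(X, [0, 1])
print(np.corrcoef(component, X[:, 0])[0, 1])  # close to 1 for a tight cluster
```

Only the columns listed in `cols` contribute to the component, which is the difference from ordinary PCA noted on the slide.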
  • 8. VARIABLE CLUSTERING: DEMO: IMPORTANT TERMS
      • RSquare with Own Cluster: the RSquare a variable has with the variables in its own cluster.
      • RSquare with Next Closest: the RSquare a variable has with the variables in the next most similar cluster.
      • 1-RSquare Ratio: relative similarity between a variable’s own cluster and the next closest cluster.
          • Values should be less than 1.
          • Values greater than 1 indicate that the variable should be moved to the next closest cluster.
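The three terms can be made concrete with a small Python sketch. It assumes the definition used by SAS PROC VARCLUS, where the ratio is (1 - RSquare with Own Cluster) / (1 - RSquare with Next Closest) and each RSquare is the squared correlation between the variable and a cluster component. The slides do not give the formula, so treat this as an assumption. Function names and data are hypothetical.

```python
import numpy as np

def first_component(X, cols):
    """First principal component score of the standardized columns in one cluster."""
    Z = X[:, cols]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    corr = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    _, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return Z @ eigvecs[:, -1]

def one_minus_rsquare_ratio(X, clusters, var):
    """(1 - R^2 with own cluster) / (1 - R^2 with next closest cluster).
    Requires at least two clusters."""
    own = next(i for i, cols in enumerate(clusters) if var in cols)
    r2 = [np.corrcoef(X[:, var], first_component(X, cols))[0, 1] ** 2
          for cols in clusters]
    r2_own = r2[own]
    r2_next = max(r for i, r in enumerate(r2) if i != own)
    return (1 - r2_own) / (1 - r2_next)

# Synthetic data (assumption): columns 0-1 share one latent factor,
# columns 2-3 share another, independent factor.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 500))
X = np.column_stack([a, a, b, b]) + 0.3 * rng.normal(size=(500, 4))

good = one_minus_rsquare_ratio(X, [[0, 1], [2, 3]], var=0)  # well-placed variable
bad = one_minus_rsquare_ratio(X, [[0, 1, 2], [3]], var=2)   # misplaced variable
print(good, bad)
```

A well-placed variable gives a ratio well below 1. Placing column 2 with the wrong group pushes its ratio above 1, which is the signal that it belongs in the next closest cluster.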
  • 9. VARIABLE IMPORTANCE: INTRODUCTION
      • Provides a general way to assess the importance of variables for predictive models in JMP.
      • Insight is in terms of the practical significance of input variables.
      • Based on the functional decomposition ideas of I. M. Sobol.
  • 10. VARIABLE IMPORTANCE: FUNCTIONAL DECOMPOSITION
      • I. M. Sobol showed that a function $f(X_1, \ldots, X_p)$ can be decomposed into a sum of functions of lower-dimensional sets of inputs:
      • $f(X_1, \ldots, X_p) = f_0 + f_1(X_1) + \cdots + f_p(X_p) + f_{12}(X_1, X_2) + \cdots$
      • The decomposition has a function for each $X_i$, each pair $(X_i, X_j)$, etc.
      • The variability of these lower-dimensional functions assesses the importance of the input variables.
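The link between the decomposition and "variability" can be written out explicitly. The standard result below holds when the inputs are independent. The symbols $V_i$, $V_{ij}$, $S_i$, and $T_i$ are notation introduced here for illustration and do not appear on the slides.

```latex
% Variance decomposition for independent inputs:
\operatorname{Var}(f) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots,
\qquad V_i = \operatorname{Var}\bigl(f_i(X_i)\bigr),
\quad V_{ij} = \operatorname{Var}\bigl(f_{ij}(X_i, X_j)\bigr).

% Main effect index of input i, and total effect index of input i
% (the sum of every variance term whose subscript set contains i):
S_i = \frac{V_i}{\operatorname{Var}(f)},
\qquad
T_i = \frac{V_i + \sum_{j \neq i} V_{ij} + \cdots}{\operatorname{Var}(f)}.
```

For example, with $f = X_1 + X_2 X_3$ and independent inputs uniform on $[-1, 1]$, $V_1 = 1/3$, $V_{23} = 1/9$, and all other terms are zero, so $S_1 = T_1 = 0.75$, $S_2 = S_3 = 0$, and $T_2 = T_3 = 0.25$.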
  • 11. VARIABLE IMPORTANCE: IMPORTANCE EFFECTS
      • Assessment of variable importance is in terms of effect indices.
      • These indices are numbers between 0 and 1 indicating relative importance.
      • Main effect indices measure the variability of predictions due to a single input.
          • They do not account for interaction effects.
      • Total effect indices measure the total variability of predictions due to the input.
          • They combine all main and higher-order interaction effects.
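To show how main and total effect indices differ, here is a minimal Monte Carlo sketch for independent uniform inputs. It uses the widely published Saltelli (main effect) and Jansen (total effect) estimators, which are not necessarily the procedure JMP uses. The function names, the toy model, and the input range are assumptions chosen for illustration.

```python
import numpy as np

def sobol_indices(f, p, n=100_000, low=-1.0, high=1.0, seed=0):
    """Monte Carlo estimates of main and total effect indices
    for p independent inputs that are uniform on [low, high]."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(low, high, size=(n, p))
    B = rng.uniform(low, high, size=(n, p))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))
    main = np.empty(p)
    total = np.empty(p)
    for i in range(p):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with only column i taken from B
        fABi = f(ABi)
        main[i] = np.mean(fB * (fABi - fA)) / var_y         # Saltelli estimator
        total[i] = 0.5 * np.mean((fA - fABi) ** 2) / var_y  # Jansen estimator
    return main, total

def model(X):
    """Toy predictor (assumption): one additive input plus a pure interaction."""
    return X[:, 0] + X[:, 1] * X[:, 2]

main, total = sobol_indices(model, p=3)
print(np.round(main, 2))   # roughly [0.75, 0.00, 0.00]
print(np.round(total, 2))  # roughly [0.75, 0.25, 0.25]
```

The second and third inputs act only through their interaction, so their main effect indices are near zero while their total effect indices are about 0.25. Because these are sampling estimates, individual values can land slightly outside the 0 to 1 range.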
  • 12. VARIABLE IMPORTANCE: DISTRIBUTION OF INPUT VARIABLES
      • Variability in predictions is due to the distribution of the input variables.
      • JMP 11 provides three input variable distribution options:
          1. Independent Uniform
          2. Independent Resampled
          3. Dependent Resampled
      • A Monte Carlo estimation procedure is used for the independent cases.
      • K-nearest neighbors estimation is used for the dependent case.
  • 13. VARIABLE IMPORTANCE: USE RESAMPLED INPUTS? [Figure panels: "Uniform Acceptable" and "Resampled Needed"]
  • 14. VARIABLE IMPORTANCE: MARGINAL INFERENCE [Figure: "Main Effects", with values 0.16 and 0.03]
  • 15. VARIABLE IMPORTANCE: DEMO