Multivariate Data Analysis
SETIA PRAMANA
Course Outline 
Introduction 
◦ Overview of Multivariate data analysis 
◦ The applications 
Matrix Algebra And Random Vectors 
Sample Geometry 
Multivariate Normal Distribution 
Inference About A Mean Vector 
Comparison of Several Mean Vectors
Setia Pramana SURVIVAL DATA ANALYSIS 2
Principal Component Analysis 
Factor Analysis 
Cluster Analysis 
Discriminant Analysis 
Canonical Correlations 
Course Workload 
40% Theory, 60% practice 
Group Project (4 students) 
Group presentation in English every week
Software used is mainly R; other software is allowed
R code will be provided
Slides can be seen at: http://www.slideshare.net/hafidztio/
Reference Books 
Intermezzo 
http://www.youtube.com/watch?v=zRsMEl6PHhM&list=PLFE776F2C513A744E 
http://tylervigen.com/
Data Types
Type of Analysis
What is Multivariate?
• Univariate analysis?
• Some describe it as any statistical technique used to analyze data that arise from more than one variable
• Multivariable vs. multivariate analysis
• http://www.youtube.com/watch?v=KhA_PCMPZZo
Example of MV Data
Other Examples?
What is Multivariate Data Analysis?
• The statistical analysis of data collected on more than one (response) variable
• We want to analyze the variables simultaneously
• The variables may be correlated with each other
• The dependence among them is taken into account
• More complex than univariate analysis
• In the real world, most data are multivariate
• A basic statistical analysis for data mining
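To make the point about correlated variables concrete, here is a minimal sketch in Python (standard library only) of the sample correlation between two variables measured on the same subjects. The function name and the height/weight values are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements on five subjects (illustrative values only).
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [52.0, 58.0, 69.0, 77.0, 88.0]

r = pearson_correlation(height_cm, weight_kg)
print(f"correlation(height, weight) = {r:.3f}")
```

A value of r close to 1 or -1 suggests that analyzing the two variables separately would ignore much of their joint structure.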
Types of MVA
• Exploratory data analysis (EDA): Sometimes called data mining, this area is useful for gaining deeper insights into large, complex data sets.
• Regression analysis: Develops models to predict new and future events; useful for predictive analytics applications.
• Classification, for identifying new or existing classes: Useful in research, development, market analysis, etc.
MVD objectives 
1. Data reduction or structural simplification. Simplify the data without
losing valuable information, making interpretation easier.
2. Sorting and grouping. Similar objects or variables are grouped, 
based upon the characteristics. Define rules for classifying objects 
into well-defined groups. 
3. Investigation of the dependence among variables. The nature of
the relationships among variables is of interest. Are all the
variables mutually independent, or do some depend on others?
4. Prediction. Relationships between variables must be 
determined for the purpose of predicting the values of one or 
more variables on the basis of observations on the other 
variables. 
5. Hypothesis construction and testing. Specific statistical
hypotheses are formulated and tested.
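As an illustration of the prediction objective, the sketch below fits a least-squares line that predicts one variable from another. It is written in Python with the standard library only; the function names and the score/GPA values are hypothetical and not taken from the slides. Real multivariate prediction would typically use several predictors, which this sketch does not attempt.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predicted y for a new x under the fitted line."""
    return intercept + slope * new_x


# Hypothetical data: a high school score and a later college GPA.
high_school_score = [60.0, 70.0, 80.0, 90.0]
college_gpa = [2.4, 2.9, 3.1, 3.6]

intercept, slope = fit_line(high_school_score, college_gpa)
print(f"predicted GPA at score 75: {predict(intercept, slope, 75.0):.2f}")
```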
Examples of Multivariate Data 
http://www.youtube.com/watch?v=eEpxN0htRKI
Software 
1. SAS 
2. R 
3. SPSS 
4. Herodes 
5. etc.
Applications
• Petrochemical and refining operations, including early fault detection and gasoline blending and optimisation
• Food and beverage applications, particularly for consumer segmentation and new product development
• Agricultural analysis, including real-time analysis of protein and moisture in wheat, barley and other crops
• Business intelligence and marketing, for predicting changes in dynamic markets or better product placement
• Oil and gas and mining, including analysis of machinery performance and locating new sources of commodities
Data reduction or simplification
• Using data on several variables related to cancer patient responses to
radiotherapy, a simple measure of patient response to radiotherapy was
constructed.
• Multispectral image data collected by a high-altitude scanner were reduced to a
form that could be viewed as images (pictures) of a shoreline in two dimensions.
• Data on several variables relating to yield and protein content were used to
create an index to select parents of subsequent generations of improved bean
plants.
Sorting and grouping 
• Data on several variables related to computer use were employed to create 
clusters of categories of computer jobs that allow a better determination of 
existing (or planned) computer utilization. 
• Measurements of several physiological variables were used to develop a 
screening procedure that discriminates alcoholics from nonalcoholics. 
• Data related to responses to visual stimuli were used to develop a rule for 
separating people suffering from a multiple-sclerosis-caused visual pathology 
from those not suffering from the disease.
Investigation of the dependence among variables 
• Data on several variables were used to identify factors that were responsible 
for client success in hiring external consultants. 
• Measurements of variables related to innovation, on the one hand, and variables
related to the business environment and business organization, on the other hand,
were used to discover why some firms are product innovators and some firms are not.
• Measurements of pulp fiber characteristics and subsequent measurements of 
characteristics of the paper made from them are used to examine the relations 
between pulp fiber properties and the resulting paper properties. The goal is to 
determine those fibers that lead to higher quality paper.
Prediction 
• The associations between test scores and several high school performance variables,
on the one hand, and several college performance variables, on the other, were used
to develop predictors of success in college.
• Data on several variables related to the size distribution of sediments were used to 
develop rules for predicting different depositional environments. 
• Measurements on several accounting and financial variables were used to develop a 
method for identifying potentially insolvent property-liability insurers. 
• cDNA microarray experiments (gene expression data) are increasingly used to study 
the molecular variations among cancer tumors. A reliable classification of tumors is 
essential for successful diagnosis and treatment of cancer.
Hypothesis testing
• Several pollution-related variables were measured to determine whether 
levels for a large metropolitan area were roughly constant throughout the week, 
or whether there was a noticeable difference between weekdays and weekends. 
• Experimental data on several variables were used to see whether the nature of 
the instructions makes any difference in perceived risks, as quantified by test 
scores. 
• Data on many variables were used to investigate the differences in structure of 
American occupations to determine the support for one of two competing 
sociological theories.
Other Applications? 
In groups, discuss multivariate data in:
1. Biomedical 
2. Economic 
3. Government Policy 
4. Health 
5. Social 
6. Demography 
7. Business 
8. Telecommunication 
9. Education 
10. Psychology
Data Structure
Descriptive Statistics
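The content of the descriptive statistics slides is not preserved in this transcript. As a hedged illustration, the Python sketch below (standard library only) computes the sample mean vector and sample covariance matrix of an n x p data matrix, with one row per unit and one column per variable. The function names, the data values, and the choice of divisor n - 1 are assumptions; some textbooks divide by n instead.

```python
def mean_vector(data):
    """Column means of an n x p data matrix given as a list of rows."""
    n = len(data)
    p = len(data[0])
    return [sum(row[j] for row in data) / n for j in range(p)]


def covariance_matrix(data):
    """Sample covariance matrix (divisor n - 1, an assumed convention)."""
    n = len(data)
    p = len(data[0])
    if n < 2:
        raise ValueError("need at least two rows")
    means = mean_vector(data)
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


# Hypothetical data: 4 units (rows) measured on 2 variables (columns).
data = [
    [42.0, 4.0],
    [52.0, 5.0],
    [48.0, 4.0],
    [58.0, 3.0],
]

x_bar = mean_vector(data)
S = covariance_matrix(data)
print("mean vector:", x_bar)
print("covariance matrix:", S)
```

The diagonal of S holds the variances of the individual variables, and the off-diagonal entries hold the covariances between pairs of variables.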
Visualization: Two-Dim Scatter Plots
Visualization: Growth Curves
Visualization: Stars
Visualization: Chernoff Faces
Visualizations
Other Visualizations
Distance
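The distance slides survive only as titles, so the following is a sketch under assumptions rather than a reconstruction of their content. It shows, in Python with the standard library only, the ordinary Euclidean distance between two observations and a variance-scaled variant that down-weights variables with large variability. The function names and the example values are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def standardized_distance(p, q, variances):
    """Euclidean distance after dividing each squared difference by the
    variance of that variable (ignores correlations between variables)."""
    if not (len(p) == len(q) == len(variances)):
        raise ValueError("points and variances must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(p, q, variances)))


# Hypothetical points and per-variable variances (illustrative values only).
origin = [0.0, 0.0]
point = [3.0, 4.0]
variances = [9.0, 16.0]

print("Euclidean:", euclidean_distance(origin, point))
print("Standardized:", standardized_distance(origin, point, variances))
```

Scaling by the variances matters when variables are measured in different units, because otherwise the variable with the largest spread dominates the distance.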
Next Week: Matrix Algebra
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 

Multivariate data analysis

  • 2. Course Outline
      ◦ Introduction: overview of multivariate data analysis; the applications
      ◦ Matrix Algebra and Random Vectors
      ◦ Sample Geometry
      ◦ Multivariate Normal Distribution
      ◦ Inference About a Mean Vector
      ◦ Comparison of Several Mean Vectors
  • 3. Course Outline
      ◦ Principal Component Analysis
      ◦ Factor Analysis
      ◦ Cluster Analysis
      ◦ Discriminant Analysis
      ◦ Canonical Correlations
  • 4. Course Workload
      ◦ 40% theory, 60% practice
      ◦ Group project (4 students)
      ◦ Group presentation in ENGLISH every week
      ◦ The software used is mainly R; others are allowed
      ◦ R code will be provided
      ◦ Slides are available at: http://www.slideshare.net/hafidztio/
  • 5. Reference Books
  • 10. What is Multivariate?
      ◦ Univariate Analysis?
      ◦ Some describe it as: any statistical technique used to analyze data that arise from more than one variable
      ◦ Multivariable vs. Multivariate Analysis
      ◦ http://www.youtube.com/watch?v=KhA_PCMPZZo
  • 13. What is Multivariate Data Analysis?
      ◦ The statistical analysis of data collected on more than one (response) variable.
      ◦ We want to analyze the variables simultaneously
      ◦ The variables may be correlated with each other
      ◦ The dependence is taken into account
      ◦ More complex than univariate analysis
      ◦ In the real world, most data are multivariate
      ◦ Basic statistical analysis for data mining
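To make "the dependence is taken into account" concrete, here is a minimal sketch that summarizes a small multivariate data set by its sample mean vector, covariance matrix and correlation matrix. It is written in Python with the standard library only, although the course itself works mainly in R. The function names and the five observations (height, weight, age) are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def mean_vector(rows):
    """Sample mean of each variable (column) in a list of observations."""
    n = len(rows)
    p = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]


def covariance_matrix(rows):
    """Sample covariance matrix with the usual n - 1 divisor."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [
        [sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]


def correlation_matrix(rows):
    """Sample correlation matrix derived from the covariance matrix."""
    s = covariance_matrix(rows)
    p = len(s)
    return [
        [s[j][k] / sqrt(s[j][j] * s[k][k]) for k in range(p)]
        for j in range(p)
    ]


# Hypothetical observations: [height in cm, weight in kg, age in years]
data = [
    [160.0, 55.0, 23.0],
    [170.0, 65.0, 31.0],
    [180.0, 80.0, 28.0],
    [165.0, 58.0, 45.0],
    [175.0, 72.0, 37.0],
]

print("mean vector:", mean_vector(data))
for row in correlation_matrix(data):
    print([round(v, 3) for v in row])
```

The off-diagonal entries of the correlation matrix are exactly what a set of separate univariate analyses would ignore: in this made-up data set, height and weight move together closely, so analyzing them one at a time would discard that relationship.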
  • 14. Types of MVA
      ◦ Exploratory Data Analysis (EDA): sometimes called data mining, this area is useful for gaining deeper insights into large, complex data sets.
      ◦ Regression analysis: develops models to predict new and future events; useful for predictive analytics applications.
      ◦ Classification, for identifying new or existing classes: useful in research, development, market analysis, etc.
  • 15. MVD objectives
      1. Data reduction or structural simplification. Simplify the data without losing valuable information, so that interpretation becomes easier.
      2. Sorting and grouping. Similar objects or variables are grouped based on their characteristics. Define rules for classifying objects into well-defined groups.
      3. Investigation of the dependence among variables. The nature of the relationships among variables is of interest. Are all the variables mutually independent, or do some depend on others?
  • 16. MVD objectives
      4. Prediction. Relationships between variables must be determined for the purpose of predicting the values of one or more variables on the basis of observations on the other variables.
      5. Hypothesis construction and testing. Specific statistical hypotheses are formulated and tested.
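The prediction objective can be sketched in its simplest form: one response predicted from one other variable by ordinary least squares. This is only an illustration in standard-library Python, not the course's R code; the names `fit_line` and `predict` and the five data points are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response for a new value of the predictor."""
    return intercept + slope * x_new


# Hypothetical paired observations on two variables
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(x_obs, y_obs)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("prediction at x = 6:", round(predict(intercept, slope, 6.0), 3))
```

With several predictors or several responses, the same idea is expressed with matrices, which is one reason matrix algebra comes early in the course outline.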
  • 17. Examples of Multivariate Data http://www.youtube.com/watch?v=eEpxN0htRKI
  • 18. Software
      1. SAS
      2. R
      3. SPSS
      4. Herodes
      5. etc.
  • 19. Applications
      ◦ Petrochemical and refining operations, including early fault detection and gasoline blending and optimisation
      ◦ Food and beverage applications, particularly for consumer segmentation and new product development
      ◦ Agricultural analysis, including real-time analysis of protein and moisture in wheat, barley and other crops
      ◦ Business intelligence and marketing, for predicting changes in dynamic markets or better product placement
      ◦ Oil and gas and mining, including analysis of machinery performance and locating new sources of commodities
  • 20. Applications: Data reduction or simplification
      ◦ Using data on several variables related to cancer patient responses to radiotherapy, a simple measure of patient response to radiotherapy was constructed.
      ◦ Multispectral image data collected by a high-altitude scanner were reduced to a form that could be viewed as images (pictures) of a shoreline in two dimensions.
      ◦ Data on several variables relating to yield and protein content were used to create an index to select parents of subsequent generations of improved bean plants.
  • 21. Applications: Sorting and grouping
      ◦ Data on several variables related to computer use were employed to create clusters of categories of computer jobs that allow a better determination of existing (or planned) computer utilization.
      ◦ Measurements of several physiological variables were used to develop a screening procedure that discriminates alcoholics from nonalcoholics.
      ◦ Data related to responses to visual stimuli were used to develop a rule for separating people suffering from a multiple-sclerosis-caused visual pathology from those not suffering from the disease.
  • 22. Applications: Investigation of the dependence among variables
      ◦ Data on several variables were used to identify factors that were responsible for client success in hiring external consultants.
      ◦ Measurements of variables related to innovation, on the one hand, and variables related to the business environment and business organization, on the other hand, were used to discover why some firms are product innovators and some firms are not.
      ◦ Measurements of pulp fiber characteristics and subsequent measurements of characteristics of the paper made from them are used to examine the relations between pulp fiber properties and the resulting paper properties. The goal is to determine those fibers that lead to higher quality paper.
  • 23. Applications: Prediction
      ◦ The associations between test scores and several high school performance variables and several college performance variables were used to develop predictors of success in college.
      ◦ Data on several variables related to the size distribution of sediments were used to develop rules for predicting different depositional environments.
      ◦ Measurements on several accounting and financial variables were used to develop a method for identifying potentially insolvent property-liability insurers.
      ◦ cDNA microarray experiments (gene expression data) are increasingly used to study the molecular variations among cancer tumors. A reliable classification of tumors is essential for successful diagnosis and treatment of cancer.
  • 24. Applications: Hypothesis testing
      ◦ Several pollution-related variables were measured to determine whether levels for a large metropolitan area were roughly constant throughout the week, or whether there was a noticeable difference between weekdays and weekends.
      ◦ Experimental data on several variables were used to see whether the nature of the instructions makes any difference in perceived risks, as quantified by test scores.
      ◦ Data on many variables were used to investigate the differences in structure of American occupations to determine the support for one of two competing sociological theories.
  • 25. Other Applications? In groups, discuss multivariate data in:
      1. Biomedical
      2. Economic
      3. Government Policy
      4. Health
      5. Social
      6. Demography
      7. Business
      8. Telecommunication
      9. Education
      10. Psychology
  • 45. Next Week: Matrix Algebra