This document outlines a course on multivariate data analysis. It introduces key topics that will be covered, including matrix algebra, the multivariate normal distribution, principal component analysis, factor analysis, cluster analysis, discriminant analysis, and canonical correlations. The course workload consists of 40% theory and 60% practice, including a group project and weekly presentations. R will be the main software used. Examples of multivariate data and applications in various fields like business, health, and education are also provided.
UNIVARIATE & BIVARIATE ANALYSIS
UNIVARIATE BIVARIATE & MULTIVARIATE
UNIVARIATE ANALYSIS
-One variable analysed at a time
BIVARIATE ANALYSIS
-Two variables analysed at a time
MULTIVARIATE ANALYSIS
-More than two variables analysed at a time
TYPES OF ANALYSIS
DESCRIPTIVE ANALYSIS
INFERENTIAL ANALYSIS
DESCRIPTIVE ANALYSIS
Transformation of raw data
Facilitate easy understanding and interpretation
Deals with summary measures relating to sample data
E.g., what is the average age of the sample?
INFERENTIAL ANALYSIS
Carried out after descriptive analysis
Inferences drawn on population parameters based on sample results
Generalizes results to the population based on sample results
E.g., is the average age of the population different from 35?
DESCRIPTIVE ANALYSIS OF UNIVARIATE DATA
1. Prepare frequency distribution of each variable
Missing Data
Situation where certain questions are left unanswered
Analysis of multiple responses
Measures of central tendency
3 measures of central tendency
1.Mean
2.Median
3.Mode
MEAN
Arithmetic average of a variable
Appropriate for interval and ratio scale data
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
MEDIAN
Calculates the middle value of the data
Computed for ratio, interval or ordinal scale.
Data needs to be arranged in ascending or descending order
MODE
Point of maximum frequency
Should not be computed for ordinal or interval data unless grouped.
Widely used in business
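The three measures of central tendency can be illustrated with a minimal Python sketch. The `ages` list is a hypothetical sample invented for illustration, not data from the slides, and the standard-library `statistics` module is just one convenient way to compute these values.

```python
import statistics

# Hypothetical sample of respondent ages (illustrative values only).
ages = [23, 35, 35, 41, 29, 35, 52, 29]

mean_age = statistics.mean(ages)      # arithmetic average of the variable
median_age = statistics.median(ages)  # middle value once the data are sorted
mode_age = statistics.mode(ages)      # value with the highest frequency

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
```

The median call sorts the data internally, which corresponds to the ordering step noted above.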
MEASURE OF DISPERSION
Measures of central tendency do not explain distribution of variables
4 measures of dispersion
1.Range
2.Variance and standard deviation
3.Coefficient of variation
4.Relative and absolute frequencies
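As a rough sketch, the four measures of dispersion listed above could be computed as follows. The `ages` sample is hypothetical, and the variance uses the sample (n - 1) divisor, which is an assumption since the slides do not say which divisor is intended.

```python
import statistics
from collections import Counter

# Hypothetical sample of respondent ages (illustrative values only).
ages = [23, 35, 35, 41, 29, 35, 52, 29]

# 1. Range: difference between the largest and smallest value.
value_range = max(ages) - min(ages)

# 2. Variance and standard deviation (sample versions, divisor n - 1).
variance = statistics.variance(ages)
std_dev = statistics.stdev(ages)

# 3. Coefficient of variation: standard deviation relative to the mean.
coeff_var = std_dev / statistics.mean(ages)

# 4. Absolute and relative frequencies of each distinct value.
abs_freq = Counter(ages)
rel_freq = {value: count / len(ages) for value, count in abs_freq.items()}

print(value_range, variance, std_dev, coeff_var)
print(dict(abs_freq))
print(rel_freq)
```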
DESCRIPTIVE ANALYSIS OF BIVARIATE DATA
Three types of measures are used.
1.Cross tabulation
2.Spearman's rank correlation coefficient
3.Pearson's linear correlation coefficient
Cross Tabulation
Responses of two questions are combined
Spearman’s rank order correlation coefficient.
Used in case of ordinal data
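The bivariate measures above can be sketched in plain Python. All data and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical and chosen for illustration. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values sharing their average rank.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Cross tabulation: combine the responses to two questions (hypothetical data).
gender = ["F", "M", "F", "M", "F"]
buys = ["yes", "no", "yes", "yes", "no"]
crosstab = Counter(zip(gender, buys))

# Ordinal ratings on a 1-5 scale (hypothetical data).
satisfaction = [1, 2, 3, 4, 5]
loyalty = [2, 3, 3, 5, 4]

print(dict(crosstab))
print(spearman(satisfaction, loyalty))
```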
2. Course Outline
Introduction
◦ Overview of Multivariate data analysis
◦ The applications
Matrix Algebra And Random Vectors
Sample Geometry
Multivariate Normal Distribution
Inference About A Mean Vector
Comparison of Several Mean Vectors
Setia Pramana SURVIVAL DATA ANALYSIS 2
3. Course Outline
Principal Component Analysis
Factor Analysis
Cluster Analysis
Discriminant Analysis
Canonical Correlations
4. Course Workload
40% Theory, 60% practice
Group Project (4 students)
Group Presentation in ENGLISH every week
Software used is mainly R, others are allowed
R code will be provided
Slides can be seen at : http://www.slideshare.net/hafidztio/
10. What is Multivariate?
Univariate Analysis?
Some describe it as any statistical technique used to analyze data
that arise from more than one variable
Multivariable vs. Multivariate Analysis
http://www.youtube.com/watch?v=KhA_PCMPZZo
13. What is Multivariate Data Analysis?
The statistical analysis of the data collected on more than one
(response) variable.
We want to analyze them simultaneously
The variables may be correlated with each other
The dependence is taken into account
More complex than univariate analysis
In the real world, most data are multivariate data
Basic Statistical Analysis for Data Mining
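To make the idea of analyzing correlated variables simultaneously concrete, here is a minimal sketch that summarizes a small data matrix by its mean vector, covariance matrix, and correlation matrix. The data (height, weight, age for four people) and the function names are hypothetical; the course itself uses R, so this Python version is only an illustration.

```python
def mean_vector(rows):
    """Column means of a data matrix given as a list of observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the columns."""
    n = len(rows)
    means = mean_vector(rows)
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def correlation_matrix(rows):
    """Correlation matrix obtained by scaling the covariance matrix."""
    s = covariance_matrix(rows)
    p = len(s)
    return [[s[i][j] / (s[i][i] * s[j][j]) ** 0.5 for j in range(p)]
            for i in range(p)]

# Hypothetical observations: height (cm), weight (kg), age (years).
data = [
    [170.0, 65.0, 30.0],
    [160.0, 55.0, 25.0],
    [180.0, 80.0, 40.0],
    [175.0, 70.0, 35.0],
]

means = mean_vector(data)
corr = correlation_matrix(data)
print(means)
for row in corr:
    print([round(r, 3) for r in row])
```

The off-diagonal entries of the correlation matrix are the dependence that a multivariate analysis takes into account and that separate univariate analyses would ignore.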
14. Types of MVA
Exploratory Data Analysis (EDA): Sometimes called data mining, this area is useful for gaining
deeper insights into large, complex data sets.
Regression analysis: Develops models to predict new and future events. It is useful for predictive
analytics applications.
Classification for identifying new or existing classes: This area is useful in research,
development, market analysis, etc.
15. MVD objectives
1. Data reduction or structural simplification. To simplify without
losing any valuable information and make interpretation easier.
2. Sorting and grouping. Similar objects or variables are grouped,
based upon the characteristics. Define rules for classifying objects
into well-defined groups.
3. Investigation of the dependence among variables. The nature of
the relationships among variables is of interest. Are all the
variables mutually dependent/ independent?
16. MVD objectives
4. Prediction. Relationships between variables must be
determined for the purpose of predicting the values of one or
more variables on the basis of observations on the other
variables.
5. Hypothesis construction and testing. Specific statistical
hypotheses are formulated and tested.
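As an illustration of the first objective (data reduction), the sketch below finds the first principal component of a small two-variable data set by power iteration on the sample covariance matrix. The data, function names, and the choice of power iteration are assumptions made for this example. Power iteration assumes a unique largest eigenvalue and a starting vector that is not orthogonal to its eigenvector.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the columns."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector and eigenvalue of the covariance matrix."""
    s = covariance_matrix(rows)
    p = len(s)
    v = [1.0] * p  # starting vector (an arbitrary choice)
    for _ in range(iterations):
        w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' S v gives the variance along the component.
    eigenvalue = sum(v[i] * sum(s[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue

# Hypothetical data: two strongly correlated measurements per object.
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.0], [8.0, 8.2]]

direction, variance_along = first_principal_component(data)
cov = covariance_matrix(data)
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = variance_along / total_variance

print(direction)
print(share)
```

When `share` is close to 1, a single derived variable retains nearly all of the variation of the original two, which is the sense in which the data are simplified with little loss of information.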
19. Applications
Petrochemical and refining operations, including early fault detection and
gasoline blending and optimisation
Food and beverage applications, particularly for consumer segmentation and
new product development
Agricultural analysis, including real-time analysis of protein and moisture in
wheat, barley and other crops
Business Intelligence and marketing for predicting changes in dynamic markets
or better product placement
Oil and gas and mining, including analysis of machinery performance and
locating new sources of commodities
20. Applications
Data reduction or simplification
Using data on several variables related to cancer patient responses to
radiotherapy, a simple measure of patient response to radiotherapy was
constructed.
Multispectral image data collected by a high-altitude scanner were reduced to a
form that could be viewed as images (pictures) of a shoreline in two dimensions.
Data on several variables relating to yield and protein content were used to
create an index to select parents of subsequent generations of improved bean
plants.
21. Applications
Sorting and grouping
• Data on several variables related to computer use were employed to create
clusters of categories of computer jobs that allow a better determination of
existing (or planned) computer utilization.
• Measurements of several physiological variables were used to develop a
screening procedure that discriminates alcoholics from nonalcoholics.
• Data related to responses to visual stimuli were used to develop a rule for
separating people suffering from a multiple-sclerosis-caused visual pathology
from those not suffering from the disease.
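Grouping similar objects can be sketched with a very small k-means routine. This is only one of many clustering methods and is not necessarily what the studies above used. The points are hypothetical, and the centres are initialised naively from the first k points to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Basic k-means: returns a cluster label per point and the cluster centres."""
    # Naive initialisation from the first k points (an assumption for this sketch).
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre (squared distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical objects measured on two variables, forming two visible groups.
points = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (5.1, 4.9), (0.8, 1.0), (4.9, 5.0)]

labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```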
22. Applications
Investigation of the dependence among variables
• Data on several variables were used to identify factors that were responsible
for client success in hiring external consultants.
• Measurements of variables related to innovation, on the one hand, and variables related to the
business environment and business organization, on the other hand, were used
to discover why some firms are product innovators and some firms are not.
• Measurements of pulp fiber characteristics and subsequent measurements of
characteristics of the paper made from them are used to examine the relations
between pulp fiber properties and the resulting paper properties. The goal is to
determine those fibers that lead to higher quality paper.
23. Applications
Prediction
• The associations between test scores, several high school performance variables,
and several college performance variables were used to develop predictors of success in
college.
• Data on several variables related to the size distribution of sediments were used to
develop rules for predicting different depositional environments.
• Measurements on several accounting and financial variables were used to develop a
method for identifying potentially insolvent property-liability insurers.
• cDNA microarray experiments (gene expression data) are increasingly used to study
the molecular variations among cancer tumors. A reliable classification of tumors is
essential for successful diagnosis and treatment of cancer.
24. Applications
Hypothesis testing
• Several pollution-related variables were measured to determine whether
levels for a large metropolitan area were roughly constant throughout the week,
or whether there was a noticeable difference between weekdays and weekends.
• Experimental data on several variables were used to see whether the nature of
the instructions makes any difference in perceived risks, as quantified by test
scores.
• Data on many variables were used to investigate the differences in structure of
American occupations to determine the support for one of two competing
sociological theories.
25. Other Applications?
In groups, discuss multivariate data in:
1. Biomedical
2. Economic
3. Government Policy
4. Health
5. Social
6. Demography
7. Business
8. Telecommunication
9. Education
10. Psychology