This document provides an overview of multivariate statistical techniques that can be used in agriculture and plant science research. It discusses multiple linear regression analysis, which models the relationship between a dependent variable and one or more explanatory variables. The document explains how to determine regression coefficients and test their significance using analysis of variance. It also describes different variable selection techniques for multiple regression like backward elimination, forward selection, and stepwise regression. The goal is to help researchers identify the best predictive model and determine which variables are most important when the number of predictors increases.
This document discusses data analysis and various techniques used in data analysis such as data editing, coding, classification, tabulation, and statistical analysis. It describes different types of statistical tests like z-test, t-test, chi-square test, and their uses. It also discusses various types of tables, diagrams, and graphical representations that are used to present statistical data in a meaningful way. Key types of diagrams mentioned include bar charts, pie charts, histograms and scatter plots. Rules for properly constructing tables and graphs are also provided.
Confirmatory factor analysis (CFA) is a statistical technique used to test whether measures of a construct are consistent with a researcher's understanding of that construct. CFA can be used to confirm or reject a measurement theory by specifying the number of factors and which measured variables relate to which latent variables, unlike exploratory factor analysis. Assumptions of CFA include multivariate normality, sufficient sample size, correct model specification, and a random sample. Statistical software like AMOS, LISREL, EQS, and SAS can be used to conduct CFA.
This document provides an overview of methods for data analysis. It discusses data, descriptive statistics such as measures of central tendency and dispersion, inferential statistics including hypothesis testing and probability, and statistical software packages with a focus on SPSS. SPSS allows users to easily input, manage, and analyze data to obtain summary statistics and perform inferential analyses like t-tests, ANOVA, and regression. Outputs can be copied into reports.
1. The document provides an overview of statistical analysis including the scientific method, common statistical terminology, hypotheses testing, choosing an appropriate statistical method, the normal distribution, and significance and confidence limits.
2. It explains key concepts like the null hypothesis, which is the opposite of the research hypothesis and is disproven through statistical testing.
3. Statistical methods depend on factors like the type of test needed, sample size, and data type, and whether tests of association, difference, or other analyses are required.
Basics of Educational Statistics (T-test) - HennaAnsari
A t-test is a statistical test used to compare the means of two groups and determine if there is a significant difference between them. It can be used for hypothesis testing to see if a treatment has an effect. There are assumptions that the data is independent, normally distributed, and has similar variances within each group. Different types of t-tests exist depending on the type of data, such as whether the groups are related or independent samples. The t-distribution table provides probabilities for assessing the significance of t-test results.
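The independent-samples case described above can be sketched in a few lines of plain Python. This is a minimal illustration of Welch's variant (which relaxes the equal-variances assumption); the sample data are made up for illustration.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples
    (does not assume equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n-1 denominator)
    se = (va / na + vb / nb) ** 0.5                  # standard error of the mean difference
    return (mean(sample_a) - mean(sample_b)) / se

# Hypothetical treatment vs. control measurements
treatment = [5.1, 5.8, 6.2, 5.9, 6.4, 5.7]
control   = [4.8, 5.0, 5.2, 4.9, 5.3, 5.1]
t = welch_t(treatment, control)
print(round(t, 3))
```

In practice the t statistic would be compared against a t-distribution table (or a library such as SciPy would report the p-value directly), as the summary above notes.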
This document discusses factors that influence the selection of data analysis strategies and provides a classification of statistical techniques. It notes that the previous research steps, known data characteristics, statistical technique properties, and researcher background all impact strategy selection. Statistical techniques can be univariate, analyzing single variables, or multivariate, analyzing relationships between multiple variables simultaneously. Multivariate techniques are further classified as dependence techniques, with identifiable dependent and independent variables, or interdependence techniques examining whole variable sets. The document provides examples of common univariate and multivariate techniques.
STATISTICAL TOOLS USED IN ANALYTICAL CHEMISTRY - keerthana151
This document provides an overview of statistical concepts related to analytical chemistry. It defines key terms like error, bias, accuracy, and precision. It discusses measures of central tendency, statistical process control charts, and various statistical tests. It provides examples of calculating Kjeldahl nitrogen and describes different types of control charts and statistical tests like t-tests, F-tests, linear regression, and analysis of variance. It lists several references for further information on statistics topics.
Application of Univariate, Bi-variate and Multivariate analysis - Pooja k shetty, Sundar B N
This document discusses different types of statistical analysis used to analyze data. Univariate analysis examines one variable at a time through methods like frequency distributions, histograms, and pie charts. Bivariate analysis considers the relationship between two variables, such as income and weight. Multivariate analysis studies three or more variables simultaneously, with applications in fields like social science, climatology, and medicine.
This document provides an overview of statistics and biostatistics. It defines statistics as the collection, analysis, and interpretation of quantitative data. Biostatistics refers to applying statistical methods to biological and medical problems. Descriptive statistics are used to summarize and organize data, while inferential statistics allow generalization from samples to populations. Common statistical measures include the mean, median, and mode for central tendency, and range, standard deviation, and variance for variability. Correlation analysis examines relationships between two variables. The document discusses various data types and measurement scales used in statistics. Overall, it serves as a basic introduction to key statistical concepts for research.
This document provides an introduction to biostatistics. It discusses how statistics are important for precision in science and medicine. Biostatistics involves applying statistical tools to biological data from fields like medicine. Some key applications of biostatistics include defining normal ranges, comparing treatment effectiveness, and identifying disease associations. The document also outlines common statistical terms, data sources and types, methods for presenting data, measures of central tendency and variability.
This document discusses statistical analysis using SPSS. It describes descriptive statistics, which present data in a usable form by describing frequency, central tendency, and dispersion. Inferential statistics make broader generalizations from samples to populations using hypothesis testing. Hypothesis testing involves research hypotheses, null hypotheses, levels of significance, and type I and II errors. Choosing an appropriate statistical test depends on the hypothesis and measurement levels of the variables. SPSS is a comprehensive system for statistical analysis that can analyze many file types and generate reports and statistics.
This document provides an overview of key concepts for analyzing medical data from a research perspective, including:
- Statistical concepts important for medical licensing exams like scales of measurement, distributions, hypothesis testing, and study designs.
- How to determine what data is available to answer a clinical question, locate existing datasets, and analyze/interpret findings using software like Excel and SPSS.
- Resources for further learning about epidemiology, health statistics, diagnostic tests, and using statistical software.
The document discusses various methods for analyzing data, including descriptive, statistical, and multivariate analyses. Statistical analysis makes raw data meaningful by testing hypotheses, obtaining significant results, and drawing inferences. The appropriate analysis depends on the type of measurement, number of variables, and type of statistical inference required. Correlation analysis studies relationships between variables while causal analysis examines how independent variables affect dependents. Multivariate techniques include multiple regression, discriminant analysis, ANOVA, and canonical analysis.
This document discusses various statistical analysis techniques used in marketing research. It begins by explaining how to bring raw data into order through arrays, tabulations and establishing categories. It then discusses descriptive, inferential, differences, associative and predictive analysis. The document also covers univariate techniques like t-tests, z-tests, ANOVA, chi-square tests and multivariate techniques like regression, conjoint analysis and cluster analysis. It provides guidance on when to use specific statistical tests and covers statistics used in cross-tabulation like phi coefficient, contingency coefficient and Cramer's V.
Research methodology - Analysis of Data - The Stockker
Processing & Analysis of Data, Data editing, Benefits of data editing, Data coding, Classification of data, Classification according to attributes, Classification on the basis of interval, Tabulation of data, Types of tables, Graphing of data, Bar chart, Pie chart, Line graph, Histogram, Polygon/ogive, Analysis of Data, Descriptive Analysis, Uni-variate Analysis, Bivariate Analysis, Multi-variate Analysis, Causal Analysis, Inferential Analysis, Parametric tests, Non-parametric tests
This document provides an overview of various quantitative data analysis techniques including parametric and non-parametric statistics, descriptive statistics, contingency analysis, t-tests, ANOVA, correlation, and regression. It discusses assumptions and processes for each technique and how to interpret results. Computer software like SPSS and SAS can be used to analyze large, complex datasets.
The document provides an overview of data analysis methods and concepts for graduate fellows. It covers:
1) The objectives of translating research questions into an analysis plan, identifying appropriate data analysis methods and software, and conducting exploratory analysis.
2) Key concepts in data analysis including response and explanatory variables, multi-level data structures, and exploratory versus confirmatory analysis.
3) Guidance on specific exploratory analysis methods and examples of confirmatory analysis options using different statistical models depending on variable types.
This document provides an overview of quantitative analysis techniques using SPSS, including data manipulation, transformation, and cleaning methods. It also covers univariate, bivariate, and other statistical analysis methods for exploring relationships between variables and differences between groups. Specific techniques discussed include computing new variables, recoding, selecting cases, imputing missing values, aggregating data, sorting, merging files, descriptive statistics, correlations, regressions, t-tests, ANOVA, non-parametric tests, and more.
Univariate analysis examines one variable at a time across a sample. There are three main tools used in univariate analysis: distribution of frequency, measures of central tendency (mean, median, mode), and measures of dispersion. Distribution examines individual values, range, and charts. Central tendency measures the average or middle value. Dispersion measures the spread around the central tendency, such as the standard deviation and range. Common univariate analysis procedures include frequencies, descriptives, and explore in SPSS.
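The three univariate tools listed above (frequency distribution, central tendency, dispersion) can be computed directly with the Python standard library; the survey data below are made up for illustration.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical survey responses (number of visits per respondent)
visits = [2, 3, 3, 4, 2, 5, 3, 4, 2, 3]

freq = Counter(visits)                                # frequency distribution
print(freq.most_common())

print(mean(visits), median(visits), mode(visits))     # central tendency

s = stdev(visits)                                     # dispersion: sample SD
data_range = max(visits) - min(visits)                # dispersion: range
print(round(s, 2), data_range)
```

These mirror what the SPSS Frequencies and Descriptives procedures report for a single variable.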
Parametric and non-parametric tests in biostatistics - Mero Eye
This PPT will be helpful for optometrists in deciding where and when to use biostatistical formulas, with different examples.
- It covers both parametric and non-parametric tests.
Statistics is important in chemistry for collecting, analyzing, and presenting quantitative data. It is used in analytical chemistry to detect, identify, and measure unknown chemical compositions using instrumentation techniques. Descriptive statistics summarize sample data using measures like the mean and standard deviation, while inferential statistics draw conclusions from data subject to random variation. Statistics plays a vital role in chemistry research by guiding data collection, interpretation, and presentation so that results are properly characterized and support reliable conclusions.
Non-parametric statistics is a branch of statistics that does not require data to be normally distributed. It can be used with ordinal or ranked data and does not assume a particular distribution shape or require parameters like the mean or standard deviation. Common non-parametric tests include rank sum tests like the Wilcoxon-Mann-Whitney U test and the Kruskal-Wallis H test, the chi-square test, and Spearman's rank correlation test. These tests make fewer assumptions about the underlying data distribution compared to parametric tests.
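Spearman's rank correlation, one of the non-parametric tests named above, works on ranks rather than raw values. A minimal sketch in plain Python (assuming no tied ranks, and using made-up rater scores):

```python
def ranks(xs):
    """Rank each value (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via the sum-of-squared-rank-differences formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ordinal scores from two raters
rater1 = [1, 2, 3, 4, 5]
rater2 = [2, 1, 4, 3, 5]
print(spearman_rho(rater1, rater2))  # 0.8
```

Because only ranks enter the calculation, the statistic makes no assumption about the shape of the underlying distribution, which is the defining property of the non-parametric tests listed in the summary.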
This document discusses how to analyze data and perform various statistical tests using SPSS software. It explains how to open data files, enter data, and access the SPSS data editor window. It then covers determining descriptive statistics like frequencies, means, and medians. Finally, it demonstrates how to conduct t-tests, ANOVA, correlation analysis, linear regression, and create scatter plots in SPSS.
This document discusses multivariate analysis and some key concepts in multivariate analysis including:
1. Variates, measurement scales (metric and non-metric), measurement error, statistical significance versus statistical power.
2. Types of measurement scales including nominal, ordinal, interval and ratio scales.
3. Measurement error and how it relates to validity and reliability in multivariate measurement.
4. Statistical significance and types of statistical errors in multivariate analysis.
The effects of worked examples on transfer of statistical reasoning - Marianna Lamnina
An experiment compared the effectiveness of worked examples versus reading a textbook on learning statistical reasoning. Participants completed a pre-test, then the experimental group received a computer tutorial with worked examples and practice problems while the control group read textbook passages. On the post-test, the experimental group scored significantly higher on average than the control group, demonstrating that worked examples facilitated greater learning gains compared to traditional reading.
This document summarizes the key differences between probability sampling and non-probability (quota) sampling in sample surveys. Probability sampling involves randomly selecting samples so that all units have a known chance of selection, allowing results to be generalized to the population. Quota sampling matches sample quotas to population characteristics but involves subjective judgment, preventing determination of selection probabilities. Probability sampling provides unbiased results and a measure of sampling error, while quota sampling relies on untestable models and cannot estimate precision. While quota sampling may be less costly, probability sampling is preferred by statistical agencies for its objectively verifiable quality.
PROBABILITY DISTRIBUTION OF SUM OF TWO CONTINUOUS VARIABLES AND CONVOLUTION - Journal For Research
All physical subjects that involve random phenomena (anything depending on chance) naturally find their way to the theory of statistics. Hence relations arise between the results derived for those random phenomena in different physical subjects and the concepts of statistics. The convolution theorem has a variety of applications in the field of Fourier transforms and many other situations, but it also has elegant applications in statistics. In this paper the authors discuss some notions of electrical engineering in terms of the convolution of probability distributions.
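The statistical side of the convolution theorem mentioned in this abstract can be illustrated with a discrete example: the distribution of a sum of two independent random variables is the convolution of their distributions. A minimal sketch (two fair dice, chosen purely for illustration):

```python
from itertools import product

def convolve(pmf_a, pmf_b):
    """PMF of X+Y for independent X, Y, each given as a {value: probability} dict."""
    out = {}
    for (xa, pa), (xb, pb) in product(pmf_a.items(), pmf_b.items()):
        out[xa + xb] = out.get(xa + xb, 0.0) + pa * pb  # independence: probabilities multiply
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # fair six-sided die
total = convolve(die, die)              # distribution of the sum of two dice
print(round(total[7], 4))               # P(sum = 7) = 6/36
```

For continuous variables, as in the paper, the sum over value pairs becomes the convolution integral of the two density functions.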
Ten Important Life Lessons on Nano Medicine Research Paper Taught Us - science journals
Journal of Nanomedicine & Nanotechnology is a scholarly open-access journal that covers a wide range of themes in this field, including molecular nanotechnology, nanosensors, nanoparticles, nanodrugs, nanomaterials, nanobiotechnology, nanobiopharmaceutics, nanoelectronics, and nanorobotics. This peer-reviewed journal strictly adheres to the standard review process to enhance the quality of its publications.
This document discusses the use of autoregressive integrated moving average (ARIMA) models in statistical analysis beyond just time series data. It provides examples of using ARIMA models with non-temporal data, where the independent variable is something other than time, such as temperature or longitude. Key points include:
1) ARIMA models only require evenly spaced intervals for the independent variable and do not necessarily need time as the variable. Examples of non-temporal ARIMA models are given for white dwarf star populations and the distribution of attorneys.
2) Temperature can act as a "time proxy" for white dwarf stars since temperature and time are monotonically related as the stars cool.
3) ARIM
This is a talk I gave during the third year of my Residency in Internal Medicine at the University of Cincinnati. It goes over the history and evolution of statistical concepts underlying Medical Science and Evidence Based Medicine
A nice summary (from which most of the material after Laplace's time came from) is given in:
http://www.worldscibooks.com/etextbook/4854/4854_chap1.pdf
This document summarizes the growth of coffee shops in India. It discusses how coffee was traditionally less popular than tea in India, but chains like Barista, Café Coffee Day, and Costa Coffee have increased coffee's popularity, especially among youth. Coffee shops have become popular social hubs. Factors driving their growth include India's large youth population, rising incomes, growth of private offices, low startup costs, and availability of franchises. The document reviews research on consumer preferences and behaviors regarding coffee shop brands and purchases. It outlines objectives to compare brands' performance and understand reasons for visits and purchases. Both quantitative and qualitative research methods like surveys and interviews are proposed.
This presentation was intended for employees of Dubai Municipality. It is about how to use SPSS and other statistical data analysis tools like Excel and Minitab in data analysis. The course presented some statistical concepts and definitions.
This document discusses the implications of the World Trade Organization (WTO) on India's agricultural sector. It provides background on GATT, the predecessor to WTO, and explains key aspects of the WTO including its formation, purpose, and differences from GATT. The document then discusses India's large and historically important agricultural industry. It outlines issues such as low productivity and government interventions that impact the sector. Finally, it analyzes India's commitments under the WTO Agreement on Agriculture, including maintaining quantitative restrictions on imports and not providing direct export subsidies.
This document discusses several discrete probability distributions:
1. Binomial distribution - For experiments with a fixed number of trials, two possible outcomes, and constant probability of success. The probability of x successes is given by the binomial formula.
2. Geometric distribution - For experiments repeated until the first success. The probability of the first success on the xth trial is p(1-p)^(x-1).
3. Poisson distribution - For counting the number of rare, independent events occurring in an interval. The probability of x events is (e^-μ μ^x)/x!, where μ is the mean number of events.
This document discusses the role and importance of statistics in scientific research. It begins by defining statistics as the science of learning from data and communicating uncertainty. Statistics are important for summarizing, analyzing, and drawing inferences from data in research studies. They also allow researchers to effectively present their findings and support their conclusions. The document then describes how statistics are used and are important in many fields of scientific research like biology, economics, physics, and more. It also provides examples of statistical terms commonly used in research studies and some common misuses of statistics.
Modelled and Analysed the watershed Dynamics in Mahanadi River Basin. Finally came up with watershed Management Plan to minimise the future LUCC in Mahanadi River Basin
Advice On Statistical Analysis For Circulation ResearchNancy Ideker
This document provides an overview and review of statistical methods for analyzing cardiovascular research data. It discusses common statistical errors in previous decades, such as low statistical power and inadequate analysis of repeated measures studies. It introduces several statistical methods that are useful but not always familiar to cardiologists, including power analysis, methods for analyzing repeated measures, analysis of covariance, multivariate analysis of variance, nonparametric tests, and more. The goal is to help researchers choose the appropriate statistical tests and properly interpret the results.
This document provides an overview of different types of data analysis including univariate analysis, bivariate analysis, and multivariate analysis. It also discusses different types of data structures such as cross-sectional data, time series data, and panel data.
The key points are:
1) Univariate analysis looks at one variable only to describe patterns in the data. Bivariate analysis looks at the relationship between two variables, while multivariate analysis examines three or more variables.
2) Cross-sectional data collects information from different subjects at the same point in time. Time series data observes the same variable over time. Panel data tracks the same subjects over multiple time periods.
3) Different analysis techniques can be used depending on the
Get your quality homework help now and stand out.Our professional writers are committed to excellence. We have trained the best scholars in different fields of study.Contact us now at http://www.essaysexperts.net/ and place your order at affordable price done within set deadlines.We always have someone online ready to answer all your queries and take your requests.
Statistics and types of statistics .docxHwre Idrees
This document discusses different types of statistics. It defines descriptive statistics as summarizing and describing data, while inferential statistics use samples to make inferences about populations. Measures of central tendency like mean, median and mode are described as well as measures of variability such as range, standard deviation and variance. Specific types of each are defined and explained, such as weighted mean, interquartile range, and harmonic mean. Tables and figures are included to illustrate the differences between descriptive and inferential statistics and examples of various statistical measures.
Statistics is the study of collecting, organizing, summarizing, and interpreting data. Medical statistics applies statistical methods to medical data and research. Biostatistics specifically applies statistical methods to biological data. Statistics is essential for medical research, updating medical knowledge, data management, describing research findings, and evaluating health programs. It allows comparison of populations, risks, treatments, and more.
This document provides an overview of basic statistics concepts and terminology. It discusses descriptive and inferential statistics, measures of central tendency (mean, median, mode), measures of variability, distributions, correlations, outliers, frequencies, t-tests, confidence intervals, research designs, hypotheses testing, and data analysis procedures. Key steps in research like research design, data collection, and statistical analysis are outlined. Descriptive statistics are used to describe data while inferential statistics investigate hypotheses about populations. Common statistical analyses and concepts are also defined.
MELJUN CORTES research lectures_evaluating_data_statistical_treatmentMELJUN CORTES
This document discusses the importance of statistics in research and the proper treatment of data. It notes that statistics are the backbone of research and help organize data in tables and graphs to guide meaningful interpretations. The document outlines the data analysis process and different levels of measurement for variables. It provides a matrix for statistical treatment of different types of data and describes common statistical operations like measures of central tendency, variance, correlation, and statistical tests. Dangers of misusing statistics are also discussed.
Level of Measurement, Frequency Distribution,Stem & Leaf Qasim Raza
This document discusses multivariate data analysis and techniques. It begins by defining qualitative and quantitative data, and the different levels of measurement - nominal, ordinal, interval, and ratio. It then discusses frequency distributions, stem and leaf plots, and demonstrates their use in SPSS. Finally, it defines multivariate data analysis as involving two or more variables, and provides examples of multivariate techniques such as multiple regression, discriminant analysis, MANOVA, and their appropriate uses depending on the level of measurement of the variables.
An Overview and Application of Discriminant Analysis in Data AnalysisIOSR Journals
This document provides an overview of discriminant analysis, including its history, key assumptions, and different types (e.g. linear, quadratic). It discusses advantages of discriminant analysis compared to logistic regression, such as its ability to handle small sample sizes. The document also describes steps to develop a discriminant model, including variable selection, assumptions checking, and evaluation. It then presents an application of discriminant analysis to classify failed vs successful companies in Nigeria based on financial ratios. The model was able to predict company failure up to 3 years in advance.
Multivariate Approaches in Nursing Research Assignment.pdfbkbk37
The document discusses multivariate approaches used in nursing research. It discusses key variables, validity and reliability, threats to internal validity, and strengths and limitations of models used in the selected article. The document also provides an overview of different multivariate techniques including multiple regression analysis, logistic regression analysis, multivariate analysis of variance, factor analysis, and discriminant function analysis. It discusses when each technique is appropriate and how to choose the right method to solve practical problems.
The document discusses basics of statistics including key concepts like population, sample, parameters, and statistics. It provides definitions for population as the collection of all individuals or items under consideration, and sample as the part of the population selected for a study. Parameters describe unknown characteristics of the population, while statistics describe known characteristics of the sample and are used to infer parameters. The document also distinguishes between descriptive statistics, which summarize and organize data, and inferential statistics, which draw conclusions about populations from samples.
This document provides an overview of biostatistics and various statistical concepts used in dental sciences. It discusses measures of central tendency including mean, median, and mode. It also covers measures of dispersion such as range, mean deviation, and standard deviation. The normal distribution curve and properties are explained. Various statistical tests are mentioned including t-test, ANOVA, chi-square test, and their applications in dental research. Steps for testing hypotheses and types of errors are summarized.
- Multinomial logistic regression predicts categorical membership in a dependent variable based on multiple independent variables. It is an extension of binary logistic regression that allows for more than two categories.
- Careful data analysis including checking for outliers and multicollinearity is important. A minimum sample size of 10 cases per independent variable is recommended.
- Multinomial logistic regression does not assume normality, linearity or homoscedasticity like discriminant function analysis does, making it more flexible and commonly used. It does assume independence between dependent variable categories.
This document provides an overview of descriptive statistics, inferential statistics, and regression analysis using PASW Statistics software. It discusses topics such as frequency analysis, measures of central tendency, hypothesis testing, t-tests, ANOVA, chi-square tests, correlation, and linear regression. The document is divided into multiple parts that cover opening and manipulating data files, descriptive statistics, tests of significance, regression analysis, and chi-square/ANOVA. It also discusses importing/exporting data and using scripts in PASW Statistics.
Dimensionality Reduction Techniques In Response Surface Designsinventionjournals
Dimensionality reduction has enormous applications in various fields in industries. It can be applied in an optimal way with respect to time and cost related to the agricultural sciences, mechanical engineering, chemical technology, pharmaceutical sciences, clinical trials, biological studies, image processing, pattern recognitions etc. Several researchers made attempts on the reduction of the size of the model for different specific problems using some mathematical and statistical techniques identifying and eliminating some insignificant variables.This paper presents a review of the available literature on dimensionality reduction
This document provides an outline for a presentation on biostatistics and epidemiology. It covers key principles of using biostatistics in research, including distinguishing different variable types, understanding data distributions, hypothesis testing, statistical tests, measures of association, regression, diagnostic tests, and systematic reviews. Statistical concepts like p-values, confidence intervals, and odds ratios are defined. Examples are provided for statistical tests like t-tests, chi-square tests, survival analysis, and diagnostic test metrics.
This study analyzes normalized citation indexes to understand the impact of Brazilian science across 27 fields from 1996-2007. Three normalization procedures are used: mean area (Ma), median (Md), and mean of the top 10% most productive (Ma10%). Correlations between the normalized indexes show the highest correlation is between Ma and Md, indicating similar behavior. Linear regression models show Md fits better to Ma than Ma10% to Ma, suggesting Md values are slightly higher than Ma over time while Ma10% provides complementary information on scientific impact. Overall, the normalized indexes show Brazil performed above average in most fields during this period.
Similar to applied multivariate statistical techniques in agriculture and plant science 2 (20)
Intl. J. Agron. Plant. Prod. Vol. 4 (1), 127-141, 2013
understand the relationships among variables and their relevance to the actual problems being studied
(Johnson and Wichern, 1996). Many different multivariate analysis techniques are available, such as multivariate
analysis of variance (MANOVA), multiple regression analysis, principal component analysis (PCA), factor
analysis (FA), canonical correlation analysis (CC), and cluster analysis. In this review we explain the
applicable multivariate statistical techniques in agriculture and plant science, with related examples,
in order to provide a practical manual for plant scientists in their research work.
Multiple Linear Regression Analysis
Linear regression is an approach to modeling the relationship between a dependent variable, called Y,
and one or more explanatory variables, denoted X. The case of one explanatory variable is called simple
regression. For example, if we want to determine how much a 1 cm increase in the height of a plant
changes its yield, we use simple linear regression (Draper and Smith, 1966). The
prediction model equation for simple linear regression is:
Y = b0 + b1X + ε
b0: the intercept, which geometrically represents the value of the dependent variable (Y) where the regression
line crosses the Y axis. Substantively, it is the expected value of Y when the independent variable equals zero.
b1: the slope coefficient (regression coefficient). It represents the change in Y associated with a one-unit
increase in X.
ε: the error of the prediction. In most situations we are not in a position to determine the population
parameters directly; instead, we must estimate their values from a finite sample of the population, and ε
captures the deviation of the observed values from the fitted line.
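As an aside, the least-squares estimates b1 and b0 can be computed directly from the sample means. The following minimal sketch (standard library only; the height/yield numbers are invented for illustration) shows the calculation:

```python
# Minimal sketch of simple linear regression Y = b0 + b1*X (ordinary least
# squares), using only the standard library. The data below are hypothetical.
from statistics import mean

def simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    xbar, ybar = mean(x), mean(y)
    # b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar          # the fitted line passes through (xbar, ybar)
    return b0, b1

# Hypothetical data: plant height (cm) vs. grain yield (g/plant)
height = [40, 45, 50, 55, 60]
yield_ = [2.1, 2.4, 2.9, 3.1, 3.5]
b0, b1 = simple_ols(height, yield_)
print(round(b0, 3), round(b1, 3))  # -0.7 0.07
```

Here b1 estimates the yield change associated with each additional centimeter of height.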
Multiple regression considers more than one explanatory variable (X) — for example, how a one-unit change
in stem height, stem diameter, root length, or leaf area changes the plant yield.
The prediction model for multiple regression is an expanded form of the simple linear regression model:
Y = b0 + b1X1 + b2X2 + ... + biXi + ε
bi: the partial slope coefficient (also called partial regression coefficient or metric coefficient). It
represents the change in Y associated with a one-unit increase in Xi when all other independent variables
are held constant.
Here b0 is the sample estimate of β0 and bi is the sample estimate of βi, where the β's are the parameters
of the whole population from which sampling is conducted.
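For illustration, the sample estimates b0, b1, ..., bi can be obtained by solving the normal equations (X'X)b = X'y. The sketch below (standard library only; the three-predictor trait values are invented) shows one way to do this with Gaussian elimination:

```python
# Sketch of multiple linear regression via the normal equations (X'X) b = X'y,
# solved by Gaussian elimination; standard library only. Data are invented.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multiple_ols(rows, y):
    """rows: list of predictor tuples; returns [b0, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]                # prepend intercept column
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    return solve(xtx, xty)

# Hypothetical traits: (stem height, stem diameter, root length) -> yield
rows = [(40, 3.0, 12), (45, 3.2, 13), (50, 3.1, 15), (55, 3.6, 14), (60, 3.8, 16)]
y = [2.1, 2.4, 2.9, 3.1, 3.5]
b = multiple_ols(rows, y)
print([round(v, 3) for v in b])
```

In practice statistical software does this (with better numerics) and also reports standard errors and t-tests for each coefficient.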
After determining the intercept and regression coefficients, we have to test them for significance using
analysis of variance (ANOVA). ANOVA determines whether the regression coefficients calculated for the
candidate model should be retained in the final model as predictors. Statistical software reports a
P-value (sig-value) for each coefficient's significance test. If the P-value for a coefficient is less than 0.05 (P<0.05),
the coefficient is statistically significant and the related variable should be kept in the model as a predictor;
if it is higher than 0.05 (P>0.05), the coefficient is not statistically significant and the related variable
should not be kept as a predictor (Draper and Smith, 1981). The coefficient of determination, or R-square
(R²), shows how well the model of predictors fits the dependent variable(s): the higher the R², the better
the fit and the goodness of the model. The significance test for the intercept (b0) is conducted in the
same way as for the regression coefficients (Kleinbaum et al., 1998).
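As a quick numerical illustration of the fit measure, R² is computed as 1 − RSS/TSS, and the adjusted R² additionally penalizes the number of predictors k. The observed and fitted values below are invented (they are not from the wheat example):

```python
# Toy computation of R^2 = 1 - RSS/TSS and adjusted R^2; standard library only.
# Observed and fitted values are invented, as if from some fitted model.
from statistics import mean

y     = [2.1, 2.4, 2.9, 3.1, 3.5]   # observed values
y_hat = [2.0, 2.5, 2.8, 3.2, 3.5]   # fitted by a hypothetical one-predictor model

ybar = mean(y)
tss = sum((v - ybar) ** 2 for v in y)                # total sum of squares
rss = sum((v - f) ** 2 for v, f in zip(y, y_hat))    # residual sum of squares
r2 = 1 - rss / tss
n, k = len(y), 1                                     # n observations, k predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)        # penalizes extra predictors
print(round(r2, 3), round(adj_r2, 3))                # 0.968 0.957
```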
Significance tests of the coefficients and R² help researchers decide which predictors are more
important and must be present in the model. Besides these methods, several other techniques have been
developed for determining the best model of predictors. Moreover, when the number of predictors increases,
most of the variables are usually strongly correlated with each other; it is then not necessary for all of
these correlated variables to be present in the model, since they can substitute for each other (Manly, 2001).
Backward elimination: in this technique, unlike forward selection, all variables are initially in the model and
the less important variables are removed step by step. In the first step, all possible models obtained by
removing each single variable are considered, and the variable whose removal gives the smallest mean square
is dropped from the model. The same procedure is applied in the subsequent steps, and whenever the P-value
exceeds the chosen threshold the analysis stops; the model with the remaining variables is taken as the best
predicting model (Burnham and Anderson, 2002).
Forward selection: in this method, in the first step of the analysis, all possible simple regressions for
each of the independent variables are calculated, and the variable with the highest mean square
(or F-value) enters the regression model as the first and most important predictor. In the second
step, the variable entered in the first step stays in the model, all possible two-variable models
containing it are fitted, and the one with the highest mean square becomes the preferred
prediction model. This procedure continues until the P-value of the model exceeds the chosen
threshold; the variables remaining at that point are not included in the prediction model (Harrell,
2001).
Stepwise regression: this variable selection method has proved to be an extremely useful computational
technique in data analysis problems (Dong et al., 2008). As in forward selection, in stepwise regression
all possible univariate models are fitted and the variable with the highest mean square is included
in the model. In the second step, all possible models containing the first included variable are
investigated and the variable with the highest mean square is entered; however, once the
second variable has entered, the first variable must be re-tested for significance in the presence of the
second. If the first variable is still significant, both variables remain in the
model; if it is not, it is removed from the model. This procedure is repeated in the subsequent steps,
and any variable entered in a previous step whose P-value rises above the threshold is removed.
Indeed, this technique uses both forward selection and
backward elimination and is more suitable than either alone (Miller, 2002).
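A greedy forward-selection loop can be sketched as follows. This toy version (standard library only, with invented data) adds at each step the predictor that most reduces the residual sum of squares; real statistical packages instead use F-tests and P-value thresholds for entry and removal, as described above:

```python
# Toy forward selection: greedily add the predictor that most reduces the
# residual sum of squares (software would use an F-test / P-value entry
# threshold instead). Standard library only; the data are invented.

def ols_rss(cols, y):
    """Fit y on an intercept plus the given columns; return the residual SS."""
    n, k = len(y), len(cols) + 1
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination.
    m = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (m[r][k] - sum(m[r][c] * b[c] for c in range(r + 1, k))) / m[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(k))) ** 2 for i in range(n))

def forward_select(predictors, y, min_gain=1e-3):
    """predictors: dict name -> data column. Add predictors while RSS improves."""
    chosen = []
    best_rss = sum((v - sum(y) / len(y)) ** 2 for v in y)  # null (intercept) model
    while len(chosen) < len(predictors):
        gains = {name: best_rss - ols_rss([predictors[p] for p in chosen + [name]], y)
                 for name in predictors if name not in chosen}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        chosen.append(best)
        best_rss -= gains[best]
    return chosen

# y is driven mainly by x4 (spike weight); x2 (spike length) is near-noise.
data = {"x4": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
        "x2": [9.0, 8.0, 9.5, 8.5, 9.0, 8.0]}
y = [2.0, 3.1, 4.0, 5.1, 6.0, 7.1]
print(forward_select(data, y))
```

As expected, the informative predictor enters first; backward elimination and stepwise selection modify this loop by starting from the full model or by re-testing previously entered variables.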
Path analysis: regression coefficients depend strongly on the units of the variables. Depending on the
units, the coefficients of the variables can be high or low regardless of their real importance. To make
the coefficients comparable, the solution is to standardize the variables' data by subtracting the mean
and dividing by the standard deviation. After standardizing, the variable with the higher coefficient has
the greater effect on the dependent variable. When the independent variables are correlated with each
other, they can affect one another. In this situation, the correlation between each independent variable
and the dependent variable can be partitioned into the direct effect of that independent variable and its
indirect effects via the other correlated variables (Fig. 1). Using standardized data in the regression
model gives the direct effects of the variables. The indirect effect of a variable is estimated by
multiplying the relevant direct effect by the correlation coefficient between the two or more independent
variables (Shipley, 1997). Therefore, path analysis can be viewed as an extension of the regression model,
used to test the fit of the correlation matrix against two or more causal models being compared by the
researcher (Dong et al., 2008).
Figure 1. Diagram of path analysis: the direct effects of X1, X2 and X3 on Y (final effect) and their indirect effects via the other correlated variables.
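The direct/indirect partitioning can be made concrete with a toy calculation. In the sketch below, the standardized (path) coefficients and the inter-predictor correlations are invented numbers; the correlation of each predictor with Y is reconstructed as its direct effect plus the indirect effects routed through the other predictors:

```python
# Toy partitioning of a correlation into direct and indirect effects, as in
# path analysis: indirect effect of Xi via Xj = r(Xi, Xj) * direct effect of Xj.
# All coefficient and correlation values below are invented for illustration.

# Hypothetical direct effects (standardized regression coefficients) on yield Y
direct = {"x1": 0.60, "x2": 0.20, "x3": 0.10}
# Hypothetical correlations among the predictors
r = {("x1", "x2"): 0.50, ("x1", "x3"): 0.30, ("x2", "x3"): 0.40}

def corr(a, b):
    return 1.0 if a == b else r.get((a, b), r.get((b, a)))

def total_correlation_with_y(var):
    """r(Xi, Y) = direct effect of Xi + sum of indirect effects via the others."""
    return sum(corr(var, other) * direct[other] for other in direct)

for v in direct:
    indirect = total_correlation_with_y(v) - direct[v]
    print(v, "direct:", direct[v], "indirect:", round(indirect, 3))
```

A variable like x3 here can have a small direct effect yet a sizeable total correlation with Y through its correlated neighbours, which is exactly the pattern reported for spikelets/spike in Example 1 below.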
For a better understanding of the regression techniques mentioned above, we present an example here.
Example 1: we measured several morphological traits of three wheat cultivars, consisting of tiller
numbers/plant, spike length, spikelets/spike, spike weight/plant, grains/spike, grain weight/spike, 100-grain
weight, total chlorophyll content of the flag leaf, biologic yield/plant, root weight, leaves area and grain
yield, under four water regimes (Moocheshi et al., 2012). Here we evaluate the relationship between
grain yield and the related measured morphological traits using the techniques mentioned above.
Multivariate regression
Table 1 shows the regression coefficient values, their standard errors, t-values and p-values for the
coefficients. The overall regression equation based on these results is:
Y = 0.5394 - 0.12X1 - 0.02X2 - 0.01X3 + 0.96X4 + 0.01X5 - 0.78X6 - 0.01X7 - 0.004X8 + 0.01X9 + 0.08X10 + 0.001X11
X1= Tiller numbers/plant, X2=Spike length, X3=Spikelets/spike, X4=Spike weight/plant, X5= Grains/spike,
X6= Grain weight/spike, X7= 100-Grain weight, X8= Total chlorophyll content of flag leaf, X9= Biologic
yield/plant, X10= Root weight, X11= Leaves area and Y=Grain yield.
The coefficient of determination (R²) is equal to 99.2%, which is very high, but it is not a realistic
coefficient of determination, because R² increases as more variables are added. Scientists have introduced
the adjusted R² instead of R² to address this problem, but it too is not a completely accepted index. Moreover,
in a situation like this where the variables are abundant, explaining the relation
between the dependent variable and so many independent variables is complex; on the other hand, some
coefficient values are so small that the corresponding variables can be removed from the model. Based on
the p-values, most of the variables are not statistically significant. The p-value indicates which variables
must be present in the model as predictors and which must not. As shown in Table 1, X4 and X6 are the
variables with p-values lower than 0.05, and we select them as the most effective variables on yield. The
predicting model based on the regression analysis is as follows:
Y = 0.96X4 - 0.78X6
Selection procedures
Backward elimination: in the four steps of backward elimination, the four variables X1, X3, X2 and
X7 are removed from the model and the other variables remain. Based on this result, these four
variables are the least important for predicting yield. The predicting model from this procedure is
formulated as follows (Tables 2 and 3):
Y = -0.19 + 0.98X4 + 0.01X5 - 1.54X6 - 0.004X8 - 0.01X9 + 0.1X10 + 0.005X11
X4=Spike weight/plant, X5= Grains/spike, X6= Grain weight/spike, X8= Total chlorophyll content of flag leaf,
X9= Biologic yield/plant, X10= Root weight, X11= Leaves area, and Y= Yield.
Forward selection: as in backward elimination, seven variables are included in the forward selection
model, but the coefficient values differ slightly (Table 4):
Y = -0.003 + 0.98X4 - 0.004X5 + 0.01X6 - 0.01X8 - 1.54X9 + 0.11X10 - 0.003X11
X4=Spike weight/plant, X5= Grains/spike, X6= Grain weight/spike, X8= Total chlorophyll content of flag leaf,
X9= Biologic yield/plant, X10= Root weight, X11= Leaves area, and Y= Yield.
Stepwise selection: Table 5 shows the variables entered into, or removed from, the stepwise regression
model. Similar to the backward and forward results, stepwise selection screens seven variables:
Y = -0.195 + 0.98X4 - 0.01X5 - 1.54X6 - 0.004X8 - 0.01X9 + 0.1X10 - 0.005X11
X4=Spike weight/plant, X5= Grains/spike, X6= Grain weight/spike, X8= Total chlorophyll content of flag leaf,
X9= Biologic yield/plant, X10= Root weight, X11= Leaves area, and Y= Yield.
Which model should serve as the predicting model is the researcher's choice, and the model that best
explains the idea of the research can be used, but stepwise selection is usually the best. On the other hand,
the significance t-test for variables in multivariate regression analysis is not a sufficient technique on its own.
Path analysis: for a better path coefficient analysis and understanding of the relationship between yield
and the other morphological traits, the researcher can feed the results of the selection procedures into the
path analysis; here, however, we considered all variables. In this technique, the correlation coefficient
between yield and each of the measured morphological traits is partitioned into its direct effect and its
indirect effects via the other variables. The highest direct effect on yield was obtained for spike
weight/plant (1.013), while the other variables had very low direct effects on yield (Table 6). The sum of
the indirect effects of spike weight/plant was negative. Except for spike weight/plant, the other variables
had high indirect effects on grain yield. Spikelets/spike showed the lowest contribution to grain yield
through its direct effect, but the highest contribution through the other traits.
Table 1. The regression coefficient (B), standard error (SE), T-value and probability of the estimated
variables in predicting wheat grain yield by the multiple linear regression analysis under inoculation (In)
and non-inoculation (Non-In) conditions and different water levels
Predictor DF B SE T P
Constant 1 0.5394 0.49180 1.10 0.284
X1 1 -0.1164 0.08245 -1.41 0.171
X2 1 -0.0202 0.05014 -0.40 0.691
X3 1 -0.0082 0.02037 -0.40 0.693
X4 1 0.9617 0.01927 49.90 0.001
X5 1 0.0110 0.00699 1.56 0.131
X6 1 -0.7802 0.34490 -2.26 0.033
X7 1 -0.0070 0.00979 -0.71 0.483
X8 1 -0.0042 0.00318 -1.33 0.196
X9 1 0.0131 0.01165 1.12 0.273
X10 1 0.0840 0.09246 0.91 0.373
X11 1 -0.0008 0.00318 -0.25 0.803
X1= Tiller numbers/plant, X2=Spike length, X3=Spikelets/spike, X4=Spike weight/plant, X5=
Grains/spike, X6= Grain weight/spike, X7= 100-Grain weight, X8= Total chlorophyll content of flag leaf,
X9= Biologic yield/plant, X10= Root weight and X11= Leaves area.
Table 3. Backward elimination and variables remaining in the model
Step  Variable  Parameter estimate  Standard error  Sum of squares  F-Value  Pr > F
Intercept -0.19463 0.08673 0.03923 5.040 0.0329
1 x4 0.97670 0.00947 82.8773 640.1 <.0001
2 x5 0.01208 0.00342 0.09736 12.50 0.0014
3 x6 -1.54441 0.21063 0.41875 53.76 <.0001
4 x8 -0.00407 0.00138 0.06753 8.670 0.0064
5 x9 -0.01094 0.00460 0.04402 5.650 0.0245
6 x10 0.09707 0.04682 0.03347 4.300 0.0475
7 x11 0.00505 0.00160 0.07755 9.960 0.0038
X4=Spike weight/plant, X5= Grains/spike, X6= Grain weight/spike, X8= Total chlorophyll content of
flag leaf, X9= Biologic yield/plant, X10= Root weight and X11= Leaves area.
Table 4. Summary of forward selection
Step  Variable entered  Partial R-Square  Model R-Square  Parameter estimate  Standard error  F-Value  Pr > F
1 x4 0.9963 0.9963 0.97859 0.00963 83.85 <.0001
2 x6 0.0013 0.9975 0.01198 0.00341 17.04 0.0002
3 x9 0.0005 0.998 -1.54065 0.21043 7.79 0.0088
4 x5 0.0004 0.9985 -0.00443 0.0043 8.69 0.006
5 x11 0.0002 0.9987 -0.0034 0.00152 5.48 0.0261
6 x8 0.0002 0.9989 -0.01166 0.00465 6.42 0.0169
7 x10 0.0001 0.9991 0.11336 0.04937 4.3 0.0475
Intercept -0.12314 0.11097 0.0034
Table 2. Summary of backward elimination
Step  Variable removed  Number of variables remaining in model  Partial R-Square  Model R-Square  F-Value  Pr > F
1 x1 10 0 0.9991 0.02 0.8836
2 x3 9 0 0.9991 0.03 0.8558
3 x2 8 0 0.9991 0.28 0.6028
4 x7 7 0 0.9991 1.06 0.3117
X1= Tiller numbers/plant, X2= Spike length, X3= Spikelets/spike and X7= 100-Grain weight.
Table 5. Relative contribution (partial and model R²), F-value and probability in predicting wheat grain yield
by the stepwise procedure analysis under non-inoculation condition and different water levels
Step  Variable entered  Variable removed  Partial R-Square  Model R-Square  P-Value ER  Parameter estimate  Standard error  P-Value M
1 x4 - 0.9963 0.9963 <.0001 0.9767 0.00947 <.0001
2 x6 - 0.0013 0.9975 0.0002 -1.54441 0.21063 <.0001
3 x9 - 0.0005 0.998 0.0088 -0.01094 0.00460 0.0245
4 x5 - 0.0004 0.9985 0.0060 0.01208 0.00342 0.0014
5 x11 - 0.0002 0.9987 0.0261 0.00505 0.0016 0.0038
6 x8 - 0.0002 0.9989 0.0169 -0.00407 0.00138 0.0064
7 x10 - 0.0001 0.9991 0.0475 0.09707 0.04682 0.0475
Intercept -0.195 0.0867 0.0329
X4=Spike weight/plant, X5= Grains/spike, X6= Grain weight/spike, X8= Total chlorophyll content of flag
leaf, X9= Biologic yield/plant, X10= Root weight and X11= Leaves area.
R-Square= Coefficient of Determination, P-Value ER= P-value for enter or remove variables and P-Value
M= P-Value for final model.
Principal Component Analysis
Principal component analysis (PCA) is a variable-reduction procedure that is useful when you have
obtained data on a large number of variables and believe that there is some redundancy among them (Fig.
2). PCA reduces data dimensionality by analysing the covariance structure of the variables; its main
advantage is that it reduces the number of dimensions without much loss of information (Everitt and Dunn,
1992). Here, redundancy means that some of the variables are correlated with one another, possibly
because they measure the same construct. PCA uses an orthogonal transformation to convert a set of
observations of possibly correlated variables into a set of values of linearly uncorrelated variables called
principal components (PCs). The number of principal components is less than or equal to the number of
original variables (Dunetman, 1989). The transformation is defined so that the first PC has the largest
possible variance, accounting for as much of the variability in the data as possible, and each succeeding
component in turn has the highest variance possible under the constraint that it be orthogonal to
(uncorrelated with) the preceding components (Jackson, 1991). The PCs are independent when the data set
is jointly normally distributed, and they may then be used as predictor or criterion variables in subsequent
analyses. PCA is sensitive to the relative scaling of the original variables and is mostly used as a tool in
exploratory data analysis and for building predictive models (Anderson, 1984). PCA can be carried out by
eigenvalue decomposition of a data covariance (or correlation) matrix, or by singular value decomposition of
the data matrix, usually after mean-centring (and standardizing, i.e. converting to Z-scores) each attribute.
The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores
(the transformed variable values corresponding to a particular data point), and loadings (the weight by which
each standardized original variable is multiplied to obtain the component score). PCA can thus be thought of
as revealing the internal structure of the data in the way that best explains its variance (Jackson, 1991). If a
multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (one axis per
variable), PCA can supply the user with a lower-dimensional picture: by retaining only the first few principal
components, the dimensionality of the transformed data is reduced (Steel and Torrie, 1960).
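The eigenvalue-decomposition route described above can be sketched directly; this is a generic illustration with names of our own choosing, not the software used in the paper.

```python
import numpy as np

def pca_corr(X):
    """PCA via eigendecomposition of the correlation matrix:
    standardize the variables, decompose their correlation matrix,
    and return eigenvalues (component variances), eigenvectors
    (loading directions), component scores, and the proportion of
    total variance each component explains."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)       # returned in ascending order
    order = np.argsort(eigval)[::-1]         # largest variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec                      # component scores
    explained = eigval / eigval.sum()        # proportion of variance
    return eigval, eigvec, scores, explained
```

Because the correlation matrix is used, the eigenvalues sum to the number of variables, which is why the eigenvalue-greater-than-one rule used with Table 7 marks components that explain more variance than a single standardized variable.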
Example 2 explains how PCA can be used to explore relationships in an agricultural dataset.
Example 2: Fourteen strawberry cultivars were cultivated in two consecutive years (2009-2010) at the
research center of agriculture and natural resources of Sanandaj, Iran. Ten variables, comprising two sets
(first set morphological traits, second set biochemical traits), were measured (Saed-Moucheshi et al., in
press).
Considering the ten parameters used in this research, ten components were calculated by PCA. As
expected, PC1 showed the highest eigenvalue (3.51), so most of the variation among the data can be
explained by this PC. After component 1, PC2, PC3 and PC4 explain more of the variation than the
remaining components. Together, the first four components explain 85% of the total variation in the data
(Table 7). These components also have eigenvalues greater than unity (1) (Fig. 3), so they were used to
explain the whole variation among the data. Fig. 3 also shows that the eigenvalues decrease as the
component number increases; such eigenvalue patterns are important indicators in genetics and efficient
indicators for screening the genotypes. Flowering period had the highest coefficient in PC1. In components
2, 3 and 4, yield, anthocyanin and berry size, respectively, showed the maximum coefficients among the
traits.
The first component clearly separated the two groups of variables, chemical and morphological. Yield,
berry size, berry weight, and the flowering and fruiting periods had high negative correlations with PC1;
based on this component, these traits have the strongest effects contributing to yield. In PC2, petiole length,
TSS and yield showed the highest (negative) correlations with the component; PC2 suggests that petiole
length had a high effect on yield and, in turn, that higher yield can provide a higher amount of total soluble
solids (TSS). Berry size, berry weight and yield have very low coefficients in PC3, so this component says
little about these traits. The flowering and fruiting periods and anthocyanin content showed the highest
negative contributions to PC3, so these two periods can change the anthocyanin content. Titratable acidity
(TA) had the highest positive coefficient in PC3, and this trait is relatively independent of the others. PC4
likewise showed that higher yield provides higher TSS content, so direct selection for yield should increase
TSS content as well.
Table 7. Principal component analysis of traits measured during two years of strawberry cultivation.

Traits                        Component 1  Component 2  Component 3  Component 4
Anthocyanin                       0.229       -0.191       -0.529       -0.174
Berry size                       -0.379       -0.285        0.009       -0.482
Berry weight                     -0.395       -0.258        0.007       -0.463
Flowering period                 -0.424       -0.016       -0.342        0.326
Fruiting period                  -0.383        0.012       -0.351        0.454
Petiole length                    0.084       -0.480       -0.018        0.348
Stolons/plant                     0.352       -0.196       -0.404       -0.107
Titratable acidity                0.025       -0.376        0.559        0.268
Total soluble solids              0.370       -0.430       -0.059        0.017
Berry yield                      -0.233       -0.469        0.002        0.077
Eigenvalue                        3.510        2.330        1.430        1.251
Proportion of variance (%)        35.1         23.3         14.3         12.5
Cumulative variance (%)           35.1         58.4         72.7         85.2
Factor Analysis
Factor analysis (FA), like principal component analysis, is a statistical method used to describe
variability among observed, correlated variables in terms of a potentially lower number of unobserved
variables called factors. The purpose of FA is to discover simple patterns in the relationships among the
variables (Spearman, 1904). In other words, it is possible, for example, that variations in three or four
observed variables mainly reflect the variations in fewer unobserved variables. FA searches for such joint
variations in response to unobserved latent variables (Anderson, 1984). The observed variables are
modeled as linear combinations of the potential factors, plus error terms. The information gained about the
interdependencies between observed variables can be used later to reduce the set of variables in a dataset
(Manly, 2001). Computationally, this technique is equivalent to a low-rank approximation of the matrix of
observed variables. FA is related to principal component analysis (PCA), but the two are not identical.
Latent variable models, including factor analysis, use regression modeling techniques to test hypotheses,
producing error terms, while PCA is a descriptive statistical technique (Dunetman, 1989).
FA is used to study the patterns of relationship among many dependent variables, with the goal of
discovering something about the nature of the independent variables that affect them, even though those
independent variables were not measured directly. The different methods of FA first extract a set of factors
from a data set. These factors are almost always orthogonal and are ordered according to the proportion of
the variance of the original data that they explain. In general, only a (small) subset of factors is kept for
further consideration, and the remaining factors are considered either irrelevant or nonexistent (i.e., they are
assumed to reflect measurement error or noise). To ease the interpretation of the factors considered
relevant, this first selection step is generally followed by a rotation of the retained factors. Two main types of
rotation are used: orthogonal, when the new axes are also orthogonal to each other, and oblique, when the
new axes are not required to be orthogonal to each other. Because the rotations are always performed in a
subspace (the so-called factor space), the new axes will always explain less variance than the original
factors (which are computed to be optimal), but obviously the part of the variance explained by the total
subspace after rotation is the same as it was before rotation (only the partition of the variance has changed;
Kaiser, 1958).
The model in Fig. 4 proposes that each observed response (measure 1 through measure 5) is influenced
partially by underlying common factors (factor 1 and factor 2) and partially by underlying unique factors (E1
through E5). The strength of the link between each factor and each measure varies, so a given factor
influences some measures more than others. FA is performed by examining the pattern of correlations (or
covariances) between the observed measures. Measures that are highly correlated (either positively or
negatively) are likely influenced by the same factors, while those that are relatively uncorrelated are likely
influenced by different factors (Manly, 1986).
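The varimax rotation discussed above (Kaiser, 1958) admits a compact SVD-based implementation. The routine below is a generic sketch with our own naming, not the package the authors used; it rotates a loading matrix so that squared loadings concentrate within factors, which is what makes tables like Table 8 interpretable.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation (Kaiser, 1958): orthogonally rotate a loading
    matrix (variables x factors) so that the variance of the squared
    loadings within each factor is maximized, driving each variable's
    loading toward 0 or +/-1 and easing interpretation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # gradient of the varimax criterion (standard SVD-based update)
        tmp = rotated * ((rotated ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - tmp))
        rotation = u @ vt
        var_new = s.sum()
        if var_old != 0 and var_new < var_old * (1 + tol):
            break
        var_old = var_new
    return loadings @ rotation
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings, as in the last column of Table 8) is unchanged; only how the explained variance is partitioned among the factors changes.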
Fig. 3. Component numbers and their eigenvalues in principal component analysis.
Figure 4. Diagram of factor analysis.
For factor analysis we use the data of Example 1. In these data, the first two of the twelve factors in the
factor analysis accounted for 60.1% of the total variation in the data structure (Table 8). The first factor
loaded on yield and spike weight/plant and explained 49.5% of the total variation in the dependence
structure, so its suggested name is "yield". The second factor accounted for 10.6% of the total variability,
consisted of total chlorophyll content of the flag leaf, and was named "total chlorophyll". The first two factors
have eigenvalues greater than unity (1) and are graphically shown in Fig. 5 (a).
In this example, factor analysis showed that spike weight/plant and total chlorophyll content of the flag leaf
had the highest relative contributions to wheat grain yield. Such results can also be seen in Fig. 5 (b).
Table 8. Rotated (Varimax rotation) factor loadings and communalities for the estimated variables of
wheat based on factor analysis technique for inoculation and non-inoculation conditions and different
water levels
Variable Factor1 Factor2 Communality
X1 0.159 0.384 0.543
X2 0.194 0.196 0.390
X3 0.324 0.127 0.451
X4 0.875 0.157 1.032
X5 0.230 0.280 0.510
X6 0.246 0.056 0.302
X7 0.250 0.247 0.497
X8 0.220 0.817 1.037
X9 0.411 0.421 0.832
X10 0.299 0.138 0.437
X11 0.374 0.126 0.500
Y 0.885 0.140 1.025
Latent roots 2.338 1.268 3.606
Factor variance (%) 49.50 10.60 60.10
X1= Tiller numbers/plant, X2=Spike length, X3=Spikelets/spike, X4=Spike weight/plant, X5=
Grains/spike, X6= Grain weight/spike, X7= 100-Grain weight, X8= Total chlorophyll content of flag leaf,
X9= Biologic yield/plant, X10= Root weight, X11= Leaves area and Y= Grain yield.
Figure 5 (a). Scree plot showing eigenvalues in response to the number of factors for the estimated variables
of wheat.
Figure 5 (b). Variable loadings from factor analysis with varimax rotation on the first two factors.
X1= Tiller numbers/plant, X2=Spike length, X3=Spikelets/spike, X4=Spike weight/plant, X5= Grain
number/spike, X6= Grain weight/spike, X7= 100-Grain weight, X8= Total chlorophyll content of flag leaf, X9=
Biologic yield/plant, X10= Root weight, X11= Leaves area and Y= Grain yield.
Clustering Analysis
Cluster analysis, or clustering, is the task of assigning a set of objects into groups (called clusters) so
that the objects in the same cluster are more similar (in some sense or another) to each other than to those
in other clusters. Clustering is a main task of exploratory data mining and a common technique for statistical
data analysis, used in many fields including machine learning, pattern recognition, image analysis,
information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm, but the general
task to be solved (Romesburg, 1984). It can be achieved by various algorithms that differ significantly in
their notion of what constitutes a cluster and in how to find clusters efficiently. Popular notions of clusters
include groups with low distances among the cluster members, dense areas of the data space, intervals, or
particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization
problem. The appropriate clustering algorithm and parameter settings (including values such as the distance
function to use, a density threshold, or the number of expected clusters) depend on the individual data set
and the intended use of the results (Richard, 2007). Indeed, cluster analysis is an exploratory data analysis
tool for organizing observed data into meaningful taxonomies, groups, or clusters, based on combinations of
variables, maximizing the similarity of cases within each cluster while maximizing the dissimilarity between groups that
are initially unknown. In this sense, cluster analysis creates new groupings without any preconceived notion
of what clusters may arise (Singh and Chowdhury, 1985). Cluster analysis, like factor analysis, makes no
distinction between dependent and independent variables; the entire set of interdependent relationships is
examined. Cluster analysis is the obverse of factor analysis: whereas factor analysis reduces the number of
variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations
or cases by grouping them into a smaller set of clusters (Johnson and Wicheren, 1996). On the other hand,
Everitt and Dunn (1992) and Nouri et al. (2011) stated that the main advantage of using PCA over cluster
analysis is that each variable can be allocated to one group only.
The first choice that must be made in carrying out a cluster analysis is how similarity (or, alternatively,
distance) between data is to be defined. There are many ways to compute how similar two series of data
are, such as the Pearson correlation, the Spearman rank correlation (for ranked data), the Euclidean
distance, etc. (Romesburg, 1984). After choosing a distance measure for similarity, a related clustering
method, hierarchical or non-hierarchical, must be used. The hierarchical method is the most popular; in this
procedure a hierarchy or tree-like structure is constructed to show the relationships among cases. The
clusters can be arrived at either by splitting off dissimilar observations (divisive methods) or by joining
together similar observations (agglomerative methods). Most common statistical packages use
agglomerative methods, and the most popular agglomerative methods are (1) single linkage (nearest
neighbour approach), (2) complete linkage (furthest neighbour), (3) average linkage, (4) Ward's method,
and (5) the centroid method (Everitt, 1993).
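To make the agglomerative idea concrete, here is a naive single-linkage (nearest-neighbour) clusterer on Euclidean distances. It is a teaching sketch with invented names, not the software behind Fig. 6; real analyses would use an optimized library routine and would usually inspect the full dendrogram rather than cut at a fixed number of clusters.

```python
import numpy as np

def single_linkage(X, n_clusters):
    """Naive agglomerative clustering with single linkage: start with
    each observation as its own cluster and repeatedly merge the two
    clusters whose closest pair of members has the smallest Euclidean
    distance, until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]
    # pairwise Euclidean distance matrix between observations
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: minimum distance between members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters[b]   # merge the closest pair of clusters
        del clusters[b]
    return clusters
```

On two well-separated groups of points, the procedure recovers the groups exactly, mirroring how the chickpea genotypes in Fig. 6 fall into clusters of similar cases.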
Example 3: Twenty chickpea cultivars were cultivated in 2005 at the research center of Razi University,
Kermanshah, Iran, under rainfed conditions (Moucheshi et al., 2009-2010). Yield and its components were
measured, and the cultivars were grouped using cluster analysis based on the measured traits.
Cluster analysis of the chickpea genotypes based on grain yield and its components (Fig. 6) classified
the genotypes into four groups containing 5, 4, 2 and 9 genotypes, respectively. The highest distance, or
dissimilarity, between genotypes was observed for genotypes 1 and 17, and the highest similarity was
obtained for genotypes 18 and 20. Based on these results, cultivars grouped within a cluster may have a
common origin; on the other hand, crossing between genotypes in distant clusters, such as the first and
fourth, can provide much variation for plant breeding aims.
Figure 6. Results of cluster analysis for 20 chickpea genotypes under rainfed conditions.
Canonical Correlation
In statistics, dependence refers to any statistical relationship between two random variables or two sets
of data, and correlation refers to any of a broad class of statistical relationships involving dependence.
Familiar examples of dependent phenomena include the correlation between the physical statures of
parents and their offspring, and the correlation between the demand for a product and its price. Formally,
dependence refers to any situation in which random variables do not satisfy a mathematical condition of
probabilistic independence (Steel and Torrie, 1960).
Correlation can refer to any departure of two or more random variables from independence, but
technically it refers to any of several more specialized types of relationship between mean values. There are
several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common
of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two
variables (a relationship which may exist even if one variable is a nonlinear function of the other). Other
correlation coefficients have been developed to be more robust than the Pearson correlation, that is, more
sensitive to nonlinear relationships (Johnson and Wicheren, 1996).
In a canonical correlation (a multiple-multiple correlation), the data are divided into two sets of related
variables: one set of two or more X variables and another set of two or more Y variables (for example,
independent and dependent variables, respectively). The goal is to describe the relationships between the
two sets. Canonical weights (coefficients) a1, a2, a3, ..., ap are applied to the p X variables, and b1, b2, b3,
..., bm are applied to the m Y variables, in such a way that the correlation between CVX1 and CVY1 is
maximized (Bratlet, 1974):
CVX1 = a1X1 + a2X2 + ... + apXp
CVY1 = b1Y1 + b2Y2 + ... + bmYm
CVX1 and CVY1 are the first canonical variates, and their correlation is the sample canonical
correlation coefficient for the first pair of canonical variates (Fig. 7). The residuals are then analyzed in the
same fashion to find a second pair of canonical variates, CVX2 and CVY2, whose weights are chosen to
maximize the correlation between CVX2 and CVY2, using only the variance remaining after the variance due
to the first pair of canonical variates has been removed from the original variables. This continues until a
"significance" cutoff is reached or the maximum number of pairs (which equals the smaller of m and p) has
been found (Giffins, 1985).
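Numerically, the canonical correlations (the "roots" of Table 9) are the square roots of the eigenvalues of Sxx⁻¹ Sxy Syy⁻¹ Syx, where S denotes the within- and between-set cross-product matrices. A generic sketch (function name ours, not the authors' software):

```python
import numpy as np

def canonical_corr(X, Y):
    """Canonical correlations between two variable sets: the squared
    canonical correlations (the 'roots') are the eigenvalues of
    Sxx^-1 Sxy Syy^-1 Syx, and there are min(p, m) of them, where p
    and m are the numbers of variables in the two sets."""
    Xc = X - X.mean(axis=0)                  # center each set
    Yc = Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    roots = np.sort(np.linalg.eigvals(M).real)[::-1]
    roots = roots[: min(X.shape[1], Y.shape[1])]
    return np.sqrt(np.clip(roots, 0.0, 1.0))  # canonical correlations
```

The number of returned roots equals the number of variables in the smaller set, which is why Example 4, with four photosynthesis-related traits, has four roots.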
Figure 7. Diagram for canonical correlation.
Example 4: Nine variables, comprising two sets (yield and its components, and photosynthesis-related
traits), were measured in 20 chickpea genotypes under rainfed conditions at Razi University, Kermanshah,
Iran, in 2005. We want to consider the relationship between these two sets of variables (unpublished data).
The number of roots (eigenvalues, or squared canonical correlations) is equal to the number of
variables in the smaller data set; therefore, the number of roots in this example is 4 (Fig. 8). In this example
none of the canonical correlations between the sets of variables is significant, so there is no relationship
between these two sets (Table 9). For a better understanding of this kind of correlation, assume that the first
canonical correlation (0.428) were significant. Yield has the highest (negative) contribution to the first root
among the first set of the data, while 100-seed weight has a high positive contribution. The highest negative
contribution among the second set of the data belonged to chlorophyll fluorescence, and the highest positive
one was observed for chlorophyll b. These results show that yield and chlorophyll fluorescence have a direct
relationship, with chlorophyll a and number of pods per plant also contributing somewhat to this relationship;
these variables are negatively correlated with the first root. On the other hand, 100-seed weight, seed
weight, number of seeds per plant, chlorophyll b and total chlorophyll are directly related to one another,
though the contributions of SW and NSP in the first set and Ch ab in the second set are low; these variables
are positively correlated with the first root.
Redundancy index is the amount of variance in a canonical variate (dependent or independent)
explained by the other canonical variate in the canonical function. It can be computed for both the dependent
and the independent canonical variates in each canonical function. The variability of each data set explained
by the other in this example is very low (3.00% for the first set and 6.83% for the second set).
Table 9. Summary of the canonical correlation analysis

              Root 1    Root 2    Root 3    Root 4
First set:
100SW          2.694    -0.746     1.906     0.890
SW             0.588     0.100    -1.716     0.245
NSP            0.677     0.907     0.107    -0.060
NPP           -0.231    -0.830    -0.231    -0.954
Y             -2.897     1.042     0.038    -1.069
   Variance extracted: 70.22%; total redundancy: 3.00%
Second set:
Ch a          -0.288     0.999    -0.118     0.147
Ch b           0.529     0.510    -0.076    -0.836
Ch ab          0.402    -0.375    -0.778     0.520
Ch f          -0.903    -0.202    -0.452    -0.253
   Variance extracted: 100%; total redundancy: 6.83%
Eigenvalue     0.1835    0.0805    0.0525    0.001
Can. corr.     0.428     0.284     0.229     0.032
P-value        0.55817   >0.56     >0.56     >0.56

100SW: 100-seed weight; SW: seed weight per plant; NSP: number of seeds per plant; NPP:
number of pods per plant; Y: yield; Ch a: chlorophyll a; Ch b: chlorophyll b; Ch ab: total
chlorophyll content; and Ch f: chlorophyll fluorescence.
Figure 8. Plot of eigenvalues by root number and their contribution to the canonical correlation.
It seems that research in agriculture and plant science is often somewhat weak in its statistical
discussion and interpretation. This review has explained the most widely applied multivariate statistical
methods that researchers in agriculture and plant science can use in their investigations to give more
authority to their work and results.
References
Anderson TW, 1984. An introduction to multivariate statistical analysis. John Wiley, New York.
Bratlet MS, 1974. The general canonical correlation distribution. Annals of Mathematical Statistics 18: 1-17.
Burnham KP, Anderson DR. 2002. Model selection and multimodel inference. Springer, New York.
Dong B, Liu M, Shao HB, Li Q, Shi L, Du F, Zhang Z, 2008. Investigation on the relationship between leaf
water use efficiency and physio-biochemical traits of winter wheat under rained condition. Colloids and
Surfaces B: Biointerfaces 62: 280-287.
Draper NR, Smith H, 1966. Applied Regression Analysis. John Wiley, New York.
Draper NR, Smith H, 1981. Applied regression analysis. John Wiley, New York.
Dunetman GH, 1989. Principal component analysis. Sage Publication, Newbury Park.
Everitt BS, 1993. Cluster Analysis. Wiley, New York.
Everitt BS, Dunn G, 1992. Applied Multivariate Data Analysis, Oxford University Press, New York, NY.
Giffins R, 1985. Canonical analysis: a review with application in ecology. Springer-Verlag, Berlin.
Harrell FE, 2001. Regression modeling strategies: With applications to linear models, logistic regression, and
survival analysis. Springer-Verlag New York.
Jackson JE, 1991. A user's guide to principal component. John Wiley New York.
Johnson RA, Wicheren, DW, 1996. Applied multivariate statistical analysis. Prentice Hall of India, New Delhi.
Kaiser HF, 1958. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23: 187-200.
Kleinbaum DG, Kupper LL, Muller KE, 1988. Applied Regression Analysis and Other Multivariable Methods.
PWS-Kent Publishing Co, Boston.
Manly BFJ, 1986. Multivariate statistical methods: a primer. Chapman and Hall, London - New York.
Manly BFJ, 2001. Statistics for environmental science and management. Chapman and Hall/CRC, Boca
Raton.
Miller AJ, 2002. Subset selection in regression. Chapman and Hall London.
Moucheshi AS, Heidari B, Dadkhodaie A, 2009-2010. Genetic Variation and Agronomic Evaluation of
Chickpea Cultivars for Grain Yield and Its Components Under Irrigated and Rainfed Growing
Conditions. Iran Agricultural Research 28-29: 39-50.
Mouchesi A, Heidari B, Assad MT, 2012. Alleviation of drought stress effects on wheat using arbuscular
mycorrhizal symbiosis. International Journal of AgriScience 2: 35-47.
Nouri A, Etminan A, Dasilva D, Mohammad R, 2011. Assessment of yield, yield-related traits and drought
tolerance of durum wheat genotypes (Triticum turgidum var. durum Desf.). Australian Journal of Crop
Science 5: 8-16.
Richard AJ, 2007. Applied multivariate statistical analysis. Prentice Hall
Romesburg HC, 1984. Cluster analysis for researchers. Lifetime Learning Publications, Belmont.
Saed-Moucheshi A, Karami F, Nadafi S, Khan AA, (in press). Heritability, genetic variability and
interrelationship among some morphological and chemical parameters of strawberry cultivars. Pakistan
Journal of Botany.
Shipley B, 1997. Exploratory path analysis with applications in ecology and evolution. The American
Naturalist 149: 1113-1138.
Singh RK, Chowdhury BD, 1985. Biometrical methods in quantitative genetic analysis. Kalyani Publishers,
Ludhiana, New Delhi.
Spearman C, 1904. General intelligence, objectively determined and measured. American Journal of
Psychology 15: 201-293.
Steel RGD, Torrie JH, 1960. Principles and Procedures of Statistics. McGraw Hill Book Co. Inc., New York.