Here are the steps to find the variance and standard deviation of the given sample data:
1) Find the mean (x-bar) of the data:
(5 + 17 + 12) / 3 = 34 / 3 = 11.33
2) Find the deviations from the mean:
5 - 11.33 = -6.33
17 - 11.33 = 5.67
12 - 11.33 = 0.67
3) Square the deviations:
(-6.33)^2 = 40.07
(5.67)^2 = 32.15
(0.67)^2 = 0.45
4) Sum the squared deviations:
40.07 + 32.15 + 0.45 = 72.67
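The steps listed stop at the sum of squared deviations. As a minimal sketch of the whole calculation, assuming the data set is the three values 5, 17, and 12 from step 1 and that the sample formula (dividing by n - 1) is intended, the variance and standard deviation could be computed in Python as follows. The variable names are illustrative only.

```python
import math
import statistics

# Sample data taken from step 1 of the worked example
data = [5, 17, 12]
n = len(data)

# Steps 1-4: mean, deviations, squared deviations, and their sum
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_deviations)

# Remaining steps (assumed): sample variance divides by n - 1,
# and the standard deviation is the square root of the variance
sample_variance = sum_sq / (n - 1)
sample_std = math.sqrt(sample_variance)

print(f"mean = {mean:.2f}")                          # mean = 11.33
print(f"sum of squares = {sum_sq:.2f}")              # sum of squares = 72.67
print(f"sample variance = {sample_variance:.2f}")    # sample variance = 36.33
print(f"sample std dev = {sample_std:.2f}")          # sample std dev = 6.03

# Cross-check against the standard library implementation
assert math.isclose(sample_variance, statistics.variance(data))
```

If the three values were treated as a whole population rather than a sample, the divisor would be n instead of n - 1, and `statistics.pvariance` would be the matching library call.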
Data Presentation & Analysis Meaning, Stages of data analysis, Quantitative & Qualitative data analysis methods, Descriptive & inferential methods of data analysis
1. ARBA MINCH TECHNOLOGY INSTITUTE
DEPARTMENT OF MECHANICAL ENGINEERING
Chapter Four: Processing and Data Analysis
Instructor: Solomon N. (Ph.D.)
Academic Year: 2022/23
2. Processing and Data Analysis
After data collection, the raw data must be converted
into meaningful statements. This involves data
processing, data analysis, and data interpretation and
presentation.
Acquiring data: Acquisition involves collecting or
adding to the data holdings. There are several methods
of acquiring data:
collecting new data
using your own previously collected data
reusing someone else's data
purchasing data
acquiring data from the Internet (texts, social media, photos)
3. Processing and Data Analysis
Data processing: A series of actions or steps performed
on data to verify, organize, transform, integrate, and
extract data in an appropriate output form for
subsequent use.
Methods of processing must be rigorously
documented to ensure the utility and integrity of the
data.
Data Analysis involves actions and methods performed
on data that help describe facts, detect patterns,
develop explanations and test hypotheses. This
includes data quality assurance, statistical data analysis,
modeling, and interpretation of results.
4. Data Preservation and Re-use
Data preservation involves actions and procedures to keep
data for future use, and includes data archiving and/or data
submission to a data repository. Data preservation needs data
description, documentation and metadata.
The goal of all these actions is to make data findable,
comprehensible and easy to use. It also involves long-term
preservation and curation of data.
Documentation provides an overview of the research context
and design, data collection methods, data preparation and
results or findings and is key to enabling the secondary user
to make informed use of the data.
Metadata provide standardized, structured information
explaining the purpose, origin, time references, geographic
location, creator, access conditions and terms of use of data.
5. Data Processing
Data processing is concerned with editing, coding,
classifying, tabulating, charting, and diagramming
research data. The essence of data processing in research
is data reduction.
Data reduction involves winnowing out the irrelevant
from the relevant data, establishing order from
disorder, and giving shape to a mass of data.
DOI: A digital object identifier is a unique and persistent
identifier that makes data sets easy to find and cite.
Example: doi.org/10.1016/j.ecolind.2015.04.011
Data re-use means data mining, replication research,
comparative studies, longitudinal research etc. E.g. data
collected for one research objective can be used in a new
study dealing with some other similar problem.
6. Data Processing
Six stages of data processing
1. Data collection: Collecting data is the first step in data
processing. Data is pulled from available sources,
including data lakes and data warehouses.
2. Data preparation: Once the data is collected, it
enters the data preparation stage. Data preparation,
often referred to as “pre-processing”, is the stage at
which raw data is cleaned up and organized for the
following stage of data processing.
[Diagram: the six stages in order: Data collection, Data preparation, Data input/interpretation, Data processing, Data output/interpretation, Data storage and report writing]
7. Data Processing
3. Data input: The clean data is then entered into its
destination and translated into a language that it can
understand. Data input is the first stage in which raw
data begins to take the form of usable information.
4. Processing: During this stage, the data inputted to
the computer in the previous stage is actually
processed for interpretation. Processing is often done using
machine learning algorithms, though the process
itself may vary slightly depending on the source of
data being processed.
8. Data Processing
5. Data output/interpretation: The output/interpretation
stage is the stage at which data is finally usable to
non-data scientists. It is translated, readable, and often
in the form of graphs, videos, images, plain text, etc.
6. Data storage and report writing: The final stage of
data processing is storage. After all of the data is
processed, it is then stored for future use. While some
information may be put to use immediately, much of it
will serve a purpose later on.
9. STATISTICS IN RESEARCH
The role of statistics in research is to function as a tool in
designing research, analyzing its data and drawing conclusions.
Most research studies result in a large volume of raw data which
must be suitably reduced so that the same can be read easily and
can be used for further analysis. There are two major areas of
statistics: descriptive statistics and inferential statistics.
10. Cont.
“Descriptive” describes data, while “inferential” infers or allows
the researcher to arrive at a conclusion based on the collected
information.
Descriptive statistics concern the development of certain
indices from the raw data.
Inferential statistics are concerned with the process of
generalization. Inferential statistics are also known as
sampling statistics and are mainly concerned with two
major types of problems:
the estimation of population parameters, and
the testing of statistical hypotheses.
12. Cont.
For example, you are tasked to research about teenage
pregnancy in a certain high school. Using both descriptive
and inferential statistics, you will be researching the number
of teenage pregnancy cases in the school for a specific
number of years. The difference is that with descriptive
statistics, you are merely summarizing the collected data
and, if possible, detecting a pattern in the changes.
For example, it can be said that for the past five years, the
majority of teenage pregnancies in X High School happened
to those enrolled in the third year. There’s no need to
predict that on the sixth year, the third year students would
still be the ones with a greater number of teenage
pregnancies. Conclusions as well as predictions are only
done in inferential statistics.
13. STATISTICS IN RESEARCH
The important statistical measures that are used to
summarize the survey/research data are:
measures of central tendency or statistical
averages;
measures of dispersion;
measures of asymmetry (skewness);
measures of relationship; and
other measures.
14. STATISTICS IN RESEARCH
Measures of Central Tendency
Amongst the measures of central tendency, the three
most important ones are the arithmetic average or
mean, median and mode.
A measure of central tendency is a single value that
attempts to describe a set of data by identifying the
central position within that set of data. As such,
measures of central tendency are sometimes called
measures of central location.
The mean, median and mode are all valid measures of
central tendency, but under different conditions, some
measures of central tendency become more appropriate
to use than others.
15. Cont.
Mean (Arithmetic)
The mean (or average) is the most popular and well
known measure of central tendency.
It can be used with both discrete and continuous
data, although its use is most often with continuous
data.
The mean is equal to the sum of all the values in
the data set divided by the number of values in the
data set.
16. Cont.
For example, consider the wages of staff at a factory below:
The mean salary for these ten staff is $30.7k. However,
inspecting the raw data suggests that this mean value
might not be the best way to accurately reflect the
typical salary of a worker, as most workers have salaries
in the $12k to $18k range. The mean is being skewed
by the two large salaries. Therefore, in this
situation, we would like to have a better measure of
central tendency.
17. Cont.
Median
The median is the middle score for a set of data that has
been arranged in order of magnitude. The median is less
affected by outliers and skewed data. In order to calculate
the median, suppose we have 11 scores arranged in order. The median
is the middle mark, in this case 56. It is the middle mark because there are 5
scores before it and 5 scores after it. This works fine when
you have an odd number of scores, but what happens
when you have an even number of scores? What if you had
only 10 scores? Well, you simply have to take the middle
two scores and average the result.
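The odd and even cases described above can be sketched in Python. The marks below are hypothetical (the slide's own data table is not reproduced here); they are chosen only so that the middle of the 11 ordered scores is 56.

```python
from statistics import median

# Hypothetical marks, not the slide's data: 11 scores, so the median
# is the 6th value once the scores are sorted.
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

# Dropping the last score leaves 10 values; the median is then the
# average of the two middle values of the sorted list.
even_scores = odd_scores[:-1]

print("sorted:", sorted(odd_scores))
print("median of 11 scores:", median(odd_scores))
print("median of 10 scores:", median(even_scores))
```

`statistics.median` sorts the data internally, so the list does not need to be ordered first.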
18. Cont.
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it
represents the highest bar. You can,
therefore, sometimes consider the mode as being the most popular
option. An example of a mode is presented below:
19. Measure of variation
Example
Consider the following two sets of scores:
Set 1: 40, 50, 60, 60, 40, 50
Set 2: 0, 100, 25, 75, 80, 20
Both these sets have the same mean (50),
but the second set is a lot more widely dispersed ("scattered") than
the first.
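A short Python sketch can confirm this comparison. It uses the range and the population standard deviation as two possible measures of spread; the choice of `pstdev` rather than the sample version is an assumption made here for illustration.

```python
from statistics import mean, pstdev

set_1 = [40, 50, 60, 60, 40, 50]
set_2 = [0, 100, 25, 75, 80, 20]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    spread = max(data) - min(data)  # range = maximum - minimum
    print(name, "mean:", mean(data), "range:", spread,
          "SD:", round(pstdev(data), 2))
```

Both sets report the same mean, while the range and standard deviation of Set 2 are several times larger.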
20. Measure of variation/dispersion
The scatter or spread of items of a distribution is
known as dispersion or variation.
In other words the degree to which numerical data
tend to spread about an average value is called
dispersion or variation of the data.
Measures of dispersion are statistical measures which
provide ways of measuring the extent to which data
are dispersed or spread out.
21. Objective of Measuring Variation
To determine the reliability of an average by showing
how far the average is representative of the
entire data.
To determine the nature and cause of variation in
order to control the variation itself.
To enable comparison of two or more distributions with
regard to their variability.
Measuring variability is of great importance to other
statistical analysis. E.g., it is the basis of statistical
quality control
22. A good measure of variation
It should be easy to compute and understand.
It should be based on all observations.
It should be uniquely defined.
It should be capable of further statistical treatment.
It should be affected as little as possible by extreme values.
23. Types of measure of variation
Absolute measures: measures of dispersion that are
expressed in the original units of the data are termed absolute
measures:
Range
Quartile deviation
Mean deviation
Variance
Standard deviation
Relative measures: also known as coefficients of dispersion, these are
obtained as ratios or percentages.
Relative range
Coefficient of quartile deviation
Coefficient of mean deviation
Coefficient of variation
Standard scores
24. The range
Several measures of dispersion are available. We will
discuss the common ones below.
The Range:
The difference between the largest (maximum) and
smallest (minimum) values.
Range = Maximum – Minimum (1)
For frequency distributed data, the range is:
The difference between the upper class boundary of
the last class and the lower class boundary of the first
class.
25. Measure of Dispersion
Measure of variation-dispersion
Find the Range of 54.5, 55.0, 55.7, 51.8, 54.2, 52.4
Solution:
Range (R) = 55.7 - 51.8 = 3.9 cm
Solution (frequency distribution): Range = UCB of last class - LCB of first class = 118.5 - 52.5 = 66
26. Measure of Dispersion
Quartile deviation (QD):
QD is half of the difference between the upper and
lower quartiles, that is, half of the interquartile
range: $QD = \frac{Q_3 - Q_1}{2}$. It gives the average amount by which
the two quartiles differ from the median. By contrast, the range
expresses the extreme variability of observations of a variable.
Coefficient of quartile deviation (CQD):
It is the relative measure based on the quartiles: $CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$.
27. Measure of Dispersion
Mean Deviation(M.D):
The average deviation measures the scatter of the
individual observations around a central value usually
the mean or the median of a distribution.
The mean deviation is defined as the arithmetic mean
of the absolute deviations of each observation from either
the mean or the median of a distribution.
If the deviations are taken from the mean then it is
called mean deviation about the mean.
On the other hand, if the deviations are taken from
the median we call it mean deviation about the
median.
28. Mean deviation
The mean Deviation (M.D) is the arithmetic mean of the absolute
deviations of the values from the mean.
It is the “average absolute deviation of the values from the mean”.
Note that: while dealing with population values, it is adjusted
accordingly
Mean Deviations for Grouped data (discrete or continuous)
Where m = number of classes and xi = class mark of the ith class; n =
number of observation
29. Mean deviation
Mean deviation about the median ( MD)
ungrouped data:
grouped Frequency Distribution:
30. Example
The weights of a sample of six students from a class (in kilograms) are
measured as: 53, 56, 57, 59, 63 and 66. Find the mean deviation about
the mean and the mean deviation from the median.
solution: First find the mean and the median. The mean is 59 kg and
the median is 58 kg. Then take the deviations of each observation
from these averages as shown below
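The deviations table for this example is not reproduced here, so the following Python sketch works the example through. The helper name `mean_deviation` is introduced only for illustration.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

weights = [53, 56, 57, 59, 63, 66]  # kilograms, from the example

md_about_mean = mean_deviation(weights, mean(weights))      # center = 59
md_about_median = mean_deviation(weights, median(weights))  # center = 58

print("mean:", mean(weights), "median:", median(weights))
print("MD about the mean:", round(md_about_mean, 2))
print("MD about the median:", round(md_about_median, 2))
```

For this data the absolute deviations sum to 22 in both cases, so both mean deviations come to 22/6, roughly 3.67 kg.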
32. Solution
Mean of each class (class mark) = (lower class point + upper class point) / 2, e.g. (1 + 5)/2 = 3
Mean = 100/10 = 10
Class   xi   fi   fixi   |xi - x̄|        fi|xi - x̄|
1-5     3    4    12     |3 - 10| = 7    4*7 = 28
6-10    8    1    8      2               1*2 = 2
11-15   13   2    26     3               2*3 = 6
16-20   18   3    54     8               3*8 = 24
Total        10   100                    60
MD from the mean = 60/10 = 6
33. Solution
Class marks repeated by frequency: 3, 3, 3, 3, 8, 13, 13, 18, 18, 18
Median = (8 + 13)/2 = 10.5
Class   xi   fi   fixi   |xi - median|       fi|xi - median|
1-5     3    4    12     |3 - 10.5| = 7.5    4*7.5 = 30
6-10    8    1    8      2.5                 1*2.5 = 2.5
11-15   13   2    26     2.5                 2*2.5 = 5
16-20   18   3    54     7.5                 3*7.5 = 22.5
Total        10   100                        60
MD from the median = 60/10 = 6
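The grouped calculation can be checked with a few lines of Python. This sketch follows the approach used in the solution, taking the median from the class marks repeated by their frequencies; an interpolated grouped median would be a different technique and could give a different value.

```python
from statistics import median

class_marks = [3, 8, 13, 18]   # midpoints of classes 1-5, 6-10, 11-15, 16-20
frequencies = [4, 1, 2, 3]
n = sum(frequencies)

grouped_mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n

# Repeat each class mark by its frequency, then take the ordinary median.
expanded = [x for x, f in zip(class_marks, frequencies) for _ in range(f)]
grouped_median = median(expanded)

def grouped_mean_deviation(center):
    """Frequency-weighted average absolute deviation from the center."""
    return sum(f * abs(x - center) for x, f in zip(class_marks, frequencies)) / n

print("mean:", grouped_mean, "MD about mean:", grouped_mean_deviation(grouped_mean))
print("median:", grouped_median, "MD about median:", grouped_mean_deviation(grouped_median))
```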
34. Coefficients of Mean Deviation(C.M.D)
Example: Find the coefficient of mean deviation about the mean and
mean deviation about the median for the weights of six students in
example above.
Solution: Coefficient of mean deviation about the mean
35. Variance and Standard Deviation
The variance and standard deviation are the most
widely used measures of dispersion.
Both measure the average dispersion of the observations
around the mean.
The variance is defined as the average of the squared deviations
from the mean.
Variance is a measure of dispersion that takes into account the
spread of all data points in a data set. It is the measure of dispersion
most often used, along with the standard deviation, which is
simply the square root of the variance.
The variance is the mean squared difference between each data point
and the center of the distribution measured by the mean.
An item selected at random from a data set whose standard
deviation is low has a better chance of being close to the mean
than an item from a data set whose standard deviation is higher.
40. Variance and standard deviation formula
Quiz-1
Find the variance and standard deviation of the following sample data
i. 5, 17, 12, 10, 8
ii. The data are given in the form of a frequency distribution.
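For part (i), a minimal Python sketch of the sample variance and standard deviation follows. It assumes the sample formula with divisor n - 1, since the data are described as a sample; the slide's own formula is not reproduced here.

```python
from statistics import mean, stdev, variance

sample = [5, 17, 12, 10, 8]

x_bar = mean(sample)                                  # 52 / 5
squared_deviations = [(x - x_bar) ** 2 for x in sample]

# Sample variance: sum of squared deviations divided by n - 1.
manual_variance = sum(squared_deviations) / (len(sample) - 1)
manual_sd = manual_variance ** 0.5

print("mean:", x_bar)
print("variance:", round(manual_variance, 2), "SD:", round(manual_sd, 2))

# The standard library gives the same sample statistics.
print("library:", round(variance(sample), 2), round(stdev(sample), 2))
```

Dividing by n instead of n - 1 (`statistics.pvariance`) would give the population variance, which is smaller.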
41. Coefficient of Variation
The coefficient of variation (CV) is the ratio of the
standard deviation to the mean. The higher the coefficient
of variation, the greater the level of dispersion around the
mean. It is generally expressed as a percentage. Without
units, it allows for comparison between distributions of
values whose scales of measurement are not comparable.
When we are presented with estimated values, the CV
relates the standard deviation of the estimate to the value
of this estimate. The lower the value of the coefficient of
variation, the more precise the estimate.
42. Coefficient of Variation formula
In situations where either two series have different units of
measurements, or their means differ sufficiently in size, the CV
should be used as a measure of dispersion.
In spite of the fact that the C.V. is broadly applied, its
disadvantage is that it’s not useful when the mean is negative or
zero or very close to zero.
Interpretation of the coefficient of variation: the distribution
having less CV is said to be less variable or more consistent
43. Why We Need the Coefficient of Variation
So, standard deviation is the most common measure of
variability for a single data set. But why do we need yet
another measure such as the coefficient of variation? Well,
comparing the standard deviations of two data sets with
different units or very different means can be misleading,
whereas their coefficients of variation can be compared directly.
Example question: Two versions of a test are given to
students. One test has pre-set answers and a second test
has randomized answers. Find the coefficient of variation.
        Regular Test   Randomized Answers
Mean    50.1           45.8
SD      11.2           12.9
44. Cont.
Solution
Step 1: Divide the standard deviation by the mean for the
first sample:
11.2 / 50.1 = 0.22355
Step 2: Multiply Step 1 by 100: 0.22355 * 100 =22.355%
Step 3: Divide the standard deviation by the mean for the
second sample: 12.9 / 45.8 = 0.28166
Step 4: Multiply Step 3 by 100: 0.28166 * 100 = 28.166%
That’s it! Now you can compare the two results directly.
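The four steps can be condensed into a small Python helper. The function name `coefficient_of_variation` is illustrative; the guard against a zero mean reflects the limitation of the CV noted earlier in the deck.

```python
def coefficient_of_variation(sd, mean_value):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean_value * 100

cv_regular = coefficient_of_variation(11.2, 50.1)
cv_randomized = coefficient_of_variation(12.9, 45.8)

print(f"Regular test:       {cv_regular:.3f}%")
print(f"Randomized answers: {cv_randomized:.3f}%")
```

The randomized-answers test has the larger CV, so its scores are relatively more variable.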
47. Cont.
Example: Suppose that the mean weight of a group
of students is 165 pounds with a S.D of 8 pounds. If
the height of the same group of students has a
mean of 60 inches with a S.D of 3 inches, compare
the variability in weight and height measurements.
Solution:
48. Standard Scores (Z-Scores)
A Z-score is a statistical measurement of a score's
relationship to the mean in a group of scores.
Z-scores are not measures of relative dispersion, but one of the
applications of the standard deviation.
We define the standard score as: $z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation.
It tells us how many standard deviations a value lies
above (if positive) or below (if negative) the mean.
The standard score gives the deviations from the mean in
units of standard deviation. It is used to compare two
observations coming from different groups.
49. Standard Scores (Z-Scores)
Questions: Two third year Medical laboratory sections were given
introduction to biostatistics examinations. The following information
was given.
Student A from section1 scored 90 and student B from section 2
scored 95.
Relatively speaking who performed better ?
Student A performed better relative to his section because the score of student A
is two standard deviations above the mean score of his section, while the score of
student B is only one standard deviation above the mean score of his section.
50. Standard Scores (Z-Scores)
Quiz 3: Given that the mean and standard deviation are 50 and 10, what value
of x has a z-score of 1.4? What is the z-score that corresponds to x
= 30?
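Both directions of the z-score relationship can be sketched in Python. The two function names are hypothetical helpers introduced for this illustration.

```python
def z_score(x, mean_value, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean_value) / sd

def value_from_z(z, mean_value, sd):
    """Invert the z-score formula: x = mean + z * SD."""
    return mean_value + z * sd

x_for_z = value_from_z(1.4, 50, 10)
z_for_x = z_score(30, 50, 10)

print("x with z = 1.4:", x_for_z)
print("z for x = 30:", z_for_x)
```

A negative z-score simply means the value lies below the mean.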
51. Moments
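The rth moment about the mean (the rth central moment) is defined as: $M_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$
For continuous grouped data it is given by: $M_r = \frac{1}{n}\sum_{i=1}^{m} f_i (x_i - \bar{x})^r$, where $x_i$ is the class mark of the ith class, $m$ is the number of classes and $n = \sum f_i$.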
Example: Find the first three central moments of the numbers 2, 3 and 7
Solution first find the mean:
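The example can be checked numerically with a short Python sketch. It assumes the usual definition of the rth central moment, the average of the rth powers of the deviations from the mean, with divisor n.

```python
def central_moment(values, r):
    """rth central moment: average of (x - mean)**r over all values."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** r for x in values) / n

numbers = [2, 3, 7]  # mean = 12 / 3 = 4

first, second, third = (central_moment(numbers, r) for r in (1, 2, 3))
print("first:", first)             # always 0 for any data set
print("second:", round(second, 3)) # the population variance
print("third:", third)
```

The first central moment is zero by construction, and the second equals the population variance.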
52. Normal Distribution, Skewness and Kurtosis
A normal distribution is the proper term for a probability
bell curve.
In a standard normal distribution the mean is zero and the
standard deviation is 1. Every normal distribution has zero skew and a kurtosis
of 3. Normal distributions are symmetrical, but not all
symmetrical distributions are normal
What are the 4 characteristics of a normal distribution?
Normal distributions are symmetric, unimodal, and
asymptotic, and the mean, median, and mode are all
equal.
A normal distribution is perfectly symmetrical around its
center. That is, the right side of the center is a mirror
image of the left side
53. Skewness
Skewness is the degree of asymmetry or departure from
symmetry of a distribution.
A skewed frequency distribution is one that is not
symmetrical.
Skewness is concerned with the shape of the curve not size
If the frequency curve (smoothed frequency polygon) of a
distribution has a longer tail to the right of the central
maximum than to the left, the distribution is said to be
skewed to the right or said to have positive skewness. If it
has a longer tail to the left of the central maximum than to
the right, it is said to be skewed to the left or said to have
negative skewness.
For a moderately skewed distribution, the following empirical relation
holds approximately among the three
commonly used measures of central tendency: Mean - Mode ≈ 3(Mean - Median).
54. Skewness
A unimodal distribution is a distribution with one clear peak or most
frequent value.
“Asymptotic”, as a property of the normal curve, means that the tails
approach the horizontal axis but never touch it.
55. Skewness
Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the
same to the left and right of the center point.
In respect of the measures of skewness and kurtosis, we mostly use
the first measure of skewness based on mean and mode or on
mean and median.
Positive Skewed
Mode < Median < Mean
Negative Skewed
Mean < Median < Mode
Zero Skewed
Mean = Median = Mode
56. Skewness
Example. Suppose the mean, the mode, and the standard deviation of
a certain distribution are 32, 30.5 and 10 respectively. What is the
shape of the curve representing the distribution?
Karl Pearson’s coefficient of skewness (SK): $SK = \frac{\text{Mean} - \text{Mode}}{\text{SD}}$
If SK = 0, then the distribution is symmetrical.
If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.
Solution: SK = (32 - 30.5)/10 = 0.15 > 0, so
the distribution is positively skewed.
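The decision rule can be written as a small Python function. It uses Pearson's first (mode-based) coefficient, (mean - mode) / SD; the function and label names are illustrative only.

```python
def pearson_skewness(mean_value, mode_value, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / SD."""
    return (mean_value - mode_value) / sd

def describe_shape(sk):
    """Map the sign of the coefficient to the shape of the distribution."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

sk = pearson_skewness(32, 30.5, 10)
print("SK =", sk, "->", describe_shape(sk))
```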
57. Kurtosis
Kurtosis
Kurtosis is a measure of whether the data are heavy-
tailed or light-tailed relative to a normal distribution.
A standard normal distribution has kurtosis of 3 and is
recognized as mesokurtic.
An increased kurtosis (>3) can be visualized as a thin
“bell” with a high peak and heavier tails, whereas a decreased kurtosis (<3)
corresponds to a broadening of the peak and
“thinning” of the tails.
In finance, kurtosis is used as a measure of financial risk.
58. Kurtosis
A large kurtosis is associated with a high level of risk for an
investment because it indicates that there are high
probabilities of extremely large and extremely small returns.
On the other hand, a small kurtosis signals a moderate level
of risk because the probabilities of extreme returns are
relatively low.
59. Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken
relative to a normal distribution.
When the curve of a distribution is relatively
flatter than normal it is known as platykurtic, and
when it is more peaked than normal it is called leptokurtic.
The normal distribution, which is neither very highly peaked nor flat
topped, is called mesokurtic.
The moment coefficient of kurtosis (β2) is the fourth central moment divided by the square of the second central moment.
If β2 = 3, then the distribution is mesokurtic.
If β2 > 3, then the distribution is leptokurtic.
If β2 < 3, then the distribution is platykurtic.
60. Acceptable Standard Deviation (SD)
A smaller SD represents data where the results are
very close in value to the mean. The larger the SD the
more variance in the results.
Data points in a normal distribution are more likely to
fall closer to the mean. In fact, about 68% of all data points
will be within ±1SD from the mean, about 95% of all data
points will be within ±2SD from the mean, and about 99.7%
of all data points will be within ±3SD.
Values that fall within ±2SD of the mean are generally
treated as closer to the true value than those that
fall in the area beyond ±2SD.
62. Acceptable Standard Deviation (SD)
A cholesterol control is run 20 times over 25 days yielding the following
results in mg/dL: 192, 188, 190, 190, 189, 191, 188, 193, 188, 190, 191, 194,
194, 188, 192, 190, 189,189, 191, 192.
• Using the cholesterol control results, follow the steps described below to
establish Quality Control/QC/ ranges.
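The slide's step-by-step procedure is not reproduced here, so the following Python sketch makes an assumption: the QC ranges are taken as mean ± 1, 2 and 3 sample standard deviations of the 20 control results.

```python
from statistics import mean, stdev

# Cholesterol control results in mg/dL, from the example above.
results = [192, 188, 190, 190, 189, 191, 188, 193, 188, 190,
           191, 194, 194, 188, 192, 190, 189, 189, 191, 192]

qc_mean = mean(results)
qc_sd = stdev(results)  # sample SD (divisor n - 1), an assumption here

# Assumed QC ranges: mean +/- k standard deviations for k = 1, 2, 3.
qc_ranges = {k: (qc_mean - k * qc_sd, qc_mean + k * qc_sd) for k in (1, 2, 3)}

print(f"mean = {qc_mean:.2f} mg/dL, SD = {qc_sd:.2f} mg/dL")
for k, (low, high) in qc_ranges.items():
    print(f"±{k}SD range: {low:.1f} to {high:.1f} mg/dL")
```

A later control result outside the ±2SD range would then be flagged for review under this convention.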
64. Skewness and Kurtosis
Formula & Examples
Examples
1. Calculate Sample Skewness, Sample Kurtosis from the following grouped
data
Class Frequency
2 - 4 3
4 - 6 4
6 - 8 2
8 - 10 1
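The slide's formula is not reproduced here, so this Python sketch assumes the moment-based definitions: skewness is the third central moment divided by the 1.5 power of the second, and kurtosis is the fourth central moment divided by the square of the second, all with divisor n. Sample formulas that apply n - 1 corrections would give slightly different numbers.

```python
class_marks = [3, 5, 7, 9]   # midpoints of classes 2-4, 4-6, 6-8, 8-10
frequencies = [3, 4, 2, 1]
n = sum(frequencies)

grouped_mean = sum(f * x for x, f in zip(class_marks, frequencies)) / n

def grouped_central_moment(r):
    """Frequency-weighted rth central moment with divisor n."""
    return sum(f * (x - grouped_mean) ** r
               for x, f in zip(class_marks, frequencies)) / n

m2 = grouped_central_moment(2)
m3 = grouped_central_moment(3)
m4 = grouped_central_moment(4)

skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print("mean:", grouped_mean)
print("skewness:", round(skewness, 4))
print("kurtosis:", round(kurtosis, 4))
```

Under these definitions the data are positively skewed, and the kurtosis is below 3, which the deck calls platykurtic.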
66. Coefficient of Correlation
DEFINITION OF CORRELATION
“If two or more quantities vary in sympathy so that
movements in one tend to be accompanied by
corresponding movements in other(s) then they are said
to be correlated.” Or
“Correlation is an analysis of co-variation between two or
more variables.”
A coefficient of correlation is generally applied in statistics
to calculate a relationship between two variables
Types of Correlation
The following are different types of correlation:
Positive and Negative Correlation
Simple, Partial and Multiple Correlation
Linear and Non-linear Correlation
67. Types of Coefficient of Correlation
Positive correlation: the correlation between two variables
is said to be positive or direct if an increase (or a decrease)
in one variable corresponds to an increase (or a decrease)
in the other.
Negative Correlation: the correlation between two
variables is said to be negative or inverse if an increase (or
a decrease) corresponds to a decrease (or an increase) in
the other.
Simple Correlation: It involves the study of only two
variables. For example, when we study the correlation
between the price and demand of a product, it is a
problem of simple correlation.
68. Types of Coefficient of Correlation
Partial Correlation: It involves the study of three or more
variables, but considers only two variables to be
influencing each other. For example, if we consider three
variables, namely yield of wheat, amount of rainfall and
amount of fertilizers and limit our correlation analysis to
yield and rainfall, with the effect of fertilizers removed, it
becomes a problem relating to partial correlation only.
Multiple Correlation: It involves the study of three or more
variables simultaneously. For example, if we study the
relationship between the yield of wheat per acre and both
amount of rainfall and the amount of fertilizers used, it
becomes a problem relating to multiple correlation.
69. Types of Coefficient of Correlation
Linear Correlation: The correlation between two
variables is said to be linear if the amount of change in
one variable tends to bear a constant ratio to the
amount of change in other variable.
Non-linear (or Curvilinear): The correlation between two
variables is said to be non-linear or curvilinear if the
amount of change in one variable does not bear a
constant ratio to the amount of change in other
variable.
70. Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson’s Coefficient of Correlation, and
Spearman's Rank Correlation Method
Scatter Diagram Method
The scatter diagram method helps in having a
visual idea about the nature of association between two
variables. If the points cluster along a straight line, the
association between two variables is linear.
Further, if the points cluster along a curve, the
corresponding association is non-linear or curvilinear.
Finally, if the points neither cluster along a straight line
nor along a curve, there is absence of any association
between the variables.
72. Karl Pearson’s Coefficient Correlation
Karl Pearson’s coefficient of correlation is an extensively used
mathematical method in which a numerical value is
used to measure the strength of relation between linearly related
variables. The coefficient of correlation is denoted by “r”.
Actual mean method, which is expressed as: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Pearson correlation example
When a correlation coefficient is 1, for every increase in one
variable there is an increase of a fixed proportion in the other. For example,
shoe sizes change according to the length of the feet and are (almost) perfectly
correlated.
When a correlation coefficient is -1, for every increase in
one variable there is a decrease of a fixed proportion in the other. For
example, the decrease in the quantity of gas in a gas tank shows an (almost) perfect
inverse correlation with speed.
When a correlation coefficient is 0, an increase in one variable is not
associated with any linear increase or decrease in the other.
73. Coefficient of Correlation
Correlation coefficient formulas are used to find how strong a
relationship is between data. The formulas return a value
between -1 and 1, where:
1 indicates a perfect positive linear relationship.
-1 indicates a perfect negative linear relationship.
A result of zero indicates no linear relationship at all.
74. Coefficient of Correlation
Example: Find the value of the correlation coefficient from the following table:
Solution
Step 1: Make a chart. Use the given data (x and y), and add three more columns: xy, x2
and y2 (for the actual mean method, also find the mean of x and of y).
Step 2: Multiply x and y together to fill the xy column.
Step 3: Take the square of the numbers in the x column, and put the result in the x2
column.
Step 4: Take the square of the numbers in the y column, and put the result in the y2
column.
Step 5: Add up all of the numbers in the columns and put the result at the bottom of the
column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation.
Step 6: Use the following correlation coefficient formula: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
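The six steps can be sketched in Python. The example's data table is not reproduced here, so the x and y values below are hypothetical, and the sketch assumes the standard sums form of Pearson's formula built from Σx, Σy, Σxy, Σx2 and Σy2.

```python
import math

# Hypothetical paired data, not the values from the slide's table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Steps 2-5: build the xy, x2 and y2 columns and total every column.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Step 6: Pearson's r from the column totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))
```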
76. Spearman Rank Correlation Method
A rank correlation coefficient measures the degree of
similarity between two rankings, and can be used to
assess the significance of the relation between them.
Also called rank-order.
Used when one or both variables are rank or ordinal
scales.
Difference (D) between ranks of two sets of scores is
used to determine correlation coefficient.
Examples - golf driving distance and order of finish in golf
tournament; height and IQ score; weight and order of
finish in 400 meter race; number of calories consumed
and weight lost
77. Spearman Rank Correlation Method
To determine :
1. List each set of scores in a column.
2. Rank the two sets of scores.
3. Place the appropriate rank beside each score.
4. Head a column D and determine the difference in rank for each pair of
scores. (Sum of the D column should always be 0)
5. Square each number in the D column and sum the
values (∑D2).
6. Calculate the correlation coefficient by substituting the
values in the formula $R = 1 - \frac{6\sum D^2}{n^3 - n}$, where
n = number of observations
78. Spearman Rank Correlation Method
As an example,
Food    R1   R2   D = R1 - R2   D2
A       2    1     1            1
B       1    3    -2            4
C       4    2     2            4
D       3    4    -1            1
E       5    5     0            0
F       7    6     1            1
G       6    7    -1            1
Total              0            12
$R = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6(12)}{7^3 - 7} = 1 - 0.2143 = 0.786$
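The same calculation can be sketched in Python from the two rank columns. This assumes the ranks contain no ties, which is the case in which the formula 1 - 6ΣD2/(n3 - n) applies directly.

```python
# Ranks of the seven foods (A to G) from the example above.
rank_1 = [2, 1, 4, 3, 5, 7, 6]
rank_2 = [1, 3, 2, 4, 5, 6, 7]
n = len(rank_1)

differences = [a - b for a, b in zip(rank_1, rank_2)]
sum_d_squared = sum(d * d for d in differences)

# Spearman's rank correlation for untied ranks.
spearman_r = 1 - (6 * sum_d_squared) / (n ** 3 - n)

print("D:", differences, "sum of D:", sum(differences))
print("sum of D squared:", sum_d_squared)
print("R =", round(spearman_r, 3))
```

The differences sum to zero, which is the check suggested in step 4 of the procedure.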
79. chi-square test
A chi-square test is a statistical test used to compare
observed results with expected results.
The purpose of this test is to determine if a difference
between observed data and expected data is due to
chance, or if it is due to a relationship between the
variables you are studying.
A chi-square (χ2) statistic is a test that measures how
a model compares to actual observed data.
The chi-square statistic compares the size of any
discrepancies between the expected results and the
actual results, given the size of the sample and the
number of variables in the relationship.
80. chi-square test (cont’d.)
The formula for the chi-square statistic used in the
chi-square test is: $\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i}$
The subscript “c” is the degrees of freedom. “O” is your
observed value and “E” is your expected value. It’s very rare
that you’ll want to actually use this formula to find a
chi-square value by hand.
The summation symbol means that you’ll have to perform a
calculation for every single data item in your data set. As
you can probably imagine, the calculations can get very,
very, lengthy and tedious. Instead, you’ll probably want to
use technology.
81. chi-square test (cont’d.)
EXAMPLE
Employers want to know which days of the week employees are
absent in a five day work week. Most employers would like to believe
that employees are absent equally during the week. Suppose a
random sample of 60 managers were asked on which day of the week
did they have the highest number of employee absences. The results
were distributed as follows: (Use a 5% level of significance.)
                    Monday   Tuesday   Wednesday   Thursday   Friday
Observed Absences   15       12        9           9          15
Expected Absences   12       12        12          12         12
Calculate the χ2 test statistic. Make a chart with the following
column headings and fill in the cells:
82. chi-square test (cont’d.)
SOLUTION
The null and alternate hypotheses are:
H0: The absent days occur with equal frequencies.
Ha: The absent days occur with unequal frequencies.
The degrees of freedom are one fewer than the number of cells:
df=n-1 = 5−1=4.
Now add (sum) the values of the last column. Verify that this sum is 3.
This is the χ2 test statistic. Since 3 is less than the critical value of 9.488 (df = 4, 5% level of significance), the decision is to not reject the null hypothesis.
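The test statistic and the decision can be reproduced with a short Python sketch. The critical value 9.488 is taken from a standard chi-square table for df = 4 at the 5% level and is hard-coded here rather than computed.

```python
observed = [15, 12, 9, 9, 15]    # Monday to Friday
expected = [12, 12, 12, 12, 12]  # equal absences under the null hypothesis

# One (O - E)^2 / E term per day; their sum is the test statistic.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

degrees_of_freedom = len(observed) - 1

# Table value for df = 4 at the 5% significance level (hard-coded assumption).
CRITICAL_VALUE_DF4_5PCT = 9.488
reject_null = chi_square > CRITICAL_VALUE_DF4_5PCT

print("terms:", terms)
print("chi-square =", chi_square, "df =", degrees_of_freedom)
print("reject H0:", reject_null)
```

Because the statistic is well below the critical value, the data give no evidence that absences differ across the days of the week.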