Shaya’a Othman
Definition of Statistics
“Statistics is a scientific method of collecting, organizing, presenting, analyzing and interpreting numerical information, developed from the mathematical theory of probability, to assist in making effective and efficient decisions.” (Definition by Shaya’a Othman)
OVERVIEW OF STATISTICS: Collecting and Publishing Numerical Data
Scientific method of
Collecting,
Organizing,
Presenting,
Analyzing,
Interpreting
numerical information,
developed from the mathematical theory of probability, to assist in making effective and efficient decisions.
DEFINITION
DESCRIPTIVE STATISTICS: Methods of organizing and presenting data in an informative way.
INFERENTIAL STATISTICS: Methods of determining something about a population based on a sample.
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
DATA TYPES: Variables and Levels of Measurement
ETHICS
Misleading Data
Use of Average
Use of Graphics
Use of Association
COMPUTER STATISTICS
Computer Applications:
Microsoft Excel
SPSS, NVivo (CAQDAS)
COLLECTION OF DATA
Sources: Primary Data, Secondary Data
Coverage: Census [total count], Sample [selected count]
SAMPLING TECHNIQUES:
Systematic sampling
Stratified Sampling
Multi-stage Sampling
Cluster sampling
Quota sampling
METHODS OF COLLECTING DATA
Interview Forms: Direct/Phone
Mailed Questionnaires
Computer: e-mail, e-fax, etc.
Mobile Phone: SMS
MALAYSIAN GOVERNMENT PUBLICATIONS:
Statistics Dept., PM Dept.
Econ. Planning Unit, PM Dept.
Research Institutions: RRI, PORIM, MARDI
Private Survey/Research Co.
INTERNATIONAL ORGANIZATIONS:
United Nations
OIC, ASEAN
World Bank, Islamic Dev. Bank
COLLECTING DATA (summary)
Sources: Government Publications; International Organizations; Private Publications/Data; Internet websites (e.g. CIA data)
Techniques: Census (total count of population); Sample (selected count of population)
Methods: interviews, questionnaires, computer, mobile phone, Internet
5-STEP PROCEDURE FOR TESTING A HYPOTHESIS
STEP 1: State the null and alternative hypotheses.
Null hypothesis: $H_0: \mu = \mu_0$, where $\mu_0$ is the hypothesized value of the population mean
Alternative hypothesis: $H_1: \mu \neq \mu_0$
Note: 1. Use a two-tailed test if the alternative hypothesis does not state a direction [greater or less]. 2. Use a one-tailed test if the alternative hypothesis states a direction, e.g. $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$.
STEP 2: Select the level of significance.
.05 level [5% level]: traditionally selected for consumer research
.01 level [1% level]: for quality assurance
.10 level [10% level]: for political polling
STEP 3: Identify the test statistic. The z and t statistics are the usual test statistics; other tests use the F statistic or the chi-square (χ²) statistic, the latter also being common in nonparametric tests.
STEP 4: Formulate a decision rule. Find the critical value of z from the normal distribution table, or the critical value of t from the t distribution table where appropriate.
STEP 5: Take a sample and arrive at a decision. Only ONE decision is possible in hypothesis testing: do not reject the null hypothesis, or reject the null hypothesis and accept the alternative hypothesis.
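Steps 3 to 5 can be traced in a short Python sketch of a one-sample z test. It assumes the population standard deviation σ is known, and it uses the standard library's `statistics.NormalDist` in place of the printed normal table; the helper name `z_test_decision` and the sample figures are hypothetical, not taken from the slides.

```python
from statistics import NormalDist


def z_test_decision(x_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Steps 3-5 for a one-sample z test (hypothetical helper).

    tail="two"   -> H1: mu != mu_0 (two-tailed)
    tail="right" -> H1: mu > mu_0  (one-tailed)
    tail="left"  -> H1: mu < mu_0  (one-tailed)
    """
    # Step 3: test statistic z = (x_bar - mu_0) / (sigma / sqrt(n))
    z = (x_bar - mu_0) / (sigma / n ** 0.5)

    # Step 4: decision rule from the critical value(s) of the standard normal distribution
    std_normal = NormalDist(mu=0, sigma=1)
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")

    # Step 5: only one decision is possible
    decision = "reject H0 and accept H1" if reject else "do not reject H0"
    return z, critical, decision


# Hypothetical sample: x_bar = 203.5, mu_0 = 200, sigma = 16, n = 50, two-tailed at the .01 level
z, critical, decision = z_test_decision(203.5, 200, 16, 50, alpha=0.01)
print(f"z = {z:.2f}, critical value = ±{critical:.2f}: {decision}")
```

Here z ≈ 1.55 lies inside ±2.58, so the sketch does not reject H0; the same call with `tail="right"` or `tail="left"` covers the one-tailed cases.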
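STATISTICAL TESTS OF HYPOTHESIS
One-Sample Tests of Hypothesis
Large sample [n of 30 or more]: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, using the normal distribution table.
Small sample [n less than 30]: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ with $df = n - 1$, using the t distribution table.
Two-Sample Tests of Hypothesis
Large samples [n of 30 or more]: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$, using the normal distribution table.
Small samples [n less than 30], assuming equal population variances: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, with pooled variance $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$ and $df = n_1 + n_2 - 2$, using the t distribution table.
Each statistic serves both a two-tailed test [no direction] and a one-tailed test [with direction: greater or less than]; only the critical value differs.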
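The two-sample statistics are the ones most easily mistyped, so here is a minimal Python sketch of them. The small-sample case is written in its pooled form (equal population variances assumed), which is what the df = n₁ + n₂ − 2 rule belongs to; the function names and figures are illustrative only.

```python
import math


def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2):
    """Large samples: z = (x1_bar - x2_bar) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return (x1_bar - x2_bar) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2):
    """Small samples, assuming equal population variances.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (x1_bar - x2_bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical figures, only to show the calls
print(two_sample_z(10.0, 8.0, 2.0, 2.0, 40, 40))
print(pooled_t(10.0, 8.0, 2.0, 2.0, 8, 8))   # (2.0, 14)
```

With equal sample sizes and equal standard deviations the pooled variance reduces to s², which gives a quick sanity check on the arithmetic.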
Hypothesis: “A supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation.” (Oxford Dictionary)
Hypothesis: “A statement or conjecture which is neither true nor false, subject to verification.” (Shaya’a Othman, KUIS)
Hypothesis: “A statement about a population parameter developed for the purpose of testing.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Hypothesis testing: “A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Null hypothesis: “A statement about the value of a population parameter.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Alternative hypothesis: “A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false.” (Douglas A. Lind, Statistical Techniques in Business and Economics)
Describing Data – Measures of Location
Population Mean = Sum of all the values in the population / Number of values in the population: $\mu = \frac{\sum x}{N}$
Sample Mean = Sum of the values in the sample / Number of values in the sample: $\bar{x} = \frac{\sum x}{n}$
Weighted Mean: $\bar{x}_w = \frac{\sum (wx)}{\sum w}$
Parameter: a characteristic of a population.
Median: the midpoint of the values after they have been ordered from the smallest to the largest.
Mode: the value of the observation that appears most frequently.
Describing Data – Measures of Dispersion
Range = Largest value − Smallest value
Mean Deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean, $MD = \frac{\sum |x - \bar{x}|}{n}$, where Σ (sigma) means “sum of”, x is the value of each observation, $\bar{x}$ is the arithmetic mean of the values, n is the number of observations, and | | indicates absolute value.
Variance: the arithmetic mean of the squared deviations from the mean.
Standard Deviation: the square root of the variance.
Location of Percentiles: $L_p = (n + 1) \frac{P}{100}$, where P is the desired percentile.
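The percentile location formula only gives a position in the sorted data; when that position is fractional, a value is usually interpolated between the two neighbouring observations. The Python sketch below assumes linear interpolation (an assumption, since the slide stops at the location), and the data and function names are hypothetical.

```python
from statistics import quantiles


def percentile_location(n, p):
    """L_p = (n + 1) * P / 100: 1-based position of the P-th percentile in sorted data."""
    return (n + 1) * p / 100


def percentile_value(data, p):
    """Value at L_p, interpolating linearly when L_p falls between two positions."""
    values = sorted(data)
    loc = percentile_location(len(values), p)
    whole = int(loc)          # observation just below L_p
    frac = loc - whole        # how far L_p lies toward the next observation
    if whole < 1:             # L_p before the first observation
        return values[0]
    if whole >= len(values):  # L_p at or beyond the last observation
        return values[-1]
    return values[whole - 1] + frac * (values[whole] - values[whole - 1])


data = [20, 12, 35, 15, 22, 10, 30, 18, 25]        # hypothetical observations, n = 9
print(percentile_location(len(data), 25))           # 2.5
print(percentile_value(data, 25))                   # halfway between 12 and 15
print(quantiles(data, n=4, method="exclusive")[0])  # standard-library cross-check
```

Python's `statistics.quantiles` with `method="exclusive"` places the i-th sorted value at the i/(n + 1) point, the same (n + 1) convention, so it serves as a cross-check for the first quartile here.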
Characteristics of the Mean
The arithmetic mean is the most widely used measure of location and shows the central value of the data. The major characteristics of the mean are:
It is calculated by summing the values and dividing by the number of values.
It requires at least the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
Population Mean: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values: $\mu = \frac{\sum x}{N}$, where N is the number of values in the population.
A parameter is a measurable characteristic of a population.
Example 1: AHMAD’s family owns four cars. The following is the current mileage on each of the four cars: 56,000; 23,000; 42,000; 73,000. Find the mean mileage for the cars.
Sample Mean: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values: $\bar{x} = \frac{\sum x}{n}$, where n is the total number of values in the sample.
A statistic is a measurable characteristic of a sample.
Example 2: A sample of five executives received the following bonuses last year ($000): 14.0, 15.0, 17.0, 16.0, 15.0.
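Examples 1 and 2 use the same arithmetic; only the interpretation differs (a parameter for the whole family fleet, a statistic for the sample of executives). A short Python sketch using the standard library's `statistics.fmean`, with illustrative variable names:

```python
from statistics import fmean

# Example 1: the four cars are the family's whole population of cars, so the mean is a parameter (mu)
mileage = [56_000, 23_000, 42_000, 73_000]
mu = fmean(mileage)        # mu = sum(x) / N

# Example 2: the five executives are a sample, so the mean is a statistic (x-bar)
bonus = [14.0, 15.0, 17.0, 16.0, 15.0]   # $000
x_bar = fmean(bonus)       # x-bar = sum(x) / n

print(f"Population mean mileage: {mu:,.0f}")          # 48,500
print(f"Sample mean bonus: ${x_bar:.1f} thousand")    # $15.4 thousand
```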
Example 4: During a one-hour period on a hot Saturday afternoon in Langkawi, Ahmad sold fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.10. Compute the weighted mean of the price of the drinks.
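A minimal Python sketch of Example 4, applying $\bar{x}_w = \frac{\sum (wx)}{\sum w}$ with the number of drinks sold as the weights. The unweighted mean of the four prices is printed alongside only to show why the weights matter; the variable names are illustrative.

```python
prices = [0.50, 0.75, 0.90, 1.10]   # price per drink ($)
drinks = [5, 15, 15, 15]            # weights: number of drinks sold at each price

# Weighted mean: sum(w * x) / sum(w)
weighted_mean = sum(w * x for w, x in zip(drinks, prices)) / sum(drinks)

# Unweighted mean of the four prices ignores how many drinks were sold at each price
simple_mean = sum(prices) / len(prices)

print(f"Weighted mean price: ${weighted_mean:.3f}")   # $0.875
print(f"Unweighted mean price: ${simple_mean:.4f}")   # $0.8125
```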
The Median: The median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array. The median is found at the (n+1)/2 ranked observation; for an even number of values this position falls between the two middle values, and the median is their arithmetic average.
The Median (continued): The ages for a sample of seven INSANIAH students visiting the Islamic Artifact Exhibition are: 21, 25, 19, 20, 22, 18, 27. Arranging the data in ascending order gives: 18, 19, 20, 21, 22, 25, 27. Thus the median is 21.
Example 5: The heights of three INSANIAH lecturers, in inches, are: 76, 73, 80. Arranging the data in ascending order gives: 73, 76, 80. The median is found at the (n+1)/2 = (3+1)/2 = 2nd data point, so the median is 76.
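The (n+1)/2 rule can be written directly in Python and compared with the standard library's `statistics.median`. The helper `median_by_position` is a hypothetical name used only for this sketch.

```python
from statistics import median


def median_by_position(data):
    """Median found at the (n + 1) / 2 ranked observation of the sorted data."""
    values = sorted(data)
    position = (len(values) + 1) / 2   # 1-based rank of the median
    if position.is_integer():          # odd n: a single middle value
        return values[int(position) - 1]
    lower = int(position)              # even n: e.g. rank 2.5 -> average ranks 2 and 3
    return (values[lower - 1] + values[lower]) / 2


ages = [21, 25, 19, 20, 22, 18, 27]   # INSANIAH students
heights = [76, 73, 80]                # INSANIAH lecturers (inches)

print(median_by_position(ages), median(ages))        # 21 21
print(median_by_position(heights), median(heights))  # 76 76
```

For an even count such as the hypothetical heights 73, 76, 80, 82, both functions return 78.0, the average of the two middle values.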
The Mode: The mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal; three modes, trimodal; and so on.
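Python's `statistics.multimode` (Python 3.8 and later) returns every most frequent value, which makes the bimodal case visible. The second data set below is hypothetical.

```python
from statistics import multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(multimode(scores))               # [81]: a single mode

# Hypothetical data with two equally frequent values: bimodal
print(multimode([2, 3, 3, 5, 5, 7]))   # [3, 5]
```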
The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: a distribution having the same shape on either side of the center.
Skewed distribution: one whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be positively or negatively skewed, or bimodal.
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness: Mean = Median = Mode
The Relative Positions of the Mean, Median, and Mode: Right-Skewed Distribution
Positively skewed: the mean and median are to the right of the mode (Mode < Median < Mean).
The Relative Positions of the Mean, Median, and Mode: Left-Skewed Distribution
Negatively skewed: the mean and median are to the left of the mode (Mean < Median < Mode).
Geometric Mean: The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. It is used to average percents, indexes, and relatives. The formula is:
$GM = \sqrt[n]{(x_1)(x_2) \cdots (x_n)}$
Example 7: The interest rates on three bonds were 5, 21, and 4 percent. The arithmetic mean is (5 + 21 + 4)/3 = 10.0. The geometric mean is $GM = \sqrt[3]{(5)(21)(4)} = \sqrt[3]{420} \approx 7.49$. The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
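A quick Python check of Example 7: the nth root of the product is computed with `math.prod` and cross-checked against the standard library's `statistics.geometric_mean` (both Python 3.8 and later).

```python
import math
from statistics import geometric_mean

rates = [5, 21, 4]   # interest rates on the three bonds (percent)

arithmetic = sum(rates) / len(rates)         # (5 + 21 + 4) / 3
gm = math.prod(rates) ** (1 / len(rates))    # cube root of 5 * 21 * 4 = 420

print(f"Arithmetic mean: {arithmetic:.1f}")  # 10.0
print(f"Geometric mean: {gm:.2f}")           # 7.49
print(f"Library check: {geometric_mean(rates):.2f}")
```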
Geometric Mean (continued): Another use of the geometric mean is to determine the percent increase in sales, production, or other business or economic series from one time period to another:
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1$
Example 8: The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000. Over these n = 8 years, $GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 \approx 0.0127$; that is, the geometric mean rate of increase is 1.27% per year.
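The same rate-of-increase calculation in Python, with a check that compounding the rate for eight years recovers the 2000 figure. Variable names are illustrative.

```python
start, end = 755_000, 835_000   # enrolments in 1992 and 2000
periods = 2000 - 1992           # n = 8 yearly periods

# GM rate of increase = (end / start) ** (1 / n) - 1
rate = (end / start) ** (1 / periods) - 1
print(f"Geometric mean rate of increase: {rate:.2%} per year")   # 1.27%

# Check: compounding the rate for 8 years recovers the 2000 figure
print(round(start * (1 + rate) ** periods))   # 835000
```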
Measures of Dispersion: Dispersion refers to the spread or variability in the data. Measures of dispersion include the range, mean deviation, variance, and standard deviation.
Range = Largest value − Smallest value
Example 9: The current year’s return on equity of the 25 companies in an investor’s portfolio has a highest value of 22.1 and a lowest value of −8.1.
Range = Highest value − Lowest value = 22.1 − (−8.1) = 30.2
Mean Deviation: the arithmetic mean of the absolute values of the deviations from the arithmetic mean. The main features of the mean deviation are:
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.
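Example 10: The weights of a sample of crates containing books for the INSANIAH Library (in pounds) are: 103, 97, 101, 106, 103. Find the mean deviation.
$\bar{x} = 102$, and the mean deviation is:
$MD = \frac{\sum |x - \bar{x}|}{n} = \frac{1 + 5 + 1 + 4 + 1}{5} = \frac{12}{5} = 2.4$ pounds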
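The mean deviation of Example 10 as a short Python sketch, following $MD = \frac{\sum |x - \bar{x}|}{n}$ step by step (variable names are illustrative):

```python
weights = [103, 97, 101, 106, 103]   # crate weights (pounds)

mean = sum(weights) / len(weights)              # x-bar
abs_devs = [abs(x - mean) for x in weights]     # |x - x-bar|: 1, 5, 1, 4, 1
mean_deviation = sum(abs_devs) / len(weights)   # MD = sum |x - x-bar| / n

print(mean, mean_deviation)   # 102.0 2.4
```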
Variance and Standard Deviation
Variance: the arithmetic mean of the squared deviations from the mean.
Standard deviation: the square root of the variance.
The major characteristics of the population variance are:
All values are used in the calculation.
The units are awkward: they are the square of the original units.
Variance and Standard Deviation
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where x is the value of an observation in the population, μ is the arithmetic mean of the population, and N is the number of observations in the population.
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
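Example 10 continued: For the crate weights in Example 10, treated as a population, the variance and standard deviation are:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1^2 + (-5)^2 + (-1)^2 + 4^2 + 1^2}{5} = \frac{44}{5} = 8.8$
$\sigma = \sqrt{8.8} \approx 2.97$ pounds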
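A Python sketch of the population formulas, applied to the crate weights from Example 10 on the assumption that the five crates are treated as the whole population. The hand formula is cross-checked against the standard library's `statistics.pvariance` and `statistics.pstdev`.

```python
from statistics import pstdev, pvariance

weights = [103, 97, 101, 106, 103]   # crate weights (pounds), treated as a population

mu = sum(weights) / len(weights)                              # population mean
sigma2 = sum((x - mu) ** 2 for x in weights) / len(weights)   # sum (x - mu)^2 / N
sigma = sigma2 ** 0.5                                         # square root of the variance

print(f"variance = {sigma2:.1f}, standard deviation = {sigma:.2f}")   # 8.8, 2.97
print(pvariance(weights), round(pstdev(weights), 2))                  # library check
```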
Sample Variance and Standard Deviation
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example 11: The hourly wages earned by a sample of five students are: $7, $5, $11, $8, $6. Find the sample variance and standard deviation.
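Example 11 worked in Python: the only change from the population formula is dividing by n − 1. The standard library's `statistics.variance` and `statistics.stdev` use the same sample (n − 1) definition and serve as a check.

```python
from statistics import stdev, variance

wages = [7, 5, 11, 8, 6]   # hourly wages ($)

n = len(wages)
x_bar = sum(wages) / n                                # sample mean
s2 = sum((x - x_bar) ** 2 for x in wages) / (n - 1)   # divide by n - 1 for a sample
s = s2 ** 0.5

print(f"s^2 = {s2:.2f}, s = {s:.2f}")           # s^2 = 5.30, s = 2.30
print(variance(wages), round(stdev(wages), 2))  # library check
```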
[Figures: Histogram and Frequency Polygon; Cumulative Frequency Polygon]
Example 12: A sample of ten TV channels tallied the total number of movies shown on each channel last week. Compute the mean number of movies shown.
The Median of Grouped Data: The median of a sample of data organized in a frequency distribution is computed by:
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i)$
where L is the lower limit of the median class, n is the total number of frequencies, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.
Describing Data – Measures of Location [For Grouped Data]: Mean, Median, Mode
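The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$\bar{x} = \frac{\sum fM}{n}$
where M is the midpoint of each class, f is the frequency in each class, and n is the total number of frequencies.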
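Because the frequency table for Example 12 is not reproduced here, the Python sketch below uses a hypothetical frequency distribution (classes 1 up to 3, 3 up to 5, and so on) purely to show how $\bar{x} = \frac{\sum fM}{n}$ is applied; substitute the real classes and frequencies.

```python
# Hypothetical frequency distribution (NOT the slide's data): classes "1 up to 3", "3 up to 5", ...
lower_limits = [1, 3, 5, 7, 9]
class_width = 2
frequencies = [1, 2, 3, 1, 3]

midpoints = [low + class_width / 2 for low in lower_limits]   # M for each class
n = sum(frequencies)                                          # total frequency

# Grouped mean: x-bar = sum(f * M) / n
x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n
print(f"Grouped mean: {x_bar:.1f}")   # 6.6
```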
Finding the Median Class: To determine the median class for grouped data:
Construct a cumulative frequency distribution.
Divide the total number of data values by 2.
Determine which class will contain this value. For example, if n = 50, 50/2 = 25, so determine which class will contain the 25th value.
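The three steps for finding the median class, followed by the grouped-median formula $\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i)$, fit in a short Python function. The frequency distribution is hypothetical and `grouped_median` is an illustrative name, not a library function.

```python
from itertools import accumulate

# Hypothetical frequency distribution (NOT the slide's data)
lower_limits = [1, 3, 5, 7, 9]
class_width = 2                  # i: the class interval
frequencies = [1, 2, 3, 1, 3]


def grouped_median(lower_limits, frequencies, width):
    """Median = L + ((n/2 - CF) / f) * i for data in a frequency distribution."""
    n = sum(frequencies)
    half = n / 2                                    # step 2: divide the total by 2
    cumulative = list(accumulate(frequencies))      # step 1: cumulative frequencies
    for k, cf_through_class in enumerate(cumulative):
        if cf_through_class >= half:                # step 3: first class reaching n/2
            L = lower_limits[k]
            cf_before = cumulative[k - 1] if k > 0 else 0   # CF preceding the median class
            f = frequencies[k]
            return L + (half - cf_before) / f * width
    raise ValueError("empty frequency distribution")


print(round(grouped_median(lower_limits, frequencies, class_width), 2))   # 6.33
```

Here n = 10, so the 5th value falls in the class 5 up to 7 (cumulative frequency 6), giving 5 + (5 − 3)/3 × 2 ≈ 6.33.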