PROBABILITY AND STATISTICS BY ENGR. JORGE P. BAUTISTA
Introduction to Statistics
Tabular and Graphical representation of Data
Measures of Central Tendencies, Locations and Variations
Measure of Dispersion and Correlation
Probability and Combinatorics
Discrete and Continuous Distributions
Text and References
Statistics: A Simplified Approach by Punsalan and Uriarte, 1998, Rex Textbook
Probability and Statistics by Johnson, 2008, Wiley
Counterexamples in Probability and Statistics by Romano and Siegel, 1986, Chapman and Hall
Introduction to Statistics
In its plural sense, statistics refers to a set of numerical data, e.g. vital statistics, monthly sales, exchange rates, etc.
In its singular sense, statistics is a branch of science that deals with the collection, presentation, analysis and interpretation of data.
General uses of Statistics
Aids in decision making by providing comparison of data, explaining actions that have taken place, justifying a claim or assertion, predicting future outcomes and estimating unknown quantities
Summarizes data for public use
Examples on the role of Statistics
In biological and medical sciences, it helps researchers discover relationships worthy of further attention.
Ex. A doctor can use statistics to determine to what extent an increase in blood pressure depends on age.
In social sciences, it guides researchers and helps them support theories and models that cannot stand on rationale alone.
Ex. Empirical studies use statistics to obtain the socio-economic profile of the middle class in order to form new socio-political theories.
In business, a company can use statistics to forecast sales, design products, and produce goods more efficiently.
Ex. A pharmaceutical company can apply statistical procedures to find out if the new formula is indeed more effective than the one being used.
In engineering, it can be used to test the properties of various materials.
Ex. A quality controller can use statistics to estimate the average lifetime of the products produced by their current equipment.
Fields of Statistics
a. Statistical methods of applied statistics:
1. Descriptive - comprises those methods concerned with the collection, description, and analysis of a set of data without drawing conclusions or inferences about a larger set.
2. Inferential - comprises those methods concerned with making predictions or inferences about a larger set of data using only the information gathered from a subset of this larger set.
b. Statistical theory of mathematical statistics - deals with the development and exposition of theories that serve as a basis of statistical methods.
Descriptive vs. Inferential
Descriptive:
A bowler wants to find his bowling average for the past 12 months.
A housewife wants to determine the average weekly amount she spent on groceries in the past 3 months.
A politician wants to know the exact number of votes he received in the last election.
Inferential:
A bowler wants to estimate his chance of winning a game based on his current season average and the averages of his opponents.
A housewife would like to predict, based on last year’s grocery bills, the average weekly amount she will spend on groceries this year.
A politician would like to estimate, based on opinion polls, his chance of winning the upcoming election.
Population as Differentiated from Sample
The word population refers to groups or aggregates of people, animals, objects, materials, happenings or things of any form. This means that there are populations of students, teachers, supervisors, principals, laboratory animals, trees, manufactured articles, birds and many others. If your interest is in a few members of the population chosen to represent its characteristics or traits, these members constitute a sample. The measures of the population are called parameters, while those of the sample are called estimates or statistics.
A variable refers to a characteristic or property whereby the members of the group or set vary or differ from one another. A constant, however, refers to a property whereby the members of the group do not differ from one another.
Variables can be classified according to functional relationship as independent or dependent. If you treat variable y as a function of variable z, then z is your independent variable and y is your dependent variable. This means that the value of y, say academic achievement, depends on the value of z.
Variables according to continuity of values.
1. Continuous variable – these are variables whose levels can take continuous values. Examples are height, weight, length and width.
2. Discrete variable – these are variables whose values or levels cannot take the form of a decimal. An example is the size of a particular family.
Variables according to scale of measurements:
1. Nominal – this refers to a property of the members of a group defined by an operation which allows making statements only of equality or difference. For example, individuals can be classified according to their sex or skin color. Color is an example of a nominal variable.
2. Ordinal – it is defined by an operation whereby members of a particular group are ranked. In this operation, we can state that one member is greater or less than the others on a criterion, rather than saying only that it is equal to or different from the others, as with a nominal variable.
3. Interval – this refers to a property defined by an operation which permits making statements of equality of intervals, rather than just statements of sameness or difference and greater than or less than. An interval variable does not have a “true” zero point, although for convenience a zero point may be assigned.
4. Ratio – is defined by the operation which permits making statements of equality of ratios in addition to statements of sameness or difference, greater than or less than, and equality or inequality of differences. This means that one level or value may be described as double, triple or five times another, and so on.
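The difference between the interval and ratio levels can be checked with a short computation. The sketch below is only an illustration: the two temperature readings are hypothetical and the variable names are not from the text. It shows that the ratio of two Celsius readings changes when the same temperatures are expressed in kelvins, while the difference between them does not, which is why a scale with an assigned zero supports statements about intervals but not about ratios.

```python
# Hypothetical readings in degrees Celsius (interval scale: the zero is assigned).
temp_a_c = 20.0
temp_b_c = 10.0

# The arithmetic ratio exists, but "twice as hot" is not a meaningful statement.
ratio_celsius = temp_a_c / temp_b_c

# The same two temperatures in kelvins (ratio scale: zero is a true zero).
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
ratio_kelvin = temp_a_k / temp_b_k

# Differences (intervals) are the same on both scales.
diff_celsius = temp_a_c - temp_b_c
diff_kelvin = temp_a_k - temp_b_k

print("ratio in Celsius:", ratio_celsius)
print("ratio in kelvins:", round(ratio_kelvin, 3))
print("difference in Celsius:", diff_celsius)
print("difference in kelvins:", round(diff_kelvin, 2))
```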
Assignment no. 1
I. Make a list of at least 5 mathematicians or scientists who contributed to the field of statistics. State their contributions.
II. With your knowledge of statistics, give a real-life situation in which statistics is applied. Expand your answer.
III. When can a variable be considered independent and when dependent? Give an example for your answer.
IV. Enumerate some uses of statistics. Do you think that any science will develop without testing of hypotheses? Why?
Examples of Scales of Measurement
1. Nominal Level
Ex. Sex: M-Male F-Female
Marital Status: 1-single 2- married 3- widowed 4- separated
2. Ordinal Level
Ex. Teaching Ratings: 1-poor 2-fair 3- good 4- excellent
3. Interval Level
Ex. IQ, temperature
4. Ratio Level
Ex. Age, no. of correct answers in exam
Data Collection Methods
1. Survey method – questions are asked to obtain information, either through a self-administered questionnaire or a personal interview.
2. Observation method – makes possible the recording of behavior, but only at the time of occurrence (ex. traffic count, reactions to a particular stimulus).
3. Experimental method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study.
4. Use of existing studies – that is census, health statistics, weather reports.
5. Registration method – that is car registration, student registration, hospital admission and ticket sales.
A frequency distribution is defined as the arrangement of the gathered data by categories together with their corresponding frequencies and class marks or midpoints. The class frequency is the number of observations belonging to a class interval. Each class interval is a grouping defined by two limits, the lower limit and the upper limit. The class boundaries lie halfway between the upper limit of one class and the lower limit of the next.
Frequency of Nominal Data
Ex. Male and female college students majoring in Chemistry
SEX       FREQUENCY
Male      23
Female    107
Total     130
Frequency of Ordinal Data
Ex. Frequency distribution of employee perception on the behavior of their administrators
Perception              Frequency
Strongly favorable      10
Favorable               11
Slightly favorable      12
Slightly unfavorable    14
Unfavorable             22
Strongly unfavorable    31
Total                   100
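A frequency table for categorical data is just a tally of how often each category occurs. The sketch below shows one way to do the tally with Python's collections.Counter. The raw list of responses is hypothetical: it is rebuilt from the totals of the Chemistry majors example (23 male, 107 female) only so that there is something to count.

```python
from collections import Counter

# Hypothetical raw responses, rebuilt from the totals in the Chemistry majors
# example so that the tally can be demonstrated.
responses = ["Male"] * 23 + ["Female"] * 107

# Tally each category.
counts = Counter(responses)
total = sum(counts.values())

# Print the table in the same layout as the slide: category, then frequency.
print(f"{'SEX':<8}FREQUENCY")
for category in ("Male", "Female"):
    print(f"{category:<8}{counts[category]}")
print(f"{'Total':<8}{total}")
```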
Frequency Distribution Table
Raw data – is the set of data in its original form
Array – an arrangement of observations according to their magnitude, either in increasing or decreasing order.
Advantages: it is easier to detect the smallest and largest values and to find the measures of position.
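As a small illustration of why an array is convenient, the sketch below sorts eight illustrative scores and reads the smallest value, the largest value and the median directly from the ordered list. The variable names are assumptions made for this example only.

```python
import statistics

# Eight illustrative raw scores, in the order they were recorded.
raw_data = [56, 42, 28, 56, 41, 56, 55, 59]

# The array: the same observations arranged by magnitude.
array_increasing = sorted(raw_data)
array_decreasing = sorted(raw_data, reverse=True)

# Once the data are ordered, the extremes sit at the two ends.
smallest = array_increasing[0]
largest = array_increasing[-1]

# A measure of position such as the median is the middle of the array
# (the mean of the two middle values when the count is even).
median = statistics.median(array_increasing)

print("array:", array_increasing)
print("smallest:", smallest, "largest:", largest, "median:", median)
```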
Grouped Frequency of Interval Data
Given the following raw scores in an Algebra examination:
56 42 28 56 41 56 55 59
50 55 57 38 62 52 66 65
33 34 37 47 42 68 62 54
68 48 56 39 77 80 62 71
57 52 60 70
1. Compute the range R = H – L, where H is the highest and L the lowest score, and the number of classes by K = 1 + 3.322 log n, where n = number of observations and log is the base-10 logarithm.
2. Divide the range by 10 to 15 to determine the acceptable size of the interval. Hint: most frequency distributions have an odd number as the size of the interval. The advantage is that the midpoints of the intervals will be whole numbers.
3. Organize the class intervals. See to it that the lowest interval begins with a number that is a multiple of the interval size.
4. Tally each score to the category of class interval it belongs to.
5. Count the tallies and summarize them under column (f). Then add the frequencies to get the total number of cases (N).
6. Determine the class boundaries: the upper class boundary (UCB) and the lower class boundary (LCB).
7. Compute the midpoint for each class interval and put it in the column (M).
M = (LS + HS) / 2
8. Compute the less-than and greater-than cumulative frequencies and put them in columns cf< and cf>, where cf = cumulative frequency. (You can now interpret the data.)
9. Compute the relative frequency distribution. This can be obtained by
RF% = CF/TF x 100%
CF = CLASS FREQUENCY
TF = TOTAL FREQUENCY
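The nine steps can be traced in code. The sketch below is one possible reading of them, not the only valid grouping: it assumes an interval size of 5 (an odd number between range/15 and range/10, per step 2) and starts the lowest interval at the largest multiple of 5 not exceeding the lowest score (step 3). It reports K from step 1 for reference but does not use it to set the interval size. The dictionary keys and variable names are chosen for this example.

```python
import math

# Raw Algebra scores from the example above.
scores = [
    56, 42, 28, 56, 41, 56, 55, 59,
    50, 55, 57, 38, 62, 52, 66, 65,
    33, 34, 37, 47, 42, 68, 62, 54,
    68, 48, 56, 39, 77, 80, 62, 71,
    57, 52, 60, 70,
]

# Step 1: range and number of classes by K = 1 + 3.322 log n.
n = len(scores)
highest, lowest = max(scores), min(scores)
data_range = highest - lowest
k = 1 + 3.322 * math.log10(n)

# Steps 2-3 (assumption): interval size 5, lowest interval starts at a multiple of 5.
width = 5
start = (lowest // width) * width

# Steps 4-7: tally each class, then record boundaries and midpoint.
rows = []
lower = start
while lower <= highest:
    upper = lower + width - 1
    rows.append({
        "interval": f"{lower}-{upper}",
        "f": sum(lower <= s <= upper for s in scores),
        "LCB": lower - 0.5,
        "UCB": upper + 0.5,
        "M": (lower + upper) / 2,
    })
    lower += width

# Steps 8-9: cumulative frequencies (cf<, cf>) and relative frequency in percent.
running = 0
for row in rows:
    row["cf_greater"] = n - running   # cases in this class and all higher classes
    running += row["f"]
    row["cf_less"] = running          # cases in this class and all lower classes
    row["rf_pct"] = 100 * row["f"] / n

print(f"n = {n}, R = {data_range}, K = {k:.2f}")
print("interval   f    LCB    UCB     M  cf<  cf>    RF%")
for row in rows:
    print(f"{row['interval']:>8} {row['f']:>3} {row['LCB']:>6} {row['UCB']:>6} "
          f"{row['M']:>5} {row['cf_less']:>4} {row['cf_greater']:>4} {row['rf_pct']:>6.1f}")
```

With these assumptions the table has 12 class intervals, from 25-29 to 80-84. A different interval size or starting point gives a different but equally valid table.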
The data can be graphically presented according to their scale or level of measurements.
1. Pie chart or circle graph. The pie chart at the right shows the enrollment from elementary to master’s degree of a certain university. The total population is 4350 students.
2. Histogram or bar graph - this graphical representation can be used for nominal, ordinal or interval data. For a nominal bar graph, the bars are set apart rather than connected, since the categories are not continuous. For ordinal and interval data, the bars should be joined to emphasize the degree of difference.
Given the bar graph of how students rate their library.
A-strongly favorable, 90
C-slightly favorable, 88
D-slightly unfavorable, 48
F-strongly unfavorable, 25
The Histogram of Person’s Age with Frequency of Travel
Age      Freq   RF
19-20    20     39.2%
21-22    21     41.2%
23-24    4      7.8%
25-26    4      7.8%
27-28    2      3.9%
Total    51     100%
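The shape of this histogram can be previewed without any plotting library. The sketch below is a rough text rendering, not a substitute for a drawn graph: it prints one bar of "#" characters per age class, with the bar length equal to the frequency, and recomputes each relative frequency from the counts in the table. The variable names are assumptions for this example.

```python
# Age classes and frequencies from the travel table.
age_freq = [("19-20", 20), ("21-22", 21), ("23-24", 4), ("25-26", 4), ("27-28", 2)]

total = sum(f for _, f in age_freq)

# Build one text bar per class: bar length = frequency, followed by RF in percent.
lines = []
for label, f in age_freq:
    rf = 100 * f / total
    lines.append(f"{label} | {'#' * f} {f} ({rf:.1f}%)")

print("\n".join(lines))
print(f"total = {total}")
```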
From the previous grouped data on algebra scores,
a. Draw its histogram using the frequency in the y axis and the midpoints in the x axis.
b. Draw the line graph or frequency polygon using the frequency in the y axis and the midpoints in the x axis.
c. Draw the less-than and greater-than ogives of the data. An ogive is a graph of the cumulation of frequencies by class intervals. For the greater-than ogive, let the y axis be cf> and the x axis be the LCB; for the less-than ogive, let the y axis be cf< and the x axis be the UCB.
d. Plot the relative frequency using the relative frequency in percent in the y axis and the midpoints in the x axis.
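Before drawing, it helps to list the points each graph passes through. The sketch below computes the (x, y) pairs for the frequency polygon, the two ogives and the relative frequency curve from the algebra scores. It assumes a grouping with interval size 5 starting at 25, which is one possible grouping and not necessarily the one intended; the variable names are chosen for this example.

```python
# Raw Algebra scores from the grouped-data example.
scores = [
    56, 42, 28, 56, 41, 56, 55, 59,
    50, 55, 57, 38, 62, 52, 66, 65,
    33, 34, 37, 47, 42, 68, 62, 54,
    68, 48, 56, 39, 77, 80, 62, 71,
    57, 52, 60, 70,
]

# Assumed grouping: interval size 5, lowest class 25-29.
width, start = 5, 25
n = len(scores)

# (lower limit, upper limit, frequency) for each class.
classes = []
lower = start
while lower <= max(scores):
    upper = lower + width - 1
    classes.append((lower, upper, sum(lower <= s <= upper for s in scores)))
    lower += width

polygon = []             # (midpoint, f): histogram bar heights and frequency polygon
less_than_ogive = []     # (UCB, cf<)
greater_than_ogive = []  # (LCB, cf>)
relative_curve = []      # (midpoint, RF in percent)

running = 0
for low, high, f in classes:
    midpoint = (low + high) / 2
    greater_than_ogive.append((low - 0.5, n - running))
    running += f
    less_than_ogive.append((high + 0.5, running))
    polygon.append((midpoint, f))
    relative_curve.append((midpoint, 100 * f / n))

print("frequency polygon:", polygon)
print("less-than ogive:", less_than_ogive)
print("greater-than ogive:", greater_than_ogive)
print("relative frequency:", relative_curve)
```

Each list can be plotted by hand or passed to any plotting tool as x and y coordinates.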
Assignment No. 2
Given the scores in a statistics examination,
38 56 35 70 44 81 44 80
45 72 45 50 51 51 52 66
54 53 56 84 58 56 57 70
56 39 56 59 72 63 89 63
69 65 61 62 64 64 69 60
53 66 66 67 67 68 68 69
66 67 70 59 40 71 73 60
73 73 73 73 73 74 73 73
79 74 74 70 73 46 74 74
74 75 75 76 55 77 78 73
48 81 44 84 77 88 63 85
1. Construct the class intervals, frequency table, class midpoints (use a whole-number midpoint), less-than and greater-than cumulative frequencies, upper and lower class boundaries and relative frequencies.
2. Plot the histogram, frequency polygon, ogives and relative frequency curve.
3. Draw the pie chart and bar graph of the plans of computer science students with respect to attending a seminar. Compute the relative frequency of each.