DATA ANALYSIS
G Giri Prasad
M.E. Structural Engineering
So far:
Data collection (sampling) → Data testing
Now:
Data analysis
What is Data Analysis?
■ Extracting useful, relevant and meaningful information from observations (data) in a systematic manner
■ An integral part of any research work
Note:
It is not just throwing data into software and getting fancy graphs; that is only a tiny part of it.
Why Data Analysis?
■ Parameter Estimation
– Inferring certain unknowns (mean, mode, etc.)
■ Model Development
– Forecasting and prediction
■ Feature Extraction
– Identifying patterns, e.g., identifying peaks in an ECG (fault detection)
■ Hypothesis Testing
– Is the postulate true?
...etc.
Process of Data Analysis
Data → Quality check → Visualization → Pre-processing → Analysis → Error sorting
Processing Operations
■ Editing
– Examining the collected data and correcting it (to minimize errors)
– E.g., the experiment may record from T1-1, …, TN+1 when only data for a specific time window is required, so the unnecessary data is trimmed.
■ Coding
– A labeling process: assigning a number or symbol to each observation
– E.g., the experimental data at time 1 s is labeled T1 (data is taken every second)
■ Classification
– Grouping data into homogeneous groups with a meaningful relation
– E.g., grouping the high-magnitude (peak) data, or any other such group
■ Tabulation
– When a large amount of data is collected, it is tabulated.
– E.g., tabulating time against the temperature of an atomic power plant
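The four processing operations above can be sketched in a few lines of Python. This is a minimal illustration on hypothetical (time, value) readings; the 1 s to 5 s trimming window, the peak threshold of 1.0 and the `T<n>` label format are assumptions chosen for the example, not values from the source.

```python
# Hypothetical readings: (time in seconds, measured value).
readings = [(0, 0.1), (1, 0.3), (2, 1.8), (3, -0.2), (4, 2.4), (5, 0.0), (6, 0.2)]

# Editing: keep only the time window of interest (assumed: 1 s to 5 s).
T_START, T_END = 1, 5
edited = [(t, v) for t, v in readings if T_START <= t <= T_END]

# Coding: label each observation by its time step, e.g. "T1" for t = 1 s.
coded = [(f"T{t}", v) for t, v in edited]

# Classification: group labels into "peak" or "normal" (threshold assumed).
PEAK_THRESHOLD = 1.0
classes = {"peak": [], "normal": []}
for label, v in coded:
    key = "peak" if abs(v) >= PEAK_THRESHOLD else "normal"
    classes[key].append(label)

# Tabulation: print the coded data as a simple two-column table.
print(f"{'Label':<6}{'Value':>8}")
for label, v in coded:
    print(f"{label:<6}{v:>8.2f}")
print(classes)
```

In a real experiment the window and threshold would come from the test plan rather than being hard-coded.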
relative_time AccY AccX
0 0.311859 0.489014
46 -0.13347 0.484222
83 -0.22125 0.536896
144 0.238434 0.487427
204 -0.22604 0.512955
265 -0.05525 0.477844
326 0.195343 0.544876
386 -0.26913 0.469864
452 0.061264 0.485825
508 0.086807 0.517746
568 -0.2867 0.504974
638 0.496994
639 0.145859
690 0.006989 0.501785
751 -0.22604 0.525726
812 0.161819 0.474655
872 -0.07919 0.516144
937 -0.17497 0.498596
994 0.166611 0.519348
1054 -0.14143 0.474655
1117 -0.07442 0.524139
1176 0.14267 0.489014
1237 -0.19092 0.506577
1304 -0.02173 0.495407
1358 0.080414 0.543274
1419 1.419571 0.512955
1479 0.15863 0.59436
1539 0.538498
1541 -0.34256
1600 0.795486 0.493805
1661 0.014969 0.501785
1722 -1.32419 0.460297
1784 -0.05685 0.429962
1843 -0.92036 0.592758
1904 -0.69371 0.450714
1964 0.611923
1965 -0.63625
2025 -3.45981 0.564041
2088 2.913559 0.147446
2147 1.247192 0.870499
2207 0.535309
2208 -2.58673
2269 2.844925 0.112335
2329 0.760361
2330 1.435532
2389 0.730026
2390 -3.01768
2458 3.646179 -0.11113
2511 3.593506 1.165771
2572 -5.08308 0.171387
2633 0.501801 0.481033
2694 2.433121 0.789093
2754 0.490616
2755 -6.29614
2815 1.901611 0.369308
2876 1.063629 0.662994
2939 -6.70953 0.864105
2998 5.492905 -0.21487
3058 0.825806 0.913589
3119 -3.91312 0.699707
3179 4.872009 -0.2101
3240 0.549683 0.769943
3301 -2.33134 0.789093
3361 0.117111
3362 5.796173
3422 -1.71524 0.406021
3483 -4.5372 0.881668
3544 3.904755 0.627884
3608 -2.82614 -0.02174
3665 1.022125
3666 -2.89159
3730 3.031662 0.878479
3787 -5.47412 0.094772
3848 2.366089 0.147446
3912 2.563995 1.287079
3969 -2.42711 -0.02654
4030 2.460251 0.241623
4090 1.002975
4091 1.524918
4151 0.257584
4152 -0.78629
4212 3.777069 0.286316
4279 -1.71205 0.581589
4334 -3.02567 0.782715
4394 2.473022 0.399628
4455 -2.75272 0.469864
4515 -1.66257 0.958282
4579 0.67897 0.380478
4637 -3.65613 0.426773
4699 0.337387
4700 3.309402
4759 0.417206 0.766739
4819 -1.21086 0.481033
4880 3.255127 0.297485
4949 -0.82938 0.691727
5001 -0.05206 0.225647
5062 5.178467 0.662994
5123 -3.72476 0.208099
5183 -1.34174 0.849747
5244 2.056427 0.62149
5305 -4.44781 0.259171
5365 1.312637 0.656601
5431 -3.80458 0.535309
5487 -0.17816 0.712479
5548 3.639801 0.311844
5608 -1.19968 0.324615
5669 0.498596 0.787491
5730 0.241608
5731 2.939102
5791 -2.60269 0.637451
5851 3.868042 0.013367
5912 2.509735 1.251968
5973 -5.41507 0.056473
6033 0.281525
6034 1.087585
6095 0.779526 0.945511
6155 -5.15012 0.444321
6225 1.820206 0.350159
6283 0.584778
6284 -2.38083
6337 1.566422 0.565628
6400 3.401978 0.661392
6459 -2.87244 0.163406
6519 0.688553 0.647034
6591 2.034088 0.525726
6641 -2.93468 0.56723
6709 1.472244 0.275131
6767 0.781113 0.634262
6823 -2.28505 0.718857
6893 1.679749 0.192139
6955 -0.09196 0.530518
7005 -1.64021 0.733231
7066 1.95108 0.342163
7126 0.441132
7127 -0.66498
7187 -1.19809 0.639053
7267 1.939911 0.412399
7309 -1.3992 0.556061
7376 -0.26274 0.43634
7451 1.481827 0.544876
7490 -1.64659 0.56723
7552 0.396454 0.465073
7612 0.927963 0.436356
7678 -1.65457 0.653412
7741 1.004578 0.421982
7794 0.294296 0.439529
7870 -1.3577 0.647034
7923 1.280716 0.396439
7976 -0.43353 0.55127
8050 -0.88525 0.473053
8113 1.346161 0.516144
8158 -0.91556 0.482635
8219 -0.25478 0.538498
8279 1.012558 0.423569
8340 0.59436
8341 -1.1087
8401 0.203323 0.463486
8464 0.460281
8465 0.71727
8530 -1.24437 0.616714
8594 0.429962
8595 0.774734
8644 0.008591 0.520935
8705 -0.91078 0.541687
8767 0.942322 0.471451
8833 -0.4064 0.466675
8898 0.514557
8899 -0.38406
8948 0.82103 0.476257
9012 0.524139
9013 -0.76234
9076 -0.00897 0.496994
9130 0.564041 0.444321
9190 -0.84534 0.581589
9256 0.423584 0.460281
9312 0.153839 0.501785
9375 -0.73042 0.509766
9434 0.645447 0.489014
9494 -0.26115 0.519348
9555 -0.40959 0.522537
9616 0.683746 0.461884
9680 0.525726
9681 -0.53728
9747 -0.07761 0.522537
9798 0.501801 0.460281
9858 -0.71765 0.54808
9928 0.281525 0.471466
9983 0.224075 0.496994
10040 -0.6554 0.527328
10109 0.468277
10110 0.519348
10162 -0.15421 0.493805
10227 -0.41119 0.524139
10283 0.562454 0.476242
10344 0.522537
10345 -0.42555
10411 0.509766
10412 -0.11909
10466 0.437958 0.473053
10535 -0.59315 0.549667
10603 0.479446
10604 0.174591
10654 0.195343 0.492203
10727 -0.5309 0.530518
10775 0.484238
10776 0.426773
10830 -0.09515 0.506577
10896 -0.38884 0.517746
10952 0.457108 0.473053
11013 -0.34256 0.517746
11075 -0.13826 0.511368
11134 0.375702 0.453903
11194 -0.47664 0.541687
11259 0.155441 0.522537
11316 0.168213 0.490616
11377 0.503387
11378 -0.46227
11437 0.482635
11438 0.308655
11504 -0.05206 0.512955
11574 -0.31702 0.530518
11623 0.366119 0.477844
11686 -0.27872 0.503387
11758 -0.14943 0.50499
11806 0.302277 0.495407
11872 -0.35852 0.511353
11923 0.091599 0.489014
11983 0.14267 0.477844
12050 -0.34735 0.525726
12104 0.163422 0.481033
12165 0.018173 0.527328
12247 -0.30745 0.517746
12287 0.228851 0.481033
[Chart: AccY and AccX plotted against sample number]
Data table and chart of accelerometer readings on a building model
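Some rows in this table carry only one acceleration value (e.g. `638 0.496994`), so a quality check is needed before analysis. The sketch below, assuming whitespace-separated rows as shown, flags such rows instead of guessing which axis is missing, then finds the largest |AccY| among complete rows. The rows are copied from the table; the function name `split_rows` is hypothetical.

```python
# A few rows copied from the table above: relative_time AccY AccX.
raw = """0 0.311859 0.489014
638 0.496994
639 0.145859
690 0.006989 0.501785
2939 -6.70953 0.864105
3362 5.796173"""

def split_rows(text):
    """Separate complete rows (time, AccY, AccX) from incomplete ones."""
    complete, incomplete = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            t, acc_y, acc_x = fields
            complete.append((int(t), float(acc_y), float(acc_x)))
        else:
            # Only one value present: which axis it belongs to is unknown.
            incomplete.append(int(fields[0]))
    return complete, incomplete

complete, incomplete = split_rows(raw)

# Simple peak search: largest absolute AccY among the complete rows.
peak_time, peak_y, _ = max(complete, key=lambda row: abs(row[1]))
print("incomplete rows at t =", incomplete)
print("peak |AccY| =", abs(peak_y), "at t =", peak_time)
```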
Terminology
■ Deterministic
– The data-generating process is perfectly known (mathematics can describe it exactly)
■ Stochastic
– The observation is only one subset of many possible ones (mathematics may or may not describe it)
E.g., when testing a building model for earthquake analysis with acceleration sensors, the sensor output always contains noise, so it cannot be described perfectly: it is stochastic.
– In practice, data is always a mix of stochastic and deterministic components.
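A minimal sketch of a signal that mixes both components: a known sine wave (deterministic part) plus Gaussian noise (stochastic part). The 1 Hz frequency, amplitude of 2, noise level of 0.3 and 100 Hz sampling rate are arbitrary assumptions for illustration, not properties of the building-model test.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def deterministic(t):
    # Known process: a 1 Hz sine wave of amplitude 2 (assumed values).
    return 2.0 * math.sin(2 * math.pi * 1.0 * t)

times = [i * 0.01 for i in range(200)]            # 2 s sampled at 100 Hz
noise = [random.gauss(0.0, 0.3) for _ in times]   # stochastic part
measured = [deterministic(t) + n for t, n in zip(times, noise)]

# Removing the known deterministic part leaves only the random noise.
residual = [m - deterministic(t) for m, t in zip(measured, times)]
print(f"largest noise deviation: {max(abs(r) for r in residual):.3f}")
```

In real measurements the deterministic part is not known in advance, which is why the data has to be treated statistically.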
Terminology (cont’d)
■ Population
– All possible data
– TRUTH (Parameter): refers to a population characteristic
■ Sample
– Some of the possible data
– STATISTIC (Estimate): refers to a sample characteristic
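To make the parameter/statistic distinction concrete, the sketch below treats 1,000 simulated values as the whole population, so their mean is the parameter, and uses the mean of a random sample of 30 as the statistic (estimate). The simulated values (mean 20, SD 1) and the sample size are assumptions for illustration only.

```python
import random
import statistics

random.seed(1)
# Hypothetical population: 1000 simulated cube strengths (values assumed).
population = [random.gauss(20.0, 1.0) for _ in range(1000)]
parameter = statistics.mean(population)   # TRUTH: population mean

sample = random.sample(population, 30)    # some of the possible data
statistic = statistics.mean(sample)       # ESTIMATE: sample mean

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

Drawing a different sample would give a slightly different statistic, while the parameter stays fixed.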
Statistical Measures
■ The role of statistics in research is to function as a tool in designing research, analyzing its data and drawing conclusions therefrom.
■ The data is characterized by statistical measures in order to estimate, compare, or make its properties known.
■ Why?
– Statistics summarizes data and supports theory: gathering, describing and interpreting numerical data.
■ Major statistical areas:
– 1. Descriptive Statistical Analysis, and
– 2. Inferential Statistical Analysis.
Descriptive Statistical Analysis
■ Refers to the description of numerical data from a particular sample, giving meaning to that sample.
■ Conclusions must refer only to the sample (these measures summarize the data and describe sample characteristics).
■ CLASSIFICATION
– Frequency Distribution
– Measures of Central Tendency
– Measures of Dispersion
Frequency Distribution
■ A systematic arrangement of numeric values from lowest to highest (or vice versa), showing how often each value or class occurs.
■ ∑ f = N
– where ∑ – sum, f – frequency, N – sample size
Example:
Cost (Rs)   Frequency   Percent frequency
50-59       2           4
60-69       13          26
70-79       16          32
80-89       7           14
90-99       7           14
100-109     5           10
Total       50          100
Here we grouped the 50 costs below. Grouping organizes data by how often values fall in each class:
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
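The grouping above can be reproduced with a short Python sketch: each cost is assigned to a class of width 10 starting at Rs 50, the classes are counted, and the check ∑ f = N is asserted. Using `collections.Counter` is one convenient choice, not the only way to do it.

```python
from collections import Counter

costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

WIDTH, START = 10, 50   # classes 50-59, 60-69, ..., 100-109
counts = Counter((c - START) // WIDTH for c in costs)   # class index -> f
n = len(costs)

print(f"{'Cost (Rs)':<10}{'f':>4}{'%':>6}")
for k in sorted(counts):
    low = START + k * WIDTH
    f = counts[k]
    print(f"{low}-{low + WIDTH - 1:<6}{f:>4}{100 * f / n:>6.0f}")

assert sum(counts.values()) == n   # sum of f equals N
```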
Measures of Central Tendency
■ Measures of central tendency (or statistical averages) tell us the point about which items tend to cluster.
■ They help find the middle, or average, of a data set. The three most common measures of central tendency are the mode, median, and mean.
• Mean: the sum of all values divided by the total number of values.
• Median: the middle number in an ordered data set.
• Mode: the most frequent value.
Measures of Central Tendency -- Mean
■ Mean of ungrouped data
– The sum of all values divided by the total number of values:
$\bar{x} = \dfrac{\sum x}{n}$
where $\bar{x}$ – mean, $x$ – observation, $n$ – total number of observations
Example:
Compressive strength   Cube no.
19.5                   1
21                     2
20                     3
Here, the mean is (19.5 + 21 + 20)/3 = 20.167
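The formula $\bar{x} = \sum x / n$ applied to the three cube strengths, as a one-line Python check:

```python
strengths = [19.5, 21, 20]               # compressive strengths of cubes 1-3
mean = sum(strengths) / len(strengths)   # x-bar = sum(x) / n
print(round(mean, 3))                    # 20.167
```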
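Measures of Central Tendency -- Median
■ Median of ungrouped data (values arranged in order)
– Odd n: the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation
– Even n: the median is the average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ observations
Example for odd n: median of 19.5, 20, 21 = 20
Example for even n: median of 19.5, 20, 21, 21
$= \dfrac{(4/2)^{\text{th}} + (4/2 + 1)^{\text{th}}\ \text{observation}}{2} = \dfrac{20 + 21}{2} = 20.5$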
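The odd/even rule can be written as a small Python function. It is hand-written here to mirror the steps on the slide; Python's standard `statistics.median` gives the same results and is used in practice.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1)/2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of (n/2)th and (n/2 + 1)th

print(median([19.5, 20, 21]))       # 20
print(median([19.5, 20, 21, 21]))   # 20.5
```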
Measures of Central Tendency -- Mode
■ Mode
– The value in a series that appears most frequently
Example:
Compressive strength   Cube no.
19.5                   1
21                     2
20                     3
21                     4
Here the mode is 21, which has the maximum frequency of 2.
If no number repeats in a given list, that list has no mode.
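A sketch of the mode that follows the slide's convention (no repeats means no mode). It is hand-written on purpose: in Python 3.8 and later, `statistics.mode` returns the first value encountered when all values are unique rather than reporting "no mode". Returning `None` for "no mode" and a list when several values tie are choices made for this example.

```python
from collections import Counter

def mode(values):
    """Most frequent value; a list if several tie; None when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return None   # no repeating numbers -> no mode
    modes = [v for v, c in counts.items() if c == top]
    return modes[0] if len(modes) == 1 else modes

print(mode([19.5, 21, 20, 21]))   # 21
print(mode([19.5, 21, 20]))       # None
```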
Let us time 21 people in a sprint race, to the nearest second:
59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58, 58, 62, 62, 68, 65, 56, 59, 68, 61, 67
Grouped Frequency Table
Seconds   Frequency
51 - 55   2
56 - 60   7
61 - 65   8
66 - 70   4
So 2 runners took between 51 and 55 seconds, 7 took between 56 and 60 seconds, etc.
Measures of Central Tendency -- Mean
■ Mean of grouped data
$\bar{x} = \dfrac{\sum f x}{\sum f}$
where $\bar{x}$ – mean, $f$ – frequency, $x$ – midpoint of the class interval
■ The groups (51-55, 56-60, etc.), also called class intervals, are of width 5
■ The midpoints are in the middle of each class: 53, 58, 63 and 68
Seconds   Midpoint x   Frequency f   Midpoint × Frequency fx
51 - 55   53           2             106
56 - 60   58           7             406
61 - 65   63           8             504
66 - 70   68           4             272
Totals                 21            1288
Estimated mean = 1288/21 = 61.333...
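The grouped-mean calculation $\sum fx / \sum f$ in Python, with the exact mean of the 21 raw sprint times computed alongside for comparison. The comparison shows why the grouped result is called an estimate: the midpoints stand in for the actual values.

```python
# (class, midpoint x, frequency f) from the sprint-race table.
classes = [("51-55", 53, 2), ("56-60", 58, 7), ("61-65", 63, 8), ("66-70", 68, 4)]

sum_f = sum(f for _, _, f in classes)        # 21
sum_fx = sum(x * f for _, x, f in classes)   # 1288
grouped_mean = sum_fx / sum_f
print(sum_f, sum_fx, round(grouped_mean, 3))   # 21 1288 61.333

# Exact mean of the raw times, for comparison with the grouped estimate.
times = [59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58,
         58, 62, 62, 68, 65, 56, 59, 68, 61, 67]
exact_mean = sum(times) / len(times)
print(round(exact_mean, 3))                    # 61.381
```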
Measures of Central Tendency -- Median
■ Median of grouped data
– Median $= l + \dfrac{\frac{n}{2} - cf}{f} \times h$
where $l$ – lower class boundary of the median class, $f$ – frequency of the median class, $h$ – class width, $n$ – total frequency, $cf$ – cumulative frequency of the class before the median class
■ For our example (the median class is 61 - 65, since the (21/2)th value falls in it):
– l = 60.5
– n = 21
– cf = 2 + 7 = 9
– f = 8
– h = 5
■ Estimated median $= 60.5 + \dfrac{21/2 - 9}{8} \times 5 = 60.5 + 0.9375 = 61.4375$
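The grouped-median formula as a Python sketch: it walks the cumulative frequency to find the first class that reaches n/2 (the median class), then interpolates within it. The function name `grouped_median` is hypothetical; the inputs are the lower class boundaries and frequencies from the sprint table.

```python
def grouped_median(lower_bounds, freqs, h):
    """Estimate the median as l + ((n/2 - cf) / f) * h."""
    n = sum(freqs)
    cf = 0   # cumulative frequency of the classes before the current one
    for l, f in zip(lower_bounds, freqs):
        if cf + f >= n / 2:   # first class reaching n/2 is the median class
            return l + (n / 2 - cf) / f * h
        cf += f
    raise ValueError("empty frequency table")

# Lower class boundaries 50.5, 55.5, 60.5, 65.5; class width h = 5.
print(grouped_median([50.5, 55.5, 60.5, 65.5], [2, 7, 8, 4], 5))   # 61.4375
```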
Measures of Central Tendency -- Mode
■ Mode of grouped data
$\text{Mode} = L + \dfrac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times h$
where:
$L$ is the lower class boundary of the modal group
$f_{m-1}$ is the frequency of the group before the modal group
$f_m$ is the frequency of the modal group
$f_{m+1}$ is the frequency of the group after the modal group
$h$ is the group width
The modal group is the group with the highest frequency, which here is 61 - 65, so we say "the modal group is 61 - 65".
Measures of Central Tendency -- Mode
$L = 60.5$
$f_{m-1} = 7$
$f_m = 8$
$f_{m+1} = 4$
$h = 5$
– Estimated mode $= 60.5 + \dfrac{8 - 7}{(8 - 7) + (8 - 4)} \times 5 = 60.5 + \dfrac{1}{5} \times 5 = 61.5$
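The same grouped-mode estimate in Python. The modal class is located automatically as the class with the highest frequency. Treating a missing neighbour class as frequency 0 (when the modal class is the first or last one) is an assumption added for the sketch; the slide's example does not need it.

```python
def grouped_mode(L, f_prev, f_modal, f_next, h):
    """Estimate the mode as L + (fm - fm-1) / ((fm - fm-1) + (fm - fm+1)) * h."""
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return L + d1 / (d1 + d2) * h

freqs = [2, 7, 8, 4]
lower = [50.5, 55.5, 60.5, 65.5]   # lower class boundaries
h = 5
m = freqs.index(max(freqs))        # index of the modal class (61 - 65)
f_prev = freqs[m - 1] if m > 0 else 0                # assumption: 0 if no class before
f_next = freqs[m + 1] if m + 1 < len(freqs) else 0   # assumption: 0 if no class after
estimate = grouped_mode(lower[m], f_prev, freqs[m], f_next, h)
print(estimate)   # 61.5
```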
Final results:
• Estimated mean: 61.333...
• Estimated median: 61.4375
• Estimated mode: 61.5
[Charts: sprint time of each of the 21 runners; frequency of each class interval (51 - 55, 56 - 60, 61 - 65, 66 - 70)]
■ Here the mean is not a true or correct representation of the group: it points to a 23-year-old, who may watch cinema movies, but those are liked by neither Grandpa nor the kids.
■ The median, however, points to what the kids watch (C/N), which Grandpa may also watch.
Skewness
Negative skew: mean < median < mode
No skew: mean = median = mode
Positive skew: mean > median > mode
Positive skew is when the long tail is on the positive (right-hand) side of the peak.
Negative skew is sometimes called "skewed to the left" (the long tail is on the left-hand side). The mean is also on the left of the peak.
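A sketch that classifies skew by comparing mean, median and mode, using the ordering above. Note this ordering is a rule of thumb that holds for typical unimodal distributions but can fail for others. The right-tailed data set is invented for illustration.

```python
import statistics

def skew_direction(values):
    """Classify skew with the mean/median/mode rule of thumb."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "none"
    return "unclear"

right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]   # hypothetical data, long right tail
print(skew_direction(right_tail))               # positive (4.6 > 3 > 2)
```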
Measures of Dispersion
■ Measures of dispersion are statistical devices calculated to measure the scatter of the values of a variable in a series.
■ Important measures of dispersion are
• Range: the difference between the highest and lowest values
• Standard deviation: the typical distance of values from the mean (the square root of the variance)
• Variance: the average of the squared distances from the mean
■ While the central tendency, or average, tells you where most of your points lie, variability summarizes how far apart they are.
■ This matters because the amount of variability determines how well you can generalize results from the sample to the population.
Measures of Dispersion -- Range
■ Range
– The difference between the highest and lowest values
– R = H − L
where R – range, H – highest observation, L – lowest observation
Example:
Compressive strength   Cube no.
19.5                   1
21                     2
20                     3
Here, the range R = 21 − 19.5 = 1.5
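The range R = H − L for the same three cubes, in Python:

```python
strengths = [19.5, 21, 20]
R = max(strengths) - min(strengths)   # R = H - L
print(R)                              # 1.5
```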
■ The standard deviation is a measure of how spread out numbers are. For normally distributed data:
• 68% of values are within 1 standard deviation of the mean: likely (68 out of 100 should be)
• 95% of values are within 2 standard deviations of the mean: very likely (95 out of 100 should be)
• 99.7% of values are within 3 standard deviations of the mean: almost certain (997 out of 1000 should be)
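The 68-95-99.7 percentages follow from the normal distribution: the fraction of values within k standard deviations of the mean is erf(k/√2). A short Python check, assuming the data are normally distributed (the rule does not apply to arbitrary data):

```python
import math

def within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))   # 68.3, 95.4, 99.7
```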
Measures of Dispersion -- Standard Deviation
■ It measures the typical amount of variability in a dataset: roughly, how far each value lies from the mean on average.
■ A high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.
cont’d
Measures of Dispersion -- Variance
The variance is defined as the average of the squared differences from the mean:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
where $\mu$ – mean, $N$ – number of values
It measures how far a set of numbers is spread out from their average value.
To calculate the variance, follow these steps:
• Work out the mean (the simple average of the numbers).
• Then, for each number, subtract the mean and square the result (the squared difference).
• Then work out the average of those squared differences.
Cont’d
■ Example
– You and your friends have just measured the heights of your dogs (in millimeters).
– The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 394 mm
Variance:
$\sigma^2 = \dfrac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \dfrac{108520}{5} = 21704$
The standard deviation is just the square root of the variance, so:
$\sigma = \sqrt{21704} = 147.32\ldots \approx 147$ mm (to the nearest mm)
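The three variance steps applied to the dog heights, followed by a cross-check against Python's standard-library population functions `statistics.pvariance` and `statistics.pstdev` (which divide by N, matching the definition above).

```python
import math
import statistics

heights = [600, 470, 170, 430, 300]   # mm

mean = sum(heights) / len(heights)                  # step 1: the mean (394.0)
squared_diffs = [(h - mean) ** 2 for h in heights]  # step 2: squared differences
variance = sum(squared_diffs) / len(heights)        # step 3: average them (divide by N)
sd = math.sqrt(variance)

print(mean, variance, round(sd))   # 394.0 21704.0 147

# Cross-check with the standard library's population functions.
assert math.isclose(variance, statistics.pvariance(heights))
assert math.isclose(sd, statistics.pstdev(heights))
```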
Cont’d
■ Example
Compressive strength   Cube no.   Deviation from mean   Squared deviation from mean
19.5                   1          19.5 − 20.1 = −0.6    0.36
21                     2          0.9                   0.81
20                     3          −0.1                  0.01
21                     4          0.9                   0.81
19                     5          −1.1                  1.21
Here, mean = 20.1
Sum of squared deviations from the mean = 3.2
Variance = sum of squared deviations from the mean / (n − 1) (for a population, divide by N instead)
= 3.2 / 4 = 0.8
σ = √0.8 = 0.894
A low standard deviation indicates that values are clustered close to the mean.
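The same sample calculation in Python, dividing by n − 1. Python's `statistics.variance` and `statistics.stdev` also use n − 1 (sample), while `pvariance` and `pstdev` use N (population); the assertion checks that the hand calculation agrees with the library.

```python
import math
import statistics

strengths = [19.5, 21, 20, 21, 19]
mean = sum(strengths) / len(strengths)         # 20.1
ss = sum((x - mean) ** 2 for x in strengths)   # sum of squared deviations, about 3.2
sample_var = ss / (len(strengths) - 1)         # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_var)

print(round(mean, 1), round(ss, 2), round(sample_var, 2), round(sample_sd, 3))
assert math.isclose(sample_var, statistics.variance(strengths))
```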
Inferential Statistical Analysis
■ Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.
■ Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.
-- cont’d
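As one example of "deriving estimates", the sketch below computes a 95% confidence interval for the population mean cube strength from the five cubes in the standard-deviation example. It assumes the strengths are roughly normally distributed and hard-codes the two-sided 95% t critical value for 4 degrees of freedom (2.776) rather than calling a statistics library; the constant name `T_CRIT_95_DF4` is hypothetical.

```python
import math
import statistics

strengths = [19.5, 21, 20, 21, 19]
n = len(strengths)
x_bar = statistics.mean(strengths)   # sample mean (20.1)
s = statistics.stdev(strengths)      # sample SD, n - 1 in the denominator
se = s / math.sqrt(n)                # standard error of the mean

T_CRIT_95_DF4 = 2.776                # two-sided 95% t value for df = n - 1 = 4
margin = T_CRIT_95_DF4 * se
low, high = x_bar - margin, x_bar + margin
print(f"95% CI for the population mean: {low:.2f} to {high:.2f}")
```

The interval is a statement about the population mean (the parameter), estimated from the sample.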
Thanks for
listening…!!
