
# Akram Najjar: Exploiting Your Data (for printing)



1. Quantified Self: Exploiting Your Data. 22 March 2012, Akram Najjar.
   - This talk is an “eye opener”
   - We will not discuss techniques or “how” data is analyzed!
   - We will only talk about “what” such methods can give us
2. What Methods Can You Apply to Your Data?
   - A. The Bell Shaped Curve (Normal Distribution)
   - B. Correlation of two variables
   - C. Forecasting using Simple Linear Regression (Best Line of Fit)
   - D. Statistical Process Control

   Other tools that work directly on data:
   - Goodness of Fit testing
   - Independence testing
   - Moving Averages and Exponential Smoothing
   - Non-Linear Regression (polynomial, exponential, logarithmic)
   - Weighted Index Scoring
   - Excel: The Pivot Table
   - Excel: Conditional Formatting
3. A. The Bell Shaped Curve (The Gaussian or Normal Distribution)
   - Useful when you have a lot of data
   - Prepare a Bar Chart or a Frequency Table
   - Most likely, the data will plot as a Bell Shaped Curve (Normal/Gauss Curve)
     - Example: measurements of most natural variables
     - Example: measurements of most manufactured items
   - Prepare a frequency table of your data
     - How many times did you get a specific value?
     - Out of 200 measurements, how many times was your Systolic Blood Pressure = 110, 115, 120, 125, 130, 135, 140 . . .

   Here are 24 Systolic Blood Pressure measurements. They look like a Bell Curve.
   [Bar chart; vertical axis: "How many times?"]
   Probability of Pressure > 125 = (4 + 2) / 24 = 1/4 = 25%
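The probability read off the bars above is just a count divided by a total. A minimal sketch of that arithmetic follows; the frequency table is hypothetical, and only the two counts above 125 (4 and 2) and the total of 24 are taken from the slide.

```python
from fractions import Fraction

# Hypothetical frequency table: pressure value -> how many times it was measured.
# Only the counts above 125 (4 and 2) and the total of 24 come from the slide.
frequency = {110: 2, 115: 4, 120: 6, 125: 6, 130: 4, 135: 2}


def probability_above(table, threshold):
    """Share of all measurements whose value is strictly above the threshold."""
    total = sum(table.values())
    above = sum(count for value, count in table.items() if value > threshold)
    return Fraction(above, total)


p = probability_above(frequency, 125)
print(p, f"= {float(p):.0%}")  # 1/4 = 25%
```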
4. If we had 201 measurements . . .
   - Total count in bars = area of bars = probability
   - Probability of a pressure > 122 = 15.83%

   The Bell Shaped Curve is completely defined by:
   - a) the Average (115) of the data
   - b) the Standard Deviation (7) of the data. It indicates how far our data is spread from the average. (About 68% of observations are between 115 - 7 and 115 + 7.)
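Once the average and standard deviation are known, areas under the curve can be computed directly. The sketch below uses Python's standard `statistics.NormalDist` with the slide's values (115 and 7). It models an ideal bell curve, so the result may differ slightly from a figure counted from actual measurements.

```python
from statistics import NormalDist

# Average and standard deviation as quoted on the slide.
pressure = NormalDist(mu=115, sigma=7)

# Probability of a reading above 122: the area under the curve to the right of 122.
p_above_122 = 1 - pressure.cdf(122)

# Share of readings within one standard deviation of the average.
p_within_one_sd = pressure.cdf(115 + 7) - pressure.cdf(115 - 7)

print(f"P(pressure > 122) = {p_above_122:.2%}")
print(f"P(108 < pressure < 122) = {p_within_one_sd:.2%}")
```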
5. What do we get if we use the Bell Shaped Curve (Normal Distribution)?
   - Benefit 1: measuring the spread of our data
   - Benefit 2: we can now compare specific scores in two different populations (next slide)
   - Benefit 3: if we know the measure, we can compute the probability of it happening
   - Benefit 4: if we know the probability, we can work out the cutoff measure that will give it

   If I have the same score 78 in Courses A and B, can I say I am doing the same in both?
   [Figure labels: 78, 72, 88]
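One common way to compare the same score across two populations is to express it in standard deviations from each population's average (a z-score). The sketch below is an illustration only: the course averages and standard deviations are made up, not taken from the talk.

```python
def z_score(score, average, std_dev):
    """Number of standard deviations a score lies above (+) or below (-) the average."""
    return (score - average) / std_dev


# Hypothetical course statistics, chosen only for illustration.
course_a = {"average": 72, "std_dev": 6}
course_b = {"average": 88, "std_dev": 10}

z_a = z_score(78, **course_a)
z_b = z_score(78, **course_b)
print(z_a, z_b)  # 1.0 -1.0
```

Under these assumed figures, the same 78 is one standard deviation above the average in Course A and one below it in Course B.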
6. Benefits 3 and 4
   - Given a specific measurement or range, what is the probability of its occurrence?
     - Probability I will get a fever of more than 38 degrees?
     - Probability flights will be more than 30 minutes late?
     - Probability my systolic is > 122?
   - Given the probability, what is the cutoff measurement?
     - I want to remain at a sugar level representing the top 15% allowed. What is the level related to that?
     - If Human Resources want the top 15% of results, what is the passing grade?

   B. Correlation
   - If we have two sets of data, how are they related?
     - Example: Blood Pressure vs Intake of Salt
     - Example: Advertising Expenditure vs Sales Revenue
     - Example: Hours walked per day vs Weight in Kilograms
   - What is the direction of the relationship? Direct or inverse?
   - What is the strength of the relationship? Correlation
   - We use the Correlation Function (demonstrate in Excel)
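Two of the questions on this slide can be sketched in a few lines: finding the cutoff for a "top 15%" and measuring the direction and strength of a relationship. The exam statistics and the walking/weight figures below are hypothetical. The correlation is the Pearson coefficient, which is also what Excel's `CORREL` function computes.

```python
from math import sqrt
from statistics import NormalDist, mean


def top_share_cutoff(average, std_dev, top_share):
    """Value above which the given top share of a normal population lies."""
    return NormalDist(average, std_dev).inv_cdf(1 - top_share)


def correlation(xs, ys):
    """Pearson correlation: sign gives the direction, magnitude (0 to 1) the strength."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical exam results: average 70, standard deviation 10.
passing_grade = top_share_cutoff(70, 10, 0.15)
print(round(passing_grade, 1))  # 80.4

# Hypothetical observations: hours walked per day vs weight in kilograms.
hours_walked = [0.5, 1.0, 1.5, 2.0, 2.5]
weight_kg = [90, 86, 85, 80, 78]
r = correlation(hours_walked, weight_kg)
print(round(r, 2))  # negative: an inverse relationship
```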
7. C. Forecasting using Simple Linear Regression (Best Line of Fit)
   - If we have an independent variable (X): Sugar Intake
   - And a dependent variable (Y): Weight
   - What is the relationship that allows us to forecast Weight for different Sugar Intakes?
   - We need two columns: X and Y
   - Simple Linear Regression allows us to find the Best Line to fit our data

   Regression finds the Best Line that Fits our Observations
   [Scatter plot: Y axis from 0 to 5, X axis from 0 to 8]
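The "best line" is conventionally the one that minimizes the squared vertical distances to the observations (ordinary least squares). The sketch below assumes that definition. The sugar and weight figures are hypothetical and their units are illustrative.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical observations: X = sugar intake, Y = weight.
sugar = [1, 2, 3, 4, 5]
weight = [61, 65, 66, 67, 71]

slope, intercept = best_fit_line(sugar, weight)

# Forecast the weight for a sugar intake that was not observed.
forecast = slope * 6 + intercept
print(round(slope, 2), round(intercept, 2), round(forecast, 2))
```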
8. Which Straight Line Best Fits our Observations?
   [Scatter plot: Y axis from 0 to 5, X axis from 0 to 8]

   Multiple Regression allows us to find the equation Y = aX1 + bX2 + cX3 + d
   [Diagram labels: X1, X2, X3, Y]
9. D. Statistical Process Control (SPC)
   - The purpose of SPC is to monitor a process
   - SPC allows us to check if a variable is behaving properly
     - Over time
     - Over different locations/departments
     - Over different events
     - Over different samples
   - Control Charts were first used in Bell Labs (1924)
   - Although mostly used in industry, SPC can be used in any sector

   The General Form of a Control Chart: 4 Components
   - 1) UCL: Upper Control Limit
   - 2) AL: Average Line
   - 3) LCL: Lower Control Limit
   - 4) Process Data
   - Vertical axis: our variable. Horizontal axis: the IDs of the samples, or the time series.
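The three lines of a control chart can be computed from a baseline of process data. The talk does not say where the limits are placed, so the sketch below assumes the conventional choice of the average plus or minus three standard deviations. The baseline and new readings are hypothetical, and real charts often estimate the spread differently (for example from sample ranges).

```python
from statistics import mean, stdev


def control_limits(samples, sigmas=3):
    """Return (LCL, average line, UCL), assuming limits at average +/- sigmas * std dev."""
    average = mean(samples)
    spread = sigmas * stdev(samples)
    return average - spread, average, average + spread


def out_of_control(samples, lcl, ucl):
    """Positions (0-based) of the samples that fall outside the control limits."""
    return [i for i, value in enumerate(samples) if value < lcl or value > ucl]


# Hypothetical baseline readings taken while the process was behaving properly.
baseline = [24, 26, 25, 27, 23, 25, 26, 24, 25, 25]
lcl, average, ucl = control_limits(baseline)

# Hypothetical new readings checked against those limits.
new_data = [25, 27, 31, 24, 18]
print(out_of_control(new_data, lcl, ucl))  # [2, 4]
```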
10. This Process is “In Control”
    [Control chart: vertical axis from 0 to 50, with Upper Limit and Lower Limit lines]

    This Process is Regularly “Out of Control”
    - Look for an explanation INSIDE the system
11. This Process is Irregularly “Out of Control”
    - Look for an explanation OUTSIDE the system

    This Process is Irregularly “Out of Control”: trends in either direction of 5 or more points
    - Look for an explanation OUTSIDE the system
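The trend signal described above can be checked mechanically. The sketch below takes one possible reading of it, namely 5 or more consecutive points that each rise (or each fall) relative to the previous one. The function name and the data are hypothetical.

```python
def find_trends(values, min_points=5):
    """Return (start, end) index pairs of strictly rising or falling runs
    that contain at least min_points points."""
    trends = []
    start = 0      # index where the current run began
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(values)):
        # +1 if this point is higher than the previous one, -1 if lower, 0 if equal.
        step = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if step != direction or step == 0:
            # The run ending at i - 1 is over; record it if it is long enough.
            if direction != 0 and i - start >= min_points:
                trends.append((start, i - 1))
            start = i - 1 if step != 0 else i
            direction = step
    if direction != 0 and len(values) - start >= min_points:
        trends.append((start, len(values) - 1))
    return trends


# Hypothetical process data: points 1 to 6 rise steadily.
readings = [25, 24, 26, 27, 28, 29, 30, 26, 25]
print(find_trends(readings))  # [(1, 6)]
```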
12. The 7 Point Rule: there is a problem if 7 points in a row (or more) are above the average or below it
    - Look for an explanation OUTSIDE the system

    Types of Control Charts
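The 7 Point Rule stated on this slide is also easy to automate. The sketch below is a minimal version: it treats a point exactly on the average as breaking the run, which is an assumption, and the readings are hypothetical.

```python
from statistics import mean


def seven_point_runs(values, average=None, run_length=7):
    """Return (start, end) index pairs of runs of run_length or more
    consecutive points that all lie on the same side of the average."""
    if average is None:
        average = mean(values)
    runs = []
    start = 0
    side = 0  # +1 above the average, -1 below it, 0 exactly on it
    for i, value in enumerate(values):
        current = (value > average) - (value < average)
        if current != side:
            # The previous run ended at i - 1; record it if it is long enough.
            if side != 0 and i - start >= run_length:
                runs.append((start, i - 1))
            start = i
            side = current
    if side != 0 and len(values) - start >= run_length:
        runs.append((start, len(values) - 1))
    return runs


# Hypothetical readings around an average line of 25: points 1 to 8 all sit above it.
readings = [24, 26, 27, 26, 28, 26, 27, 29, 26, 24]
print(seven_point_runs(readings, average=25))  # [(1, 8)]
```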
13. Thank you for your kind attention