# Data Analysis Introduction.pptx

data analysis

### Data Analysis Introduction.pptx

1. Data Analysis Lab - 1: Introduction, by Dr. Abhishek Kumar Singh
2. Student Introduction
   - Name
   - City and State
   - Education details (graduation, XII and X)
3. PhD (IIT BHU Varanasi); M Tech (IIT BHU Varanasi); B Tech (GBTU); 3 research papers in SCOPUS/ABDC-indexed journals; 8 papers reviewed as a reviewer; Six Sigma Green Belt
4. Content
   - Syllabus
   - Data Analysis
   - Variables
   - Univariate
   - Bivariate
5. Univariate Descriptive Analysis
   - Measures of Central Tendency: Mean, Median, Mode
   - Measures of Variability: Range, Variance, Standard Deviation, Coefficient of Variation
   - Measures of Shape: Skewness and Kurtosis
   - Measures of Stability: Standard Error
6. Bivariate Descriptive Analysis
   - Covariance
   - Correlation
7. Data Analysis
   - Data analysis is the process of cleaning, transforming, interpreting, analyzing and visualizing data to extract useful information and gain insights that support more effective business decisions.
8. Variables
   - Variable: any character, characteristic or quality that varies is termed a variable.
   - E.g.: when collecting basic clinical and demographic information on patients with a particular illness, the variables of interest may include the gender (M/F), age and height of the patients.
9. Variable
   - Categorical (no notion of magnitude)
     - Nominal: categories are mutually exclusive and unordered. E.g. gender (M/F), blood group (A/B/AB/O).
     - Ordinal: categories are mutually exclusive and ordered. E.g. disease severity (mild, moderate and severe).
   - Numerical (values have a magnitude)
     - Discrete: integer values, typically counts. E.g. number of children vaccinated, days sick per year.
     - Continuous: takes any value in a range of values. E.g. weight in kg and height in cm.
10. Statistics
    - Descriptive
      - Collecting
      - Organizing
      - Summarizing
      - Presenting data
    - Inferential
      - Making inferences
      - Hypothesis testing
      - Determining relationships
      - Making predictions
11. Three types of analysis
    - Univariate analysis: the examination of cases on only one variable at a time (e.g., weight of college students).
    - Bivariate analysis: the examination of two variables simultaneously (e.g., the relation between gender and weight of college students).
    - Multivariate analysis: the examination of more than two variables simultaneously (e.g., the relationship between gender, race, and weight of college students).
12. Purpose of the different types of analysis
    - Univariate analysis: mainly description.
    - Bivariate analysis: determining the empirical relationship between two variables.
    - Multivariate analysis: determining the empirical relationship among multiple variables.
13. Univariate
    - The objective of univariate analysis is to describe the data, summarize it, and analyze the patterns present in it.
    - Univariate techniques are appropriate when there is a single measurement of each element in the sample, or when there are several measurements of each element but each variable is analyzed in isolation.
14. Univariate
    - Descriptive
      - Measures of Central Tendency: Mean, Median, Mode
      - Measures of Variability: Range, Variance, Standard Deviation, Coefficient of Variation
      - Measures of Shape: Skewness and Kurtosis
      - Measures of Stability: Standard Error
    - Inferential
      - z test
      - t test
      - Chi-square test
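
The shape and stability measures named on this slide are not defined in the deck, so the following is only a sketch under common textbook conventions: moment-based skewness and excess kurtosis (dividing by n), and the standard error of the mean computed from the sample standard deviation (dividing by n - 1). The function name `shape_and_stability` is hypothetical.

```python
import math


def shape_and_stability(data):
    """Return (skewness, excess kurtosis, standard error of the mean).

    Assumed conventions (not stated in the slides): moment-based
    skewness and kurtosis, and a sample standard deviation with n - 1.
    """
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4, each divided by n.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sample_sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    standard_error = sample_sd / math.sqrt(n)
    return skewness, excess_kurtosis, standard_error


symmetric = shape_and_stability([8, 9, 10, 11, 12])
right_skewed = shape_and_stability([8, 9, 10, 11, 18])
print(symmetric)
print(right_skewed)
```

A symmetric data set gives zero skewness, while the long right tail in the second data set gives a positive value.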
15. Numerical Methods
    - Mean
      - Let X1, X2, X3, …, Xn be the n data points; then the mean of the data is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
      - The mean provides the central value about which the data is spread out.
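
As a small illustration of the mean as the central value, the sketch below computes it for five illustrative data points and checks that the deviations from the mean cancel out. The helper name `arithmetic_mean` is hypothetical.

```python
def arithmetic_mean(data):
    """Sum of the data points divided by their count."""
    return sum(data) / len(data)


points = [8, 9, 10, 11, 12]
x_bar = arithmetic_mean(points)

# Deviations above and below the mean balance, so they sum to zero.
deviations = [x - x_bar for x in points]
total_deviation = sum(deviations)

print(x_bar, deviations, total_deviation)
```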
16. Numerical Methods
    - Median
      - The median is the value that divides the data into two halves.
      - Let X1, X2, X3, …, Xn be the n data points.
      - Order the n data values.
      - If the number of data points is odd, the sample median is the value in position (n+1)/2.
      - If the number of data points is even, the sample median is the average of the values in positions n/2 and n/2 + 1.
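
The odd and even cases of the median rule can be sketched as follows. The function name `sample_median` is hypothetical, and the 1-based positions from the slide are translated to Python's 0-based indices.

```python
def sample_median(data):
    """Median following the odd/even position rule on ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based terms is index n // 2.
        return ordered[mid]
    # Positions n/2 and n/2 + 1 in 1-based terms are indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = sample_median([12, 8, 10, 9, 11])
even_case = sample_median([11, 8, 10, 9])
print(odd_case, even_case)
```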
17. Mean or Median?
    - Both measures provide the "middle" value of the data, so how do they compare?
      - The median is robust against extreme values in the data.
      - The mean, in contrast, is affected by extreme values.
    - Example: let 8, 9, 10, 11, 12 be the five data points.
      - Mean = 10 and Median = 10
      - Replace 12 by 18
        - Mean = 11.2 but Median = 10
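
The example on this slide can be reproduced with Python's standard `statistics` module. This is a minimal check using the slide's own numbers; the variable names are illustrative.

```python
from statistics import mean, median

original = [8, 9, 10, 11, 12]
with_extreme_value = [8, 9, 10, 11, 18]  # 12 replaced by 18

mean_before, median_before = mean(original), median(original)
mean_after, median_after = mean(with_extreme_value), median(with_extreme_value)

# The mean shifts with the extreme value; the median stays put.
print(mean_before, median_before)
print(mean_after, median_after)
```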
18. Numerical Methods
    - Mode
      - The mode is the value in the data that occurs with the highest frequency.
      - It is the most probable value in the data.
      - Data can have more than one mode; such data is called multimodal.
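
A short sketch of unimodal versus multimodal data, using `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value tied for the highest frequency. The data sets are made up for illustration.

```python
from statistics import multimode

unimodal = [2, 3, 3, 4, 5]
bimodal = [1, 1, 2, 3, 3]

# multimode returns a list of all most-frequent values.
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)
print(modes_unimodal, modes_bimodal)
```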
19. Measures of Variability
    - Percentile
      - Order the data in ascending order.
        - Then p1 is called the first percentile if 1% of the points lie below this value.
        - Similarly, pk is called the k-th percentile if k% of the data points lie below this value, where 0 ≤ k ≤ 100.
    - Quartile
      - P25 is called the 1st quartile, Q1
      - P75 is called the 3rd quartile, Q3
      - P50 is the median
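
Several interpolation conventions exist for percentiles, and the slides do not say which one is intended. The sketch below assumes the "inclusive" method of `statistics.quantiles` (Python 3.8 or later) and only illustrates that Q1, the median and Q3 coincide with the 25th, 50th and 75th percentiles under that one convention. The data set is made up.

```python
from statistics import median, quantiles

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# n=4 yields the three quartile cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# n=100 yields 99 cut points; the k-th percentile is at index k - 1.
percentiles = quantiles(data, n=100, method="inclusive")
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print(q1, q2, q3)
print(p25, p50, p75)
```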
20. Measure of Dispersion
    - Measures the spread of the data
      - Range
      - Variance or standard deviation: measures the spread about the mean (average) value of the data
      - Interquartile range: measures the spread about the median value of the data
21. Measure of Dispersion
    - Range = M - m, where
      - M = Max(x1, x2, …, xn)
      - m = Min(x1, x2, …, xn)
    - Variance
      - $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
      - Standard deviation = S
    - Interquartile range: Q3 - Q1
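
The dispersion measures on this slide can be computed together, as sketched below. Two choices are assumptions, because the slide text does not state them: the variance uses the sample form (dividing by n - 1), and the quartiles use the "inclusive" method of `statistics.quantiles`. The function name `dispersion` is hypothetical.

```python
import math
from statistics import quantiles


def dispersion(data):
    """Range, sample variance, standard deviation and interquartile range."""
    n = len(data)
    x_bar = sum(data) / n
    data_range = max(data) - min(data)
    # Assumption: sample variance with an n - 1 denominator.
    variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)
    std_dev = math.sqrt(variance)
    # Assumption: quartiles via the "inclusive" interpolation method.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "range": data_range,
        "variance": variance,
        "std_dev": std_dev,
        "iqr": q3 - q1,
    }


summary = dispersion([8, 9, 10, 11, 12])
print(summary)
```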
22. Standard Deviation
    - The standard deviation is the most commonly used measure of dispersion.
      - Under the assumption of normality, the range $\bar{x} \pm S$ covers about 68% of the data.
    - Hence, it is commonly used to show the possible error in an observed value of the data.
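
The coverage of one standard deviation around the mean under normality can be checked numerically. The sketch below compares the theoretical value (about 68.27%) with a simulated sample. The seed, sample size and distribution parameters are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, stdev

# Theoretical share of a normal distribution within one standard deviation.
theoretical = math.erf(1 / math.sqrt(2))

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(20_000)]
x_bar, s = mean(sample), stdev(sample)

# Fraction of simulated points inside [x_bar - s, x_bar + s].
inside = sum(1 for x in sample if x_bar - s <= x <= x_bar + s)
observed = inside / len(sample)

print(f"theoretical={theoretical:.4f} observed={observed:.4f}")
```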
23. Graphical Method
    - Histogram or bar chart (frequency plot)
    - Pie chart
    - Cumulative frequency plot
    - Box-and-whisker plot
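
Frequency plots and cumulative frequency plots are both drawn from a frequency table. As a rough sketch using only the standard library, the code below builds that table for made-up data and prints a text histogram instead of using a plotting library.

```python
from collections import Counter
from itertools import accumulate

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

frequency = Counter(scores)
values = sorted(frequency)
counts = [frequency[v] for v in values]
# Running total of the counts gives the cumulative frequency.
cumulative = list(accumulate(counts))

for value, count, running in zip(values, counts, cumulative):
    # One '#' per observation gives a simple text histogram.
    print(f"{value:>2} | {'#' * count:<5} | cumulative = {running}")
```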
24. Bivariate
    - "Bi" means two and "variate" means variable, so here there are two variables. The analysis concerns the causes of, and the relationship between, the two variables.
    - Correlation
    - Covariance
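
The slides name covariance and correlation without giving formulas, so the following is a sketch under the usual definitions: sample covariance with an n - 1 denominator and the Pearson correlation coefficient. The function names and the height/weight figures are hypothetical.

```python
import math


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical measurements, for illustration only.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 75, 86]

cov_hw = covariance(heights_cm, weights_kg)
corr_hw = correlation(heights_cm, weights_kg)
print(cov_hw, corr_hw)
```

Covariance depends on the units of both variables, whereas correlation is unit-free and always lies between -1 and 1, which makes it easier to compare across variable pairs.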