The document provides an introduction to data analytics, including defining key terms like data, information, and analytics. It outlines the learning outcomes which are the basic definition of data analytics concepts, different variable types, types of analytics, and the analytics life cycle. The analytics life cycle is described in detail and involves problem identification, hypothesis formulation, data collection, data exploration, model building, and model validation/evaluation. Different variable types like numerical, categorical, and ordinal variables are also defined.
2. Outcomes
Students would learn.
1. Basic definition of Data, Information, and Data analytics
2. Different types of variables
3. Types of analytics
4. Analytics Life Cycle
19-08-2017KK Singh, RGUKT Nuzvid
2
3. Basic Definition
Data: Data is a set of values of qualitative or quantitative variables. It is information in
raw or unorganized form. It may be a fact, figure, characters, symbols etc.
Information: Meaningful or organised data is information.
Analytics: Analytics is the discovery , interpretation, and communication of meaningful
patterns or summery in data.
Data Analytics (DA) is the process of examining data sets in order to draw conclusion
about the information it contains.
Analytics is not a tool or technology, rather it is the way of thinking and acting on data.
19-08-2017KK Singh, RGUKT Nuzvid
3
4. Fathima 50 DA , 70 DA Hema…..
19-08-2017KK Singh, RGUKT Nuzvid
4
5. Data Analytics (Cont..)
Examples
1. Business analytics
2. Risk ,,
3. Fraud ,,
4. Health ,,
5. Web ,,
Types of analytics
1. Descriptive Analytics (“What has happened?”) (Data aggregation, summary, data mining)
2. Predictive Analytics (“What might happen?”) (Regression, LSE,MLE)
3. Prescriptive Analytics (“What should we do?”) (Optimization, Recommendation)
19-08-2017KK Singh, RGUKT Nuzvid
5
7. Answer the question
1. Which type of variable is “Street Number” ?
2. Which type of variable is “Phone Number” ?
3. Which type of variable is “Annual Income” ?
19-08-2017KK Singh, RGUKT Nuzvid
7
8. Analytics Life Cycle
1. Problem Identification
2. Hypothesis formulation
3. Data Collection
4. Data Exploration/preparation
5. Model Building
6. Model Validation and Evaluation
19-08-2017KK Singh, RGUKT Nuzvid
8
9. Analytics Life Cycle(Cont..)
1. Problem Identification
The problem is a situation which is judged to be corrected or solved
Problem can be identified through
1. Comparative/benchmarking studies
2. Performance Reporting
3. Asking some basic questions
a) Who are affected by the problem?
b) What will happen if problem is not solved?
c) When and where does the problem occur?
d) Why is the problem occurring
e) How are the people currently handling the problem?
19-08-2017KK Singh, RGUKT Nuzvid
9
10. Analytics Life Cycle(Cont..)
2. Hypothesis formulation
1. Frame the questions which need to be answered.
2. Develop a comprehensive list of all possible issues related to the problem.
3. Reduce the list by eliminating duplicates and combining overlapping issues.
4. Using consensus building get down to a major issue list.
3. Data Collection
Data collection techniques are
1. Using data that is already collected by others
2. Systematically selecting and watching characteristics of people, objects, and events.
3. Oral questioning respondents either individually or as a group
4. Collecting data based on answers provided by the respondents in written format.
19-08-2017KK Singh, RGUKT Nuzvid
10
11. Analytics Life Cycle(Cont..)
4. Data Exploration
1. Importing data
2. Variable Identification
3. Data Cleaning
4. Summarizing data
5. Selecting subset of data
5. Model Building
Building a Model is a very iterative process because there is no such
thing as final and perfect solution.
Many of the machine learning and statistical techniques are
available in traditional technology platform
19-08-2017KK Singh, RGUKT Nuzvid
11
12. 6. Model validation and Evaluation
Like model building the process of validating model is also a
iterative process.
There are so many ways …
Confusion Matrix.
Confidence Interval.
ROC curve
Chi Square.
Root Mean Square Error
Gain and Lift Chart.
19-08-2017KK Singh, RGUKT Nuzvid
12
14. Assignment
What is the probability a student is a female and has no remedial ?
What is the probability that student has no remedial given the student is a boy ?
What is average throughput (internet) in K3 block ?
Plot channel utilization per day in July 2017 in k3 block ?
19-08-2017KK Singh, RGUKT Nuzvid
14