Data Analysis Course
Analysis Design Document (Version-1)
Venkat Reddy
Data Analysis Course
•   Introduction to statistical data analysis
•   Descriptive statistics
•   Data exploration, validation & sanitization
•   Probability distributions: examples and applications
•   Simple correlation and regression analysis
•   Multiple linear regression analysis
•   Logistic regression analysis
•   Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1
•   Credit Risk Model building-2

Note
• This presentation is just class notes. The course notes for Data
  Analysis Training were written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other
  information.
• Most of this material was written as informal notes, not
  intended for publication.
• Please send questions/comments/corrections to
  venkat@trendwiseanalytics.com or 21.venkat@gmail.com
• Please check my website for the latest version of this document.
                                         -Venkat Reddy

Contents
•   Background, Objective & Scope
•   Understanding Data, Data Cleaning & Audit
•   Overall summary & Summary by various segments
•   Benchmark Analysis, Tracking basic metrics, KPIs
•   Control charts, trends & forecasting
•   Multivariate analysis & segmentation
•   Driver analysis

In scope & Out of scope
•   Background
•   What is the objective of the project?
•   What is in scope of the project?
•   Are there any data-related issues which will make some
    analysis impossible, hence out of scope?

Data exploration, Data validation &
Data sanitization
• Data exploration - Get a feel of the data
• Data validation - Is the data accurate?
• Data sanitization - What if there are some inaccuracies in the
  data?
  • Missing Value Treatment
  • Outlier Identification & Treatment

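A minimal sketch of the two sanitization treatments listed above. The slides do not say which treatments the course uses, so median imputation and IQR fences are assumptions here, and the `ages` data, `impute_median`, and `iqr_bounds` are hypothetical names made up for illustration.

```python
import statistics

def impute_median(values):
    """Missing value treatment: replace None with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Outlier identification: fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical customer ages with two missing entries and one suspicious value.
ages = [23, 25, None, 31, 29, 27, 120, 26, None, 30]

clean = impute_median(ages)
low, high = iqr_bounds(clean)
outliers = [v for v in clean if v < low or v > high]

# Outlier treatment (one option): cap values at the fences instead of dropping rows.
capped = [min(max(v, low), high) for v in clean]

print("outliers:", outliers)
print("capped:", capped)
```

Capping keeps the row count unchanged, which may matter when other fields on the same record are still usable; dropping or flagging the record are equally reasonable choices depending on the project.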
Overall summary & Summary by various
segments
•   Descriptive analysis of objective variable
•   Descriptive statistics of other important variables
•   Univariate analysis of important fields
•   Data visualization of variables
•   Analysis across various segments or cuts of the population
•   Bivariate analysis & visualizations
    • Analysis with more than two variables
    • Frequencies, means, etc., considering combinations of variables
    • Correlations and simple regressions

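As a rough illustration of a summary by segment and a simple bivariate check, the sketch below groups a few records by a segment field and computes a Pearson correlation by hand. The records, the field names (`segment`, `income`, `spend`), and both helper functions are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median

# Hypothetical records; "spend" plays the role of the objective variable.
records = [
    {"segment": "retail", "income": 30, "spend": 12},
    {"segment": "retail", "income": 40, "spend": 15},
    {"segment": "retail", "income": 50, "spend": 21},
    {"segment": "corporate", "income": 80, "spend": 30},
    {"segment": "corporate", "income": 100, "spend": 38},
]

def summarize_by(rows, key, field):
    """Frequency, mean and median of `field` for each value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {
        name: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for name, vals in groups.items()
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

summary = summarize_by(records, "segment", "spend")
r = pearson([row["income"] for row in records], [row["spend"] for row in records])

print(summary)
print("income vs spend correlation:", round(r, 3))
```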
Benchmark Analysis, Tracking
derived metrics & KPIs
• Derived variables
• Key performance indicators
  • Ratios, deviations, etc.
• Comparison vs target & average
• RAG (Red, Amber, Green) charts and dashboards

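One possible way to turn a derived ratio and its target into a RAG status is sketched below. The 10% amber band, the "higher is better" convention, the KPI names, and all numbers are assumptions for illustration; the slides do not define thresholds.

```python
def rag_status(actual, target, amber_band=0.10):
    """Green at or above target, Amber within `amber_band` below it, Red otherwise.

    Assumes a higher value is better for the KPI.
    """
    deviation = (actual - target) / target
    if deviation >= 0:
        return "Green"
    if deviation >= -amber_band:
        return "Amber"
    return "Red"

# Derived variable: a ratio built from two raw counts (hypothetical numbers).
raw = {"applications": 1000, "approvals": 570}
approval_rate = raw["approvals"] / raw["applications"]

# KPI name -> (actual, target)
kpis = {
    "conversion_rate": (0.042, 0.040),
    "approval_rate": (approval_rate, 0.60),
    "collections": (70.0, 100.0),
}

dashboard = {name: rag_status(actual, target) for name, (actual, target) in kpis.items()}
print(dashboard)
```

The same function could compare against a historical average instead of a target by passing the average as `target`.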
Control charts & trends, forecasting
• Tracking of important metrics over time
• 1.5σ control charts
• Time series forecasting of future values

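A small sketch of a control chart check, reading the slide's limit as the mean plus or minus 1.5 standard deviations of a baseline period (that reading is an assumption). The moving-average forecast is a deliberately naive stand-in for the forecasting methods covered later in the course, and the data is made up.

```python
from statistics import mean, stdev

def control_limits(baseline, k=1.5):
    """Lower limit, center line and upper limit at mean +/- k standard deviations."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def out_of_control(series, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(series) if v < lower or v > upper]

def moving_average_forecast(series, window=3):
    """Naive one-step-ahead forecast: the mean of the last `window` points."""
    return mean(series[-window:])

# Hypothetical weekly values of a tracked metric.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
new_points = [101, 99, 108, 100, 94]

lower, center, upper = control_limits(baseline)
flagged = out_of_control(new_points, lower, upper)
next_value = moving_average_forecast(baseline)

print("limits:", lower, center, upper)   # 97.0 100 103.0
print("flagged:", flagged)               # [(2, 108), (4, 94)]
print("forecast:", next_value)           # 100
```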
Multivariate analysis & segmentation
• Finding the groups or segments in the population that are
  behaving alike
  • Segments with respect to objective
  • Overall segments
Details later

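The slides defer the details, so the following is only an illustrative sketch of finding "overall segments" with a bare-bones k-means. K-means is a swapped-in technique chosen for the example, not necessarily the method used in the course, and the points and starting centroids are hypothetical.

```python
from math import dist
from statistics import mean

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(mean(axis) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical customers described by two scaled features (e.g. income, spend).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Fixed starting centroids keep the example deterministic.
centroids, labels = kmeans(customers, centroids=[(1.0, 1.0), (8.0, 8.0)])

print("labels:", labels)
print("centroids:", centroids)
```

For segments with respect to the objective, a supervised method such as a decision tree on the objective variable would be the more natural fit; this sketch covers only the unsupervised case.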
Driver analysis
• Regression analysis for finding the most impactful drivers
• Most influential factors on the objective variable
• Quantifying the impact of each factor & comparison of factors
Details later

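One common way to compare drivers on a single scale is to fit a regression on standardized variables and compare the resulting coefficients. The sketch below does that with plain Python and the normal equations; it is an assumption about how the comparison could be done, not the course's stated method. The `ads`, `price`, and `sales` data are synthetic (sales was generated as 10 + 3*ads - price), and the helper names are hypothetical.

```python
from statistics import mean, pstdev

def standardize(col):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def standardized_betas(drivers, target):
    """Regression coefficients on standardized data, one per driver."""
    names = list(drivers)
    zx = [standardize(drivers[name]) for name in names]
    zy = standardize(target)
    # Normal equations (X'X) beta = X'y; no intercept is needed because
    # every standardized column has mean zero.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in zx] for ci in zx]
    xty = [sum(p * q for p, q in zip(ci, zy)) for ci in zx]
    return dict(zip(names, solve(xtx, xty)))

# Synthetic data for illustration only.
drivers = {"ads": [1, 2, 3, 4, 5, 6], "price": [5, 3, 6, 2, 7, 4]}
sales = [8, 13, 13, 20, 18, 24]

betas = standardized_betas(drivers, sales)
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

print(betas)
print("drivers by impact:", ranked)
```

The sign of each coefficient gives the direction of the effect and the absolute value gives a comparable measure of impact, provided the drivers are not strongly correlated with each other.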
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com
21.venkat@gmail.com
+91 9886 768879
