ECHELON ASIA SUMMIT 2017 STARTUP ACADEMY [WORKSHOP] INTRODUCTION TO DATA SCIENCE 29th June 2017 Garrett Teoh Hor Keong
OPENING
PROGRAM FLOW 1. Data Science Fundamentals (10 min) 2. Exploratory Data Analysis (25 min) 3. Building Machine Learning & AI...
DATA SCIENCE FUNDAMENTALS
STAGES OF DATA SCIENCE What has happened? What will happen? What should happen? Data Collection Machine Learning Cognitive...
CROSS INDUSTRY STANDARD PROCESS – DATA MINING Business Understanding Collect & Understand Data Data Prep & Cleansing Build...
DOMAINS OF DATA SCIENCE Supervised Learning - Species Classifications - HR Churn - Sales Conversion - Performance Ranking ...
TOOLS FOR DATA SCIENCE
DRIVING TOWARDS DIGITAL TRANSFORMATION • Data Scientists (Building Models, Evaluation) • Data Analysts (Visualizations, re...
EXPLORATORY DATA ANALYSIS
ADULT CENSUS INCOME DATASET – BACKGROUND This data was extracted from the 1994 Census bureau database by Ronny Kohavi and ...
ADULT CENSUS INCOME DATASET – UNDERSTANDING Link to data description: https://www.kaggle.com/uciml/adult-census-income Res...
PREPARING & CLEANING UP THE DATASET Explore how to use an Excel sheet (xlsx) to prepare and clean up the Adult Census Income ...
NUMERICAL FEATURES DISTRIBUTION & RECODING Some numerical (continuous or integers) features might be slightly correlated t...
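Recoding a numerical feature into bands can be sketched in a few lines of Python. This is only an illustration: the cut points and labels below are hypothetical, since the slide's own bands are not shown, and `recode_age` is a name made up for this example.

```python
from bisect import bisect_right

# Hypothetical cut points and labels; the workshop's actual bands are not given.
AGE_EDGES = [25, 35, 45, 55, 65]
AGE_LABELS = ["<25", "25-34", "35-44", "45-54", "55-64", "65+"]


def recode_age(age):
    """Map a numeric age to the label of the band it falls into."""
    # bisect_right counts how many edges are <= age, which is the band index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


ages = [17, 25, 39, 64, 90]
bands = [recode_age(a) for a in ages]
print(bands)
```

The same pattern would apply to other numerical columns, with edges chosen after looking at each feature's distribution.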
ADULT CENSUS INCOME DATASET – EDA PRACTICE The cleaned data can be downloaded from https://goo.gl/qE7TPf (cleaned-adult.zi...
EXPLORATORY DATA ANALYSIS – CORRELATION PLOT relationships Female Male Grand Total Husband 0.01% 99.99% 100.00% Wife 99.87...
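The table on this slide reads like a pivot of relationship against sex, with each row normalized to 100%. A minimal sketch of that calculation follows, assuming a tiny made-up sample rather than the census data; `row_percentages` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Made-up (relationship, sex) pairs for illustration only, not census records.
pairs = [
    ("Husband", "Male"), ("Husband", "Male"),
    ("Wife", "Female"), ("Wife", "Female"), ("Wife", "Male"),
    ("Own-child", "Female"), ("Own-child", "Male"),
]


def row_percentages(pairs):
    """Return {row: {column: percent of that row's total}}."""
    counts = defaultdict(Counter)
    for row, col in pairs:
        counts[row][col] += 1
    table = {}
    for row, counter in counts.items():
        total = sum(counter.values())
        table[row] = {col: 100.0 * n / total for col, n in counter.items()}
    return table


table = row_percentages(pairs)
for row, cols in table.items():
    print(row, {col: round(pct, 2) for col, pct in cols.items()})
```

Rows whose percentages sit near 0% or 100% for one column, as in the slide's Husband and Wife rows, suggest the two features carry largely overlapping information.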
EXPLORATORY DATA ANALYSIS SUMMARY Executive Summary (What has happened?) [Chart: only percentage axis ticks survive]
EXPLORATORY DATA ANALYSIS SUMMARY Executive Summary (What has happened?) Higher-income earners tend to have a significant...
BUILDING MACHINE LEARNING & AI
MACHINE LEARNING ALGORITHMS – UNSUPERVISED • You do not know what you do not know • All data is unlabelled and the ...
MACHINE LEARNING ALGORITHMS – UNSUPERVISED CLUSTERING Hierarchical Clustering K - Means Kernel Density Discriminant Analys...
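Of the clustering methods listed, K-Means is the simplest to show in code. The sketch below is a bare-bones version in plain Python, assuming made-up 2-D points and hand-picked starting centroids; a real project would more likely use a library implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid with the smallest squared Euclidean distance.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No labels are supplied anywhere: the grouping comes only from distances between points, which is what makes the method unsupervised.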
MACHINE LEARNING ALGORITHMS – SUPERVISED • You do not know what you knew • All data is labelled and the algorithms learn t...
MACHINE LEARNING ALGORITHMS – SUPERVISED CLASSIFICATIONS REGRESSIONS - Decision Tree, Random Forest - eXtreme Gradient BOO...
TOOLS & RESOURCES CONSIDERATIONS • Near real time updates and monitoring. (e.g. Pricing Analysis, Recommendation Engine, T...
ADULT CENSUS INCOME PREDICTIONS 70% of the data is used for training a model. The remaining 30% is used as ‘hold-out’ samples for...
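A 70/30 split like the one described can be sketched as follows. The records here are placeholder integers, and the function name, seed, and rounding choice are assumptions made for this example rather than details from the workshop.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and hold-out parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # placeholder rows standing in for census records
train, holdout = train_test_split(records)
print(len(train), len(holdout))
```

Shuffling before cutting matters when the file is sorted in some way; otherwise the hold-out set may not resemble the training set.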
EVALUATING ALGORITHMS & MODELS
TYPES OF ML MODEL EVALUATION METRICS • Validating prediction model against known outcome/labels. • For “unsupervised” meth...
BINARY CLASSIFICATION MODEL EVALUATION • Gini Lift and Decile Charts • Ranking predictions and examine how much ‘lift’ doe...
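The idea of ranking predictions and measuring lift per decile can be sketched as below. The scores and labels are made up, and `decile_lift` is a hypothetical helper; lift is taken here as the positive rate inside a decile divided by the overall positive rate, which is one common definition.

```python
def decile_lift(scores, labels, bins=10):
    """Rank by score (highest first) and return the lift of each bin."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(labels) / len(labels)
    size = len(ranked) // bins
    lifts = []
    for b in range(bins):
        # The last bin absorbs any leftover records.
        chunk = ranked[b * size:(b + 1) * size] if b < bins - 1 else ranked[b * size:]
        bin_rate = sum(label for _, label in chunk) / len(chunk)
        lifts.append(bin_rate / overall_rate)
    return lifts


# Made-up model scores (already descending) and true outcomes for 20 records.
scores = [(20 - i) / 20 for i in range(20)]
labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
lifts = decile_lift(scores, labels)
print([round(x, 2) for x in lifts])
```

A lift well above 1 in the top deciles means the model concentrates positives among its highest-scored records, which is what a decile chart makes visible.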
AREA UNDER ROC CURVE If probability >= 0.5, predict the response as positive; else, negative. Confusion Matrix Target Positive Nega...
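The 0.5 cut-off rule and the confusion matrix can be illustrated with a short sketch. The probabilities and targets below are made up. The AUC is computed by pairwise comparison (the chance that a random positive is scored above a random negative), which is equivalent to the area under the ROC curve but is not necessarily how the workshop computed it.

```python
def confusion_matrix(probs, targets, threshold=0.5):
    """Count TP, FP, FN, TN when predicting positive for probability >= threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for prob, target in zip(probs, targets):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and target == 1:
            counts["TP"] += 1
        elif predicted == 1 and target == 0:
            counts["FP"] += 1
        elif predicted == 0 and target == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def auc(probs, targets):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [p for p, t in zip(probs, targets) if t == 1]
    negatives = [p for p, t in zip(probs, targets) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities and true outcomes.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
targets = [1, 1, 0, 1, 0, 0]
matrix = confusion_matrix(probs, targets)
area = auc(probs, targets)
print(matrix, round(area, 3))
```

The confusion matrix depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds.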
VISUALIZING DATA & STORYTELLING
THE BIG PICTURE – PUTTING IT TOGETHER [Chart: only axis tick labels survive]
USING COMBINATION OF CHARTS [Combined chart: only axis tick labels survive]
MAXIMIZING ROI ON MARKETING RESPONSE • Assumptions: 1. Average loan amount $10,000 2. Interest return at 10% 3. Default ra...
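The first two assumptions are enough to sketch the per-loan arithmetic. The default rate is cut off in the slide text, so the 5% used below is purely hypothetical, as is the simplification that a defaulted loan loses its full principal and earns no interest.

```python
def expected_profit_per_loan(loan_amount, interest_rate, default_rate):
    """Expected profit, assuming a defaulted loan loses its whole principal."""
    interest_earned = (1 - default_rate) * loan_amount * interest_rate
    principal_lost = default_rate * loan_amount
    return interest_earned - principal_lost


def break_even_default_rate(interest_rate):
    """Default rate at which expected profit is zero under the same assumption."""
    # Solve (1 - d) * r = d for d.
    return interest_rate / (1 + interest_rate)


LOAN_AMOUNT = 10_000   # from the slide
INTEREST_RATE = 0.10   # from the slide
DEFAULT_RATE = 0.05    # hypothetical: the slide's value is truncated

profit = expected_profit_per_loan(LOAN_AMOUNT, INTEREST_RATE, DEFAULT_RATE)
break_even = break_even_default_rate(INTEREST_RATE)
print(round(profit, 2), round(break_even, 4))
```

Under these assumptions, a model that lowers the default rate among targeted customers raises the expected profit per loan, which is the link between model quality and marketing ROI.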
QUESTIONS & ANSWERS
THANK YOU ECHELON ASIA SUMMIT 2017 Garrett Teoh Hor Keong Chief Data Officer, Renotalk Pte Ltd LinkedIn: garrettteoh Email...