ECHELON
ASIA SUMMIT 2017
STARTUP ACADEMY
[WORKSHOP]
INTRODUCTION TO
DATA SCIENCE
29th June 2017
Garrett Teoh Hor Keong
OPENING
PROGRAM FLOW
1. Data Science Fundamentals (10 min)
2. Exploratory Data Analysis (25 min)
3. Building Machine Learning & AI (10 min)
4. Evaluating Algorithms & Models (20 min)
5. Visualizing Data & Storytelling (20 min)
6. Questions & Answers (5 min)
DATA SCIENCE
FUNDAMENTALS
STAGES OF DATA SCIENCE
What has happened? What will happen? What should happen?
Data Collection -> Exploratory Data Analysis -> Machine Learning / Classifications -> Cognitive -> Visualizations / Storytelling -> Actionable Insights!
CROSS INDUSTRY STANDARD PROCESS – DATA MINING (CRISP-DM)
1. Business Understanding
2. Collect & Understand Data
3. Data Prep & Cleansing
4. Build AI & Models
5. Evaluate Models
6. Deploy & Productionalize
Data Lake: Local vs Cloud?
(What has happened? What will happen? What should happen?)
DOMAINS OF DATA SCIENCE
Supervised Learning: Species Classifications; HR Churn; Sales Conversion; Performance Ranking
Unsupervised Learning: Credit Card Fraud; Procurement Fraud; Preventive Maintenance
Imaging & Recognition: Facial Recognition; Product Categories; Healthcare Imaging
Operations Research: Optimizing Costs vs Revenue (HR Planning); Optimizing Costs for Machines, Pipes to Gas Stations (Revenue)
Recommendation Engine: Collaborative Filtering; Cross-Sell Products
TOOLS FOR DATA SCIENCE
DRIVING TOWARDS DIGITAL TRANSFORMATION
Roles:
 Data Scientists (Building Models, Evaluation)
 Data Analysts (Visualizations, Reports, EDA)
 Data Engineers (Data Lake, Deployment, ETL)
 IT Developers (Deployment, Data Collection)
Data Sources:
 Internal (Employees, Accounts, Audit Logs, Marketing)
 External (Sales, Customer Behaviours, Measurements)
 Public (Census, Info Sites, Facebook, Twitter, News & Media)
 Data Aggregator Companies
Infrastructure & Tooling:
 Data Storage
 Data Processing & ETLs
 Data Access & Governance
 Computational Resources
 Real-Time Processing
 Visualization Tools
 Data Modelling Tools
 Deployment Tools
EXPLORATORY
DATA ANALYSIS
ADULT CENSUS INCOME DATASET – BACKGROUND
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining
and Visualization, Silicon Graphics). The prediction task is to determine whether a person makes over $50K a year.
Link to data: https://goo.gl/qE7TPf (adult.csv.zip)
ADULT CENSUS INCOME DATASET – UNDERSTANDING
Link to data description: https://www.kaggle.com/uciml/adult-census-income
Response (Binary)
Features or Predictors (14)
Data Types: Integer, Continuous, Binary, Date/Time, Ordinal, Categorical, Text
PREPARING & CLEANING UP THE DATASET
Explore how to use Excel Sheet (xlsx) to prepare and clean up the Adult Census Income dataset.
Step 1 • Convert raw data from .csv format to .xlsx format. “save as…”
Step 2 • Click on “sort & filter” to examine data type and categories.
Step 3 • Identify blanks, missing data, or irrelevant data.
Step 4 • Alternatively, use “pivot tables” and “charts” to identify distributions and categorical counts. Select all data using ctrl+shift+arrow keys -> Insert -> Pivot Table -> new worksheet.
Step 5 • Create a derived binary response (using the “IF” function to return 0 or 1).
Step 6 • Use “VLOOKUP” to replace blank, missing, or irrelevant data.
Step 7 • Insert a “combo clustered” 2-D chart using the pivot table data to examine the correlation of the response with each feature.
Step 8 • Remove features with high % of missing data.
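The same clean-up steps can be scripted with pandas instead of Excel. A minimal sketch on a tiny stand-in table: the real adult.csv marks missing values with “?”, but the column names, values, and 50% missing-data threshold below are illustrative assumptions, not part of the slide.

```python
import pandas as pd
import numpy as np

# Tiny stand-in for adult.csv (the real file uses "?" for missing values)
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "native.country": ["United-States", "?", "?", "?"],
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Step 3/6 equivalent: mark "?" entries as missing data
df = df.replace("?", np.nan)

# Step 5 equivalent: derive a binary response (1 if income > $50K)
df["target"] = (df["income"] == ">50K").astype(int)

# Step 8 equivalent: drop features where more than 50% of values are missing
keep = df.columns[df.isna().mean() <= 0.5]
df = df[keep]

print(df.columns.tolist())  # "native.country" (75% missing) is dropped
print(df["target"].tolist())
```

The per-column missing fraction (`df.isna().mean()`) is the scripted analogue of eyeballing blanks with Excel’s sort & filter.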
NUMERICAL FEATURES DISTRIBUTION & RECODING
Some numerical (continuous or integer) features might be correlated with the response, so it is important to identify the trends of these features and recode them as necessary.
Step 1 • Examine correlations of the continuous feature with the response using parametric (Student’s t-test) or non-parametric (Wilcoxon rank-sum) tests.
Step 2 • Observe the histogram of the continuous feature against the response by making a “combo clustered” chart or a “scatter plot”.
Step 3 • Identify highly correlated segments and recode the feature.
Mean Age (target=0): 37
Mean Age (target=1): 44
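Step 1’s significance tests can be sketched with scipy, here on synthetic ages matching the slide’s group means (37 vs 44); `mannwhitneyu` serves as the rank-sum test, and the age bands in Step 3 are an illustrative recoding, not the deck’s.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic ages mirroring the slide: mean ~37 for target=0, ~44 for target=1
age_low = rng.normal(37, 10, 500)
age_high = rng.normal(44, 10, 500)

# Step 1: parametric (Student's t) and non-parametric (Mann-Whitney rank-sum) tests
t_stat, t_p = stats.ttest_ind(age_low, age_high)
u_stat, u_p = stats.mannwhitneyu(age_low, age_high)
print(f"t-test p={t_p:.3g}, rank-sum p={u_p:.3g}")  # both tiny: age differs by class

# Step 3: recode the continuous feature into segments (hypothetical age bands)
bands = np.digitize(age_high, bins=[25, 35, 45, 55, 65])
```

A small p-value on both tests suggests the feature separates the classes, which is the cue to recode it into bands.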
ADULT CENSUS INCOME DATASET – EDA PRACTICE
The cleaned data can be downloaded from https://goo.gl/qE7TPf (cleaned-adult.zip)
EXPLORATORY DATA ANALYSIS – CORRELATION PLOT
relationships Female Male Grand Total
Husband 0.01% 99.99% 100.00%
Wife 99.87% 0.13% 100.00%
Not-in-family 46.66% 53.34% 100.00%
Other-relative 43.83% 56.17% 100.00%
Own-child 44.30% 55.70% 100.00%
Unmarried 77.02% 22.98% 100.00%
Grand Total 33.08% 66.92% 100.00%
[Chart: Relationship vs Gender, showing Female (%) vs Male (%) for Husband, Wife, Not-in-family, Other-relative, Own-child, Unmarried]
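A row-normalised percentage table like the one above can be produced with a pandas crosstab; the six-row sample below is hypothetical, standing in for the full dataset’s `relationship` and `sex` columns.

```python
import pandas as pd

# Hypothetical mini-sample of the dataset's relationship and sex columns
df = pd.DataFrame({
    "relationship": ["Husband", "Wife", "Wife", "Own-child", "Unmarried", "Husband"],
    "sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
})

# Row-normalised crosstab: % Female / % Male within each relationship value
pct = pd.crosstab(df["relationship"], df["sex"], normalize="index") * 100
print(pct.round(2))
```

With the full data, each row sums to 100%, matching the Grand Total column in the table.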
EXPLORATORY DATA ANALYSIS SUMMARY
Executive Summary (What has happened?)
[Chart: Age Group vs High Income, counts and high income (%) by age group 0–5]
[Chart: Education num vs High Income, counts and high income (%) by education num 1–16]
[Chart: Relationships vs High Income, counts and high income (%) by relationship]
Overall High Income: 24%
EXPLORATORY DATA ANALYSIS SUMMARY
Executive Summary (What has happened?)
Higher income earners tend to have significantly higher capital gains/losses. Both of these features might improve prediction modelling performance.
BUILDING
MACHINE
LEARNING &
AI
MACHINE LEARNING ALGORITHMS – UNSUPERVISED
• You don’t know what you don’t know.
• All data is unlabelled, and the algorithms learn the inherent structure from the input data.
• You only have input data (X) and no corresponding output variables.
Telco Fraud
 How does a fraudster do it?
 When will it happen?
 How do we differentiate them?
 Where are the anomalies?
People Management
 Who is the top performer?
 What are the metrics?
 Who should be awarded a promotion?
 Where do they stand out?
Product Cross/Up Sell
 Who will need those products?
 What is inside their shopping carts?
 Which products to market?
 How to package products?
MACHINE LEARNING ALGORITHMS – UNSUPERVISED
CLUSTERING: Hierarchical Clustering, K-Means, Kernel Density, Discriminant Analysis, Isolation Forest, One-Class SVM
ASSOCIATIONS: Apriori, Eclat, FP-Growth, Context Based
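One algorithm from the clustering list, K-Means, sketched with scikit-learn on synthetic data. The distance-to-centroid step at the end is one common way clustering surfaces anomaly candidates (an assumption of this sketch, not something the slide prescribes).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two synthetic blobs standing in for two behaviour groups in unlabelled data
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])

# Fit K-Means with 2 clusters; no labels are used anywhere
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Points far from their assigned centroid are candidate anomalies
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
print("max distance to centroid:", dist.max().round(2))
```

Note that only the inputs X are ever given to the algorithm, which is exactly the “input data (X), no output variable” setting described above.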
MACHINE LEARNING ALGORITHMS – SUPERVISED
• You know what you are looking for.
• All data is labelled, and the algorithms learn to predict the output from the input data.
• You have input variables (X) and an output variable (Y).
Leads Conversion
 How will a lead convert?
 What features or properties are important?
 How do we deal with leads with marginal probability?
Financing
 Who is a good borrower?
 Who will default on a loan?
 What rules or patterns differentiate them?
 How do we interpret probabilities of default?
Property Sales
 What is the best price?
 What features affect sale price?
 Does price affect sale probability?
 How do we optimize time, price, and the ability to close a sale?
MACHINE LEARNING ALGORITHMS – SUPERVISED
CLASSIFICATIONS:
- Decision Tree, Random Forest
- eXtreme Gradient BOOSTing (XGBOOST)
- Gradient Boosted Trees
- Generalised Linear Model
- Logistic Regression
- Neural Networks
- Support Vector Machine (SVM)
- K Nearest Neighbour (KNN), K Means
REGRESSIONS:
- eXtreme Gradient BOOSTing (XGBOOST)
- Linear Gradient Boosted
- Generalised Linear Model
- Lasso, Ridge Regression
- Elastic Net
- Least Angle Regression (LARS)
- Neural Networks
TOOLS & RESOURCES CONSIDERATIONS
• Near real time updates and monitoring. (e.g. Pricing Analysis, Recommendation Engine,
Threat/Fraud Detection, Preventive Maintenance)
• Periodic updates. (People Analysis, Marketing Response Prediction, Sales Forecast, Cancer/Disease
Risk)
• Predict-On-Demand. (Credit Risk/Scoring, Leads Conversion)
• Storage:
• Hadoop Distributed File System (HDFS), Traditional RDBMS, AWS Redshift, AWS RDS/S3
instance, HBase.
• Architecture:
• Apache Spark (Near Real Time Analytics) e.g. SparkR, PySpark, H2O.
• HDInsights, HortonWorks, SpringXD
• Computational:
• Computational power – Number of CPU cores, GPUs, RAM memory
ADULT CENSUS INCOME PREDICTIONS
70% of the data is used for training the model; the remaining 30% is used as ‘hold-out’ samples for the trained model’s predictions.
Predictions are generated with the XGBoost algorithm, using gradient boosted trees (1,000 iterations).
Training time: under 10 seconds on an Acer Aspire V15 notebook (Intel Core i7, 12GB RAM).
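The 70/30 split and boosted-tree model described above can be sketched with scikit-learn. `GradientBoostingClassifier` stands in for XGBoost here, the data is synthetic rather than the actual Adult dataset, and 100 boosting rounds replace the slide’s 1,000 to keep the sketch quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned Adult data: 14 features, binary target
X, y = make_classification(n_samples=2000, n_features=14, random_state=0)

# 70% train / 30% hold-out, as on the slide
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Gradient boosted trees (stand-in for XGBoost), scored on the hold-out set
model = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"hold-out AUC: {auc:.3f}")
```

Scoring only on the 30% hold-out gives an unbiased estimate of how the model will generalise.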
EVALUATING
ALGORITHMS &
MODELS
TYPES OF ML MODEL EVALUATION METRICS
• Validate the prediction model against known outcomes/labels.
• For “unsupervised” methods, the model is evaluated only by the distance from the “known” cluster centroids.
Regression metrics (measure how close forecasts or predictions are to the eventual outcomes; range 0 – ∞):
• RMSE (Root Mean Square Error)
• RMSLE (Root Mean Square Logarithmic Error)
• MAE (Mean Absolute Error)
Classification metrics (range 0 – ∞):
• LogLoss (Logarithmic Loss)
• MAP@n (Mean Average Precision @ n Classes)
• MLogLoss (Multi-Class Logarithmic Loss)
• Hamming Loss
• AUC (Area Under ROC Curve): the most commonly used evaluation for binary classification prediction models; range 0.5 – 1.0.
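A few of these metrics computed on toy values with scikit-learn; RMSLE is derived here as RMSE on log1p-transformed values, and the toy numbers themselves are arbitrary.

```python
import numpy as np
from sklearn.metrics import log_loss, mean_absolute_error, mean_squared_error

# Toy regression targets and predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # regression: 0..inf, lower is better
mae = mean_absolute_error(y_true, y_pred)
rmsle = np.sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))

# LogLoss on a binary classifier's predicted probabilities
ll = log_loss([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3])
print(round(rmse, 3), round(mae, 3), round(ll, 3))
```

RMSE penalises large errors more heavily than MAE, which is why the two can disagree on which model is “better”.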
BINARY CLASSIFICATION MODEL EVALUATION
• Gini Lift and Decile Charts
• Rank predictions and examine how much ‘lift’ the model provides over a NULL model.
• Kolmogorov-Smirnov Chart
• Examine how well the model differentiates between the 2 classes.
• Confusion Matrix
• Commonly used in the medical domain to assess sensitivity vs specificity of tests.
AREA UNDER ROC CURVE
If probability >= 0.5, predict the response as positive; else negative.
Confusion Matrix (Accuracy: 87.643%)
                     Target
                     Positive  Negative
Model  Positive      1539      368       Positive Pred Rate: 0.8070
       Negative      839       7022      Negative Pred Rate: 0.8933
Sensitivity: 0.6472   Specificity: 0.9502
Example ROC point: Sensitivity = 64%, 1-Specificity = 5%
Sensitivity = True Positive Rate; 1-Specificity = False Positive Rate
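All of the figures on this slide follow from the four confusion-matrix counts; a quick check in Python (the interpretation of 87.643% as overall accuracy is confirmed by the arithmetic):

```python
# Confusion matrix counts from the slide (threshold 0.5)
tp, fp = 1539, 368   # model positive: target positive / target negative
fn, tn = 839, 7022   # model negative: target positive / target negative

sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)            # true negative rate
accuracy = (tp + tn) / (tp + fp + fn + tn)
ppv = tp / (tp + fp)                    # positive predictive rate
npv = tn / (tn + fn)                    # negative predictive rate

print(f"sens={sensitivity:.4f} spec={specificity:.4f} acc={accuracy:.3%}")
```

This reproduces sensitivity 0.6472, specificity 0.9502, predictive rates 0.8070 / 0.8933, and accuracy 87.643%, and 1 - specificity ≈ 5% is the false positive rate used on the ROC axis.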
VISUALIZING
DATA &
STORYTELLING
THE BIG PICTURE – PUTTING IT TOGETHER
[Chart: Age vs Income, counts by age for target=0 and target=1]
Mean Age (target=0): 37
Mean Age (target=1): 44
USING COMBINATION OF CHARTS
[Combo chart: Capital Gain vs High Income (%), counts and high income (%) by capital gain amount]
MAXIMIZING ROI ON MARKETING RESPONSE
• Assumptions:
1. Average loan amount $10,000
2. Interest return at 10%
3. Default rate at 5%
4. Marketing costs 20% of average revenue
5. Simple mechanics of how financing works
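One simple per-loan reading of these assumptions, worked through in Python. The netting formula below (interest revenue minus expected default loss minus marketing cost) is my assumption; the slide lists the inputs but not the calculation.

```python
# Slide assumptions for one average loan
loan = 10_000
interest_rate = 0.10    # interest return at 10%
default_rate = 0.05     # default rate at 5%
marketing_pct = 0.20    # marketing costs 20% of average revenue

revenue_per_loan = loan * interest_rate               # $1,000 interest revenue
expected_default = loan * default_rate                # $500 expected loss of principal
marketing_cost = revenue_per_loan * marketing_pct     # $200 to win the response
expected_profit = revenue_per_loan - expected_default - marketing_cost
print(expected_profit)  # 300.0 per loan under these simple mechanics
```

Targeting marketing with a response model raises conversion for the same spend, which is where the ROI improvement comes from.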
QUESTIONS
& ANSWERS
THANK YOU
ECHELON ASIA SUMMIT 2017
Garrett Teoh Hor Keong
Chief Data Officer, Renotalk Pte Ltd
LinkedIn: garrettteoh
Email: rtgteoh@renotalk.com