Introduction to Machine Learning - WeCloudData

Introduction to
Machine Learning
WeCloudData

Introduction
Data Skills Training | Career Services | Meetup Events
WeCloudData offers Toronto’s first data science accelerator program.
We specialize in teaching leading-edge tools such as AWS, Spark, and
Machine Learning, and we help our corporate clients upskill and reskill
their data teams.

Introduction
Faculty Team
21 Instructors | 10 Teaching Assistants
WCD works with some of the most talented and experienced data science
experts to deliver public and corporate trainings. We currently have
21 part-time and 2 full-time instructors.
Our instructors bring analytical expertise from various industries,
teach students advanced tools such as Python, Hadoop, Spark, and AWS,
and mentor students on end-to-end data projects.

Product & Services
Corporate Training
We offer customized corporate training to Canadian companies with
flexible schedules and learning support!
We help train, upskill, and reskill data teams!
• Python for SAS and SQL Users
• Machine Learning | Deep Learning
• Big Data
• Executive Workshops

Python for SAS Users
Machine Learning
Big Data
AI/DS for Executives
Corporate Data Programs
We’ve delivered customized trainings to many large Canadian companies
WeCloudData
Corporate
Program

Introduction
Communities we’re building
8,000 members
120 events
We organize one of the most active DS
communities in Canada!

Upcoming Events Schedule
Track        | Meetup Org               | Topic                                                    | Date
Data Science | WCD                      | Introduction to Machine Learning                         | May 29
Big Data     | WCD                      | Big Data for Data Scientist – Open Class                 | Jun 4
Big Data     | WCD                      | Spark on Kubernetes                                      | Jun 5
Big Data     | Lightbend                | Kafka in Jail with Strimzi                               | Jun 11
Cloud        | Big Data & AI Conference | Machine Learning from Experimentation to Production on AWS | Jun 12
Big Data     | Big Data & AI Conference | Transforming big data from On-premise to the Cloud       | Jun 12
Data Science | Big Data & AI Conference | Spark for Data Science                                   | Jun 13
Data Science | Big Data & AI Conference | Moving Towards a Python Environment                      | Jun 13

Workshop Provider
Conference/Clients
Workshop Provider
TMLS Conference
November, 2018
Workshop Provider
TD Canada
Analytics Month
October, 2018
• Machine Learning Open Data
• Spark ML and MLflow
• Deep Learning with PyTorch
• Python for SAS Users
• Machine Learning with Python
Workshop Provider
Big Data & AI
Toronto 2019
June, 2019
• Big Data in AWS Cloud
• Spark for Data Science
• Moving from On-Prem to Cloud
WeCloudData is a workshop provider of choice for conferences in Toronto because of our expertise and specialization.

Analytics Events
We help companies with hiring/branding events
WeCloudData organizes one of the
largest and most active data science
communities in Toronto with 7,500
members and 110 past events. We help
companies facilitate mini-conferences
and help them run hiring events.

Instructor
Shaohua Zhang
• Co-founder and CEO of WeCloudData. Lead instructor for the corporate training program
• Certified SAS Predictive Modeler since 2007 (among the first 20 in the world)
• Helped build and lead the data science team at BlackBerry (2010 – 2015)
• Helping Communitech incubator and Open Data Exchange mentor startups on data strategies
• Specializes in machine learning, big data, and cloud computing

Learning Path
Data Science Program
Prerequisites
Data Science
Learning Path
• ML Applied: Learn to build ML models using Sklearn
• Data Science w/ Python: Master data wrangling with Python
• Big Data: Harness big data with Hadoop, Hive, Presto, and AtScale
• ML Advanced: Build your portfolio with hands-on Capstone projects
• Spark: Machine Learning at Scale with PySpark ML and Real-time Deployment
Contact us about the courses:
• info@weclouddata.com
Upcoming courses:
• https://weclouddata.com/upcoming-course-schedule

Data Jobs in the Market
Data Handling | Complex Analytics | Big Data | Storytelling
Data Science
Data Scientist

Data Science Essential Skills
Data Scientist
• Coding/Tools: Linux, Python/Scala/Java, Cloud (AWS), Hadoop, Spark
• Math/ML: Statistics, Linear Algebra, Regression, Classification, Clustering, NLP
• Storytelling: Presentation, Use cases, Project Mgmt, Communications

Data Science: The Myth
• Data: Application, Scraping/API, Labeled data
• Infra/Platform: RDBMS, Hadoop, Cloud
• Data Engineering: ETL, Enrichment, Dataflow automation
• AI/ML: Python, ML
• Deployment: Prediction API, Stream processing

Data Scientist: The Types
Operational DS
Focus: data wrangling, working with large/small messy data, building predictive models
Strength: data handling, tools, business knowledge
ML Engineer
Focus: ML model deployment, data pipelines
Strength: coding, algorithms, machine learning, platforms and tools
ML Researcher
Focus: algorithm development, research, IP
Strength: ML/DL algorithms, implementation, research
DS Product Manager
Focus: product strategy, business communications, project management
Strength: product sense, business requirements, DS acumen

Predictive
Modeler
Acquisition | Growth | Maturity | Decline | Loss
● Lead Gen
● Digital Mktg
● Mobile Ads
● Cross/Up-sell
● Segmentation
● CLTV
● Taste graph
● Personalization
● Loyalty Management
● Context-based Mktg
● Churn models
● Retention
Acquisition
Models
LTV Loyalty
Management
Retention Winback
Customer
Value
● Winback models
Predict high risk customers

Twitter API
Data
Scientist
Business
Our new product feature received a lot of negative reviews.
- Can we do some analysis?

Introduction to Machine
Learning

Machine Learning
Humans: Height (in) vs. Weight (lbs)

Machine Learning
Internet Providers: download speed (Mb/s) vs. upload speed (Mb/s)

Machine Learning
Iris Flowers: Sepal Length vs. Petal Length

Types of Machine Learning Algorithms
MACHINE LEARNING | CONTINUOUS          | CATEGORICAL
SUPERVISED       | REGRESSION          | CLASSIFICATION
UNSUPERVISED     | DIMENSION REDUCTION | CLUSTERING

Data Acquisition &
Preparation

Data Acquisition
• Behavioral data
• Scraped data
• 3rd party data
• Labeled data

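Modeling Dataset: Credit Approval
ID (Index) | Age | Gender | Annual Salary | Months in Residence | Months in Job | Current Debt | Paid off Credit
Client 1   | 23  | M      | $30,000       | 36                  | 12            | $5,000       | Yes
Client 2   | 30  | F      | $45,000       | 12                  | 12            | $1,000       | Yes
Client 3   | 19  | M      | $15,000       | 3                   | 1             | $10,000      | No
Client 4   | 25  | M      | $25,000       | 12                  | 27            | $15,000      | ?
Features (Predictors | Input Variables): Age, Gender, Annual Salary, Months in Residence, Months in Job, Current Debt
Labels (Target): Paid off Credit
ID (Index): Client 1 to Client 4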
Holdout Approach
Dataset: Train | Validation | Testing
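A holdout split might be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `holdout_split` are assumptions for illustration; the slide does not specify them.

```python
import random


def holdout_split(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows once, then cut them into train, validation, and test parts.

    The fractions are illustrative defaults, not values taken from the slides.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # shuffle a copy so the caller's data is untouched
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains is the test set
    return train, validation, test


train, validation, test = holdout_split(range(100))
```

The three parts never overlap: the model is fitted on the training part, tuned on the validation part, and evaluated once on the test part.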

Cross-validation Approach
Dataset: Train (cross-validation) | Testing

Feature Processing
(Engineering)

Feature Preprocessing
● Derived features
● Feature scaling
● Variable binning
● One-hot encoding
● Weight of evidence
● TF-IDF
● etc.
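A few of these steps can be sketched in plain Python on the columns of the Credit Approval example. The debt-to-income ratio, the age bin edges, and all function names are hypothetical choices made for illustration.

```python
import bisect

ages = [23, 30, 19, 25]
genders = ["M", "F", "M", "M"]
salaries = [30000, 45000, 15000, 25000]
debts = [5000, 1000, 10000, 15000]


def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """One-hot encoding: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def bin_values(values, edges, labels):
    """Variable binning: edges are the lower bounds of every bin after the first."""
    return [labels[bisect.bisect_right(edges, v)] for v in values]


# Derived feature (hypothetical): debt relative to annual salary.
debt_to_income = [debt / salary for debt, salary in zip(debts, salaries)]

scaled_salaries = min_max_scale(salaries)
gender_categories, gender_one_hot = one_hot_encode(genders)
age_bins = bin_values(ages, edges=[21, 26], labels=["<21", "21-25", "26+"])
```

In a real project the scaling range, the categories, and the bin edges would be learned on the training data only and then reused unchanged on validation and test data.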

Cross-validation
Dataset: Train | Testing
Each of the five folds of the training data yields one score (Performance 1
to Performance 5); the scores are averaged into Performance avg.
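Five-fold cross-validation can be sketched as below. The majority-class model and the toy labels are stand-ins chosen only to keep the example self-contained; the fold construction uses contiguous blocks without shuffling, which is an assumption rather than something the slides prescribe.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, held_out_idx) pairs; every row is held out exactly once."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, held_out_idx
        start += size


def cross_validate(X, y, fit, score, k=5):
    """Return the per-fold scores and their average."""
    scores = []
    for train_idx, held_out_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model,
                            [X[i] for i in held_out_idx],
                            [y[i] for i in held_out_idx]))
    return scores, sum(scores) / len(scores)


def fit_majority(X, y):
    """Stand-in model: always predicts the most frequent training label."""
    return max(set(y), key=y.count)


def accuracy(model, X, y):
    """Fraction of held-out labels equal to the constant prediction."""
    return sum(1 for label in y if label == model) / len(y)


X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
fold_scores, average_score = cross_validate(X, y, fit_majority, accuracy, k=5)
```

In practice the rows are usually shuffled (or stratified by label) before the folds are cut, so that each fold resembles the full dataset.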

[Diagram: Dataset split into Train and Test; SVM and RandomForest trained
with cross-validation; output: Winning Model and Winning Parameter Set]
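The selection loop in this diagram might look like the sketch below. To stay dependency-free it swaps in simple stand-in candidates (a majority-class baseline and k-nearest-neighbours with k=1 and k=3) in place of SVM and RandomForest; the toy data, the interleaved folds, and all names are assumptions.

```python
def knn_classifier(k):
    """Return a fit function for a 1-D k-nearest-neighbours classifier (stand-in model)."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda pair: abs(pair[0] - x))[:k]
            votes = [label for _, label in nearest]
            return max(set(votes), key=votes.count)
        return predict
    return fit


def majority_classifier(X, y):
    """Baseline: always predict the most frequent training label."""
    winner = max(set(y), key=y.count)
    return lambda x: winner


def accuracy(predict, X, y):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def cv_score(fit, X, y, k_folds=4):
    """Average accuracy over interleaved folds (row i belongs to fold i % k_folds)."""
    scores = []
    for fold in range(k_folds):
        train = [i for i in range(len(X)) if i % k_folds != fold]
        held_out = [i for i in range(len(X)) if i % k_folds == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(predict,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return sum(scores) / len(scores)


# Toy data: the label is 1 when x >= 8. The test rows are kept out of model selection.
X_train = [float(v) for v in range(16)]
y_train = [1 if v >= 8 else 0 for v in X_train]
X_test, y_test = [3.5, 11.5, 7.2, 8.6], [0, 1, 0, 1]

# Each candidate is a model type plus a parameter setting.
candidates = {
    "majority": majority_classifier,
    "knn_k1": knn_classifier(1),
    "knn_k3": knn_classifier(3),
}
cv_scores = {name: cv_score(fit, X_train, y_train) for name, fit in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the winner on all training data, then evaluate once on the test set.
final_model = candidates[best_name](X_train, y_train)
test_accuracy = accuracy(final_model, X_test, y_test)
```

The test set is touched only after the winner has been chosen, so its score is not biased by the selection process.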

Linear Classification Model Interpretation – Logistic Regression

Non-linear Classification Model Interpretation – Decision Tree

Complex Model Interpretation – Surrogate Model

Complex Model Interpretation – Feature Importance
Feature importance plots are commonly used to explain models, but they are not ideal. For
instance, they give no indication of the direction of a relationship or of whether it is linear
or non-linear.
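The limitation can be seen in a small sketch. It uses permutation importance, which is one common way to compute feature importance and is assumed here for illustration; the slides do not say which method they have in mind. The linear `black_box` model and the random data are hypothetical.

```python
import random


def black_box(row):
    """Hypothetical model: feature 0 pushes up, feature 1 pushes down, feature 2 is unused."""
    return 3.0 * row[0] - 2.0 * row[1] + 0.0 * row[2]


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, seed=0):
    """Importance of feature j = increase in error after shuffling column j."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mean_squared_error(model, X_shuffled, y) - baseline)
    return importances


rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [black_box(row) for row in X]
importances = permutation_importance(black_box, X, y)
```

Features 0 and 1 both receive a positive importance even though one raises the prediction and the other lowers it, so the scores rank features but say nothing about the sign or shape of their effect.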

Complex Model Interpretation – LIME
LIME is short for Local Interpretable Model-Agnostic Explanations. Each part of the name reflects
something that we desire in explanations. Local refers to local fidelity, i.e., we want the explanation to
really reflect the behaviour of the classifier "around" the instance being predicted. This explanation is
useless unless it is interpretable, that is, unless a human can make sense of it. LIME is able to explain
any model without needing to 'peek' into it, so it is model-agnostic.
All previously mentioned methods can give an idea about the global behavior of
the model. They fail to tell why a particular instance is classified one way or the
other.
1. Perturb the observation
2. Calculate the distance between the perturbed data and
the original observation
3. Make predictions on the perturbed data using
the complex model
4. Pick the m features that best describe the complex model
outcome on the perturbed data
5. Fit a simple model to the perturbed data with the m
features, using similarity scores as weights
6. Feature weights from the simple model serve as
explanations of the complex model's local behavior
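The six steps can be sketched in plain Python as follows. This is an illustrative approximation under stated assumptions, not the `lime` package's implementation: the Gaussian perturbation, the exponential kernel and its width, the covariance-based feature selection, and the `black_box` function are all hypothetical choices.

```python
import math
import random


def black_box(x):
    """Hypothetical complex model: non-linear in x[0] and x[1], ignores x[2]."""
    return x[0] ** 2 + math.sin(x[1])


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lime_explain(predict, x, m=2, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    rng = random.Random(seed)
    # 1. Perturb the observation with small Gaussian noise (assumed sampling scheme).
    samples = [[xi + rng.gauss(0, scale) for xi in x] for _ in range(n_samples)]
    # 2. Distance to the original observation, turned into similarity weights.
    weights = [math.exp(-math.dist(s, x) ** 2 / kernel_width ** 2) for s in samples]
    # 3. Predictions of the complex model on the perturbed data.
    preds = [predict(s) for s in samples]
    # 4. Pick the m features with the largest weighted covariance with the predictions.
    total = sum(weights)
    mean_pred = sum(w * p for w, p in zip(weights, preds)) / total

    def relevance(j):
        mean_j = sum(w * s[j] for w, s in zip(weights, samples)) / total
        return abs(sum(w * (s[j] - mean_j) * (p - mean_pred)
                       for w, s, p in zip(weights, samples, preds)))

    chosen = sorted(sorted(range(len(x)), key=relevance, reverse=True)[:m])
    # 5. Fit a weighted linear model (intercept + chosen features) via normal equations.
    rows = [[1.0] + [s[j] for j in chosen] for s in samples]
    size = m + 1
    A = [[sum(w * r[a] * r[b] for w, r in zip(weights, rows)) for b in range(size)]
         for a in range(size)]
    rhs = [sum(w * r[a] * p for w, r, p in zip(weights, rows, preds)) for a in range(size)]
    coefficients = solve(A, rhs)
    # 6. The simple model's feature weights are the local explanation.
    return {j: coefficients[i + 1] for i, j in enumerate(chosen)}


explanation = lime_explain(black_box, [1.0, 0.0, 0.5])
```

Around the point (1.0, 0.0, 0.5) the hypothetical model behaves roughly like 2·x0 + 1·x1, so the returned weights should be close to those local slopes, and the unused third feature should not be selected.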

Applied Machine Learning
Course Detail

Syllabus
Syllabus (Weekend Cohort – 12 sessions/48 hours)
Lecture 1: Introduction
• Introduction to Machine Learning
• Gradient Descent
Lecture 2: Regularization
• Regularization
• Lasso/Ridge/ElasticNet
Lecture 3: Logistic Regression
• Logistic Regression
• Multi-class Classification
• Evaluation Metrics
• Variance/Bias Tradeoff
Lecture 4: Feature Engineering
• Numerical Features
• Categorical Features
• Text Features
Lecture 5: Non-parametric Models
• KNN
• Decision Trees
• Project kick-off
Lecture 6: Parameter Tunings
• Ensemble Methods
• Bagging
• Boosting
• Hyper-parameter Tunings
Lecture 7: Advanced Ensembles
• Xgboost
• Stacking
Lecture 8: Model Interpretation
• Factorization Machines
• Complex Model Interpretation
Lecture 9: Unsupervised
• K-Means Clustering
• Dimension Reduction
• PCA
Lecture 10: Neural Networks I
• Neural Networks
• Backpropagation
Lecture 11: Recommendation Engines
• Market Basket Analysis
• Collaborative Filtering
• Matrix Factorization
Lecture 12: Model Deployment
• Machine Learning Lifecycle
• Model Deployment
• Project Presentation

Instructor – Jodie Zhu
• Machine Learning Engineer at Dessa
• University of Toronto, Master of Science (Biostatistics)
• Python Instructor at WeCloudData
• Career development mentor
• Expertise: Python | Data Science | Deep Learning

Python Programming
Instructor – Holly Xie
• Machine Learning Scientist at integrate.ai
• University of Waterloo, Master of Mathematics
• Machine Learning Instructor at WeCloudData
• Expertise: Machine Learning | Deep Learning

Hands-on Project
This course is instructor-led and project-based. Students will be able to apply the Machine
Learning knowledge acquired in the course to a hands-on project.
Project:
• The instructor will work with the students to decide the project topics. Students are
strongly encouraged to bring their own motivation and ideas; otherwise, a topic and
datasets will be assigned.
• Students are also encouraged to apply what they learn directly to their company’s
data problems and receive technical advice from the instructor.

Interview Practice
For job seekers, this course also provides supplementary materials to
help you prepare for data science interviews.
Interview Help
• Common ML interview questions
• Mock interview quiz

Price
Course Pricing
Applied Machine Learning $2000 + tax

Upcoming WeCloud Events
Event Schedules
