Machine learning

MACHINE LEARNING
Evren Korpeoglu, Data Science
Aarthi Srinivasan, Product Management
/Productschool @ProdSchool /ProductmanagementSV

What to Expect?
• What is machine learning?
• Why is it important?
• How do we use it?
• Technical Concepts
• Examples

What is Machine Learning?
1. The science of getting computers to learn or recognize something without being explicitly programmed (Andrew Ng)*
• A branch of artificial intelligence, which is itself a branch of computer science
• Give the computer lots of data so that it can figure out the patterns itself
• One of the first examples is Arthur Samuel's checkers-playing program
* Ref: Andrew Ng's courses; Big Data: A Revolution
2. Distinguishing big data from machine learning: big data is the seed from which machine learning "forests" grow
• Big data collects information from our digital exhaust (the crumbs we leave in the digital world), demographics, preferences, health, etc.
• Machine learning mines this data and models behavior, producing interactive responses based on it

Why do we need this?
1. Many applications affecting human health, everyday utility, and the future, across three areas: Health & Wellness, Utilitarian, and Future. Examples:
• DNA sampling & diagnosis
• Health reminders & prevention through AI tools
• Correlation studies
• Customizable tablets
• Real-time optimized path maps
• Search ranking
• Spam filter on email
• News aggregators
• Shopping recommendations
• Facebook face recognition
• Age recognition
• Voice recognition (Siri, Alexa)
• Driverless cars
• Home decoration

Key Terms
Training Set
• A set of data used to learn relationships, containing both the data and the known answer for each sample.
• E.g. A diamond's size, cut and clarity help predict its price.
Supervised Learning
• Uses the training set (data plus known answers) to learn how to make predictions.
• E.g. A model predicts diamond prices based on past prices.
Unsupervised Learning
• Provide data without any answers (labels) so the computer can identify patterns or groupings on its own.
• E.g. Customer segmentation, DNA groupings.
Features / Variables / Attributes
• Each distinct measurable data value you select in the training data set.
• E.g. A diamond's size is one of the features for predicting price.
Supervised: Regression
• Use the features in the training set to predict a continuous value by fitting a curve to the data.
• E.g. Price of diamond = X*Cut + Y*Clarity + Z*Size + … (other features)
Supervised: Classification
• Place new data (observations) into one of a defined set of categories.
• E.g. Presence or absence of cancer; types of diabetes.
Unsupervised: Clustering
• Process of assigning observations into subsets (clusters) of similar items.
• E.g. Creating customer segments.
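The regression formula above can be made concrete with a small sketch. It assumes cut and clarity are already encoded as numeric grades, adds an intercept term b (which the slide's formula leaves out), and uses made-up diamond records generated from a known formula, so ordinary least squares (solved here through the normal equations) should recover X = 100, Y = 200, Z = 3000 and b = 50. The function names are illustrative only.

```python
# Minimal sketch: fit price = X*cut + Y*clarity + Z*size + b by ordinary least squares.
# The diamond records are HYPOTHETICAL and generated from a known formula so the
# fitted coefficients can be checked; cut and clarity are assumed to be numeric grades.


def solve(a, b):
    """Solve the square linear system a·x = b with Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(rows, targets):
    """Return [w_1, ..., w_k, bias] minimizing squared error via the normal equations."""
    xs = [list(r) + [1.0] for r in rows]  # constant column for the bias term
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical training set: (cut grade, clarity grade, size in carats) -> price.
diamonds = [(3, 2, 0.5), (4, 3, 1.0), (2, 5, 0.8), (5, 1, 1.5), (1, 4, 0.3), (3, 3, 2.0)]
prices = [100 * c + 200 * cl + 3000 * s + 50 for c, cl, s in diamonds]

weights = fit_linear_regression(diamonds, prices)
print([round(w, 2) for w in weights])
```

With real, noisy prices the fit would not be exact; the weights would be the ones that minimize the total squared error over the training set.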

Learning Steps
1. Collect / update user data
2. Create / update training set data
3. Create / update the algorithm for the training data, validate the algorithm, and update it (repeat as needed)
4. Create the predictive model and apply it to new real-time observations
5. A/B test & launch on production
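Step 5 (A/B test before launch) can be illustrated with a classic two-proportion z-test. This is a sketch under assumptions: the metric is a click or conversion rate, model A is the current model, model B is the new one, the counts are made up, and the 5% significance threshold is an arbitrary choice.

```python
# Minimal sketch of an A/B test: two-proportion z-test comparing a conversion
# rate under the current model (A) and the new model (B). Counts are HYPOTHETICAL.
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the null hypothesis rate_A == rate_B."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)  # rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(success_a=200, total_a=5000, success_b=260, total_b=5000)
launch_new_model = p < 0.05 and z > 0  # assumed 5% significance threshold
print(f"z = {z:.2f}, p = {p:.4f}, launch B: {launch_new_model}")
```

In practice the sample size and threshold are usually fixed before the test starts, so the decision is not made by peeking at the results.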

Data Wrangling and Feature Extraction
Spam Email Detection (example features): title, sender domain, # of recipients, email content, country of origin, non-dictionary words, hyperlinks, address book, length of email
• Structured Data (Best)
– RDBMS, columnar data
– Strict Schema
– SQL
• Semi-Structured Data (Better)
– JSON, XML
– Enforce minimum schema
– JSON, XML Parser
• Unstructured Data
– Text, Image, Raw email
– No Schema
– Batch processing
– Regular expressions
– MapReduce
GARBAGE IN, GARBAGE OUT
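Turning an unstructured raw email into a structured feature vector for spam detection might look like the sketch below, which uses Python's standard-library email parser and regular expressions. The tiny DICTIONARY and ADDRESS_BOOK are hypothetical stand-ins for a real word list and the user's contacts, and only some of the features listed above are covered (country of origin, for example, would need an IP-to-country lookup).

```python
# Minimal sketch: extract structured spam features from a raw email string.
# DICTIONARY and ADDRESS_BOOK are HYPOTHETICAL placeholders.
import re
from email import message_from_string

DICTIONARY = {"hello", "meeting", "tomorrow", "at", "the", "office",
              "click", "here", "to", "win", "a", "free", "prize", "now"}
ADDRESS_BOOK = {"alice@example.com"}

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z]+")
EMAIL_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def extract_features(raw_email):
    msg = message_from_string(raw_email)
    body = msg.get_payload()  # assumes a simple, non-multipart message
    sender = EMAIL_RE.search(msg.get("From", ""))
    recipients = [a for a in msg.get("To", "").split(",") if a.strip()]
    words = WORD_RE.findall(URL_RE.sub(" ", body).lower())  # ignore words inside URLs
    return {
        "title_length": len(msg.get("Subject", "")),
        "sender_domain": sender.group(1).lower() if sender else "",
        "num_recipients": len(recipients),
        "num_hyperlinks": len(URL_RE.findall(body)),
        "num_non_dictionary_words": sum(w not in DICTIONARY for w in words),
        "sender_in_address_book": sender is not None and sender.group(0).lower() in ADDRESS_BOOK,
        "email_length": len(body),
    }


raw = (
    "From: Prize Team <winner@lucky-prize.biz>\n"
    "To: bob@example.com, carol@example.com, dave@example.com\n"
    "Subject: You WIN!!!\n"
    "\n"
    "Click here to win a free prize now: http://lucky-prize.biz/claim xqzt\n"
)
features = extract_features(raw)
print(features)
```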

Model Training
• Training / testing: text documents, user activity, images, transaction history → feature extraction (feature vector) + labels → machine learning algorithm → predictive model
• New data: text documents, user activity, images, transaction history → feature extraction (feature vector) → predictive model → expected label
• Model evaluation
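The training/testing flow above can be sketched end to end. The key point it shows is that the same feature extraction is applied to training data and to new data. Everything here is an assumption for illustration: the two-number featurize() function, the tiny labeled message set, and the choice of a simple nearest-centroid classifier as the "machine learning algorithm".

```python
# Minimal sketch of train/test flow with a nearest-centroid classifier.
# featurize(), the messages, and the labels are all HYPOTHETICAL.
import math
import random


def featurize(text):
    """Hypothetical feature vector: [number of '!' marks, number of digits]."""
    return [text.count("!"), sum(ch.isdigit() for ch in text)]


def train(vectors, labels):
    """Nearest-centroid training: average the feature vectors of each label."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda y: math.dist(model[y], vector))


data = [("WIN $1000 now!!!", "spam"), ("Call 555 0199 today!!", "spam"),
        ("FREE 2 nights!!!", "spam"), ("Cheap pills 4 you!!", "spam"),
        ("Lunch at noon?", "ham"), ("See you tomorrow", "ham"),
        ("Notes from the meeting", "ham"), ("Thanks for the help!", "ham")]
random.Random(0).shuffle(data)  # fixed seed so the split is reproducible
train_set, test_set = data[:6], data[6:]

model = train([featurize(t) for t, _ in train_set], [y for _, y in train_set])
correct = sum(predict(model, featurize(t)) == y for t, y in test_set)
print(f"test accuracy: {correct}/{len(test_set)}")
print(predict(model, featurize("Claim your 100 prize!!!")))  # a new observation
```

A held-out test set this small gives a very noisy accuracy estimate; it is only here to show where model evaluation fits.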

Supervised learning techniques
• Linear classifiers (numerical functions)
• Parametric (probabilistic functions)
– Naïve Bayes, hidden Markov models (HMM), probabilistic graphical models
• Non-parametric (instance-based functions)
– K-nearest neighbors
• Non-metric (symbolic functions)
– Classification and regression trees (CART)
• Aggregation
– Bagging (bootstrap + aggregation), AdaBoost, random forest, ensemble models
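As an example of a parametric, probabilistic classifier from the list above, here is a minimal multinomial Naïve Bayes sketch for spam vs. ham with add-one (Laplace) smoothing. The training messages are hypothetical, tokenization is a plain whitespace split, and words never seen in training are simply ignored; these are simplifying assumptions, not the only way to build it.

```python
# Minimal sketch: multinomial Naive Bayes with Laplace smoothing.
# Training messages are HYPOTHETICAL.
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    word_counts = {y: Counter() for y in set(labels)}
    doc_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(model, doc):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for y, counts in word_counts.items():
        # log P(y) + sum of log P(word | y); logs avoid floating-point underflow.
        score = math.log(doc_counts[y] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in doc.lower().split():
            if w in vocab:  # unseen words are ignored
                score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


docs = ["win money now", "free money offer", "claim your free prize",
        "meeting at noon", "project update attached", "lunch tomorrow at noon"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(classify(model, "free prize money"))
print(classify(model, "noon meeting tomorrow"))
```

The "naïve" part is the assumption that words are independent given the class, which is what lets the score be a simple sum of per-word log probabilities.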

Linear Classifiers
• Logistic regression
– Models P(y = 1 | x) = 1 / (1 + exp(-wᵀx))
– Learn the weights w that minimize the loss (log loss)
– Solve iteratively using gradient descent
• Support vector machine (SVM)
– Maximum-margin classifier
• Artificial neural networks
– Inspired by how neurons work
– Activation functions (sigmoid, ReLU, etc.)
– Networks with many layers are the basis of deep learning
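The logistic regression bullets (sigmoid model, minimum loss, gradient descent) can be sketched as follows. This assumes batch gradient descent on the average log loss, a bias term appended to w, and made-up 1-D data; the learning rate and number of epochs are arbitrary choices.

```python
# Minimal sketch: logistic regression trained with batch gradient descent.
# Data, learning rate, and epoch count are HYPOTHETICAL.
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    w = [0.0] * (len(xs[0]) + 1)  # last weight is the bias term
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            features = list(x) + [1.0]
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, features))) - y
            for i, xi in enumerate(features):
                grad[i] += error * xi / n  # gradient of the average log loss
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict_proba(w, x):
    """P(y = 1 | x) = sigmoid(w . [x, 1])."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Hypothetical 1-D data: label 1 tends to occur for larger x.
xs = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w = train_logistic_regression(xs, ys)
print([round(predict_proba(w, x), 2) for x in ([1.0], [4.0])])
```

The data is deliberately not perfectly separable, so the weights settle at finite values instead of growing without bound.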

KNN / CART
• K-Nearest Neighbors
– Find the K training examples nearest to the new observation
– Predict by majority vote among their labels
– Easy to implement
– Does not scale well for real-time predictions, since each prediction is compared against the training examples
• Classification and Regression Trees
– Easy to interpret for small trees
• Random Forests
– Ensemble of decision trees
– Usually performs very well
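The K-nearest neighbors steps (find the K nearest, take a majority vote) fit in a few lines. This sketch uses brute-force Euclidean distance over hypothetical 2-D points; scanning every training example for each query is exactly why plain KNN gets slow for real-time predictions on large data sets.

```python
# Minimal sketch: K-nearest neighbors classification by brute force.
# Points and labels are HYPOTHETICAL.
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # Sort every training example by distance to the query, keep the k closest.
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote


points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.0)))
```

With two classes, choosing an odd K avoids tied votes.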

Unsupervised Learning
• Clustering
– K-means clustering
– Spectral clustering
• Dimensionality reduction
– Principal component analysis (PCA)
– Factor analysis
• Product Recommendations
– Collaborative Filtering
• Association Rules
– Market Basket Analysis
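K-means clustering, the first technique listed above, alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below assumes hypothetical customer data and, for simplicity, uses the first k points as initial centroids (libraries typically use random or k-means++ initialization instead).

```python
# Minimal sketch: K-means clustering (Lloyd's algorithm).
# Customer data is HYPOTHETICAL.
import math


def kmeans(points, k, iterations=100):
    centroids = [list(p) for p in points[:k]]  # simple initialization: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical customers: (visits per month, average basket size in dollars).
customers = [(2, 20), (30, 200), (3, 25), (28, 190), (1, 15), (32, 210)]
centroids, segments = kmeans(customers, k=2)
print(segments)
print(centroids)
```

Here basket size dominates the distance because of its larger scale; features are usually standardized before clustering real data.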

Model Evaluation
• Measure model performance
• Optimize the model to improve prediction quality
– Feature selection
– Hyperparameter tuning
• A/B testing
• Explore/exploit
• Precision and recall: http://en.wikipedia.org/wiki/Precision_and_recall
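Precision and recall, referenced above, can be computed directly from true and predicted labels. This sketch treats "spam" as the positive class and uses hypothetical labels; it also reports F1, the harmonic mean of the two.

```python
# Minimal sketch: precision, recall, and F1 for a binary classifier.
# Labels are HYPOTHETICAL.


def precision_recall_f1(y_true, y_pred, positive="spam"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged mail that was spam
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual spam that was caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "spam"]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

For a spam filter, low precision means legitimate mail gets flagged, while low recall means spam slips through; which matters more is a product decision.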

Sample Architecture
• Client
• Real-time data
• SQL / NoSQL database
• Hadoop / Spark
• Machine learning system
• Prediction engine

Health & Wellness: Sen.se Mother (IoT)

Amazon Echo & Personalization

Houzz Visual Match Deep Learning
