Learning Predictive Modeling with
TSA and Kaggle: Tips for Beginners
Yvonne K. Matos
ChiPy Data Science SIG Meeting, November 15, 2017
Photo: Benoit Tessier/Reuters
Where to start with deep learning?
Activation functions
Backpropagation
Neural Networks
Output layers
Weights
Recurrent Neural Networks
Sigmoid Functions
Loss Functions
Decision Trees
Define goals, pick a project, dive in!
My Goals, ChiPy Mentorship Program
1. Work with large datasets and cloud computing
2. Develop deep learning algorithms
3. Increase experience with Python
4. Get hired as a data scientist!
Data Science Pipeline
Data science is 80% cleaning and preprocessing, 20% modeling
Business Question → Data Question → Data Collection → Data Loading → Data Cleaning → Exploratory Analysis → Preprocessing → Modeling → Validation → Data Answer → Business Answer
TSA Passenger Algorithm Screening Challenge
Problem: High false alarm rates create bottlenecks at airport checkpoints.
Challenge: Create an algorithm with a lower false alarm rate, using a dataset of scan images with simulated threats.
Exploration: Visualizing the Images
• 3D images
• 3TB dataset
Raw data for lowest-res 10 MB image: 4D array, (n, 512, 660, 16)
Higher-res 330 MB image: (n, 512, 512, 660)
Other 3D images: 5D array, (n, 128, 128, 128, 1)
TSA 3D images vs 2D RGB images
2D RGB images: (n, 512, 660, 3) = (samples, dimensions, channels), 3 channels
TSA images: (n, 512, 660, 16) = (samples, dimensions, channels), 16 channels
Same layout, more channels: "If I fits, I sits!"
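The "channels" analogy can be made concrete with NumPy. This is a minimal sketch, assuming each scan can be read as 16 separate, equally sized 2D views; the helper names stack_views and add_channel_axis are hypothetical, and toy dimensions stand in for 512 x 660 to keep memory small.

```python
import numpy as np


def stack_views(views):
    """Stack equally sized 2D views into one (height, width, n_views) array,
    treating each view as a channel, the way an RGB image has 3 colour channels."""
    return np.stack(views, axis=-1)


def add_channel_axis(volumes):
    """Append a length-1 channel axis, e.g. (n, 128, 128, 128) -> (n, 128, 128, 128, 1)."""
    return volumes[..., np.newaxis]


# Toy sizes stand in for the real 512 x 660 views
views = [np.zeros((4, 6), dtype=np.float32) for _ in range(16)]
scan = stack_views(views)
print(scan.shape)  # (4, 6, 16)

volumes = np.zeros((2, 3, 3, 3), dtype=np.float32)
print(add_channel_axis(volumes).shape)  # (2, 3, 3, 3, 1)
```

Stacking along the last axis gives the channels-last layout that Keras uses by default, which is why a 16-view scan can be fed to the same kind of model as an RGB image.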
Anticipated Challenges
• First Python project
• 10 MB per low res file = long run times
• Enormous scope
– Full training in cloud = $$$$$
Plan of attack
• Begin small
• Scale up locally
• Run in the cloud
Start small with a data subset
19,499 potential threats from 17 body zones
• 17,628 non-threat
• 1,871 threat (9.6%)
Zone 6: 1,148 total images
• 1,032 non-threat
• 116 threat (10%)
Lowest-res images: ~10 MB each
Start with 120 images
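The class-balance figures above can be checked with a few lines of NumPy. A minimal sketch: class_summary is a hypothetical helper, and the label arrays are rebuilt from the slide's counts rather than read from the real label file. It also shows that the 120-image starting subset (102 non-threat, 18 threat) is 15% threats, richer than the zone as a whole.

```python
import numpy as np


def class_summary(labels):
    """Return (non-threat count, threat count, threat percentage) for 0/1 labels."""
    counts = np.bincount(labels, minlength=2)
    pct_threat = 100.0 * counts[1] / counts.sum()
    return int(counts[0]), int(counts[1]), pct_threat


# Label arrays rebuilt from the counts on the slide (0 = non-threat, 1 = threat)
all_zones = np.array([0] * 17628 + [1] * 1871)
zone6 = np.array([0] * 1032 + [1] * 116)
subset = np.array([0] * 102 + [1] * 18)

for name, labels in [('all zones', all_zones), ('zone 6', zone6), ('120-image subset', subset)]:
    neg, pos, pct = class_summary(labels)
    print(f'{name}: {neg} non-threat, {pos} threat ({pct:.1f}%)')
```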
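Image Preprocessing: Getting x and y data
120 samples: 102 non-threat, 18 threat
X data:
import os
import numpy as np

data_dir = '/Users/Yvonne/Desktop/TSA_Kaggle/Z6_n30_9.18.17'
# Skip hidden files such as .DS_Store; sort so the file order is reproducible
z6samlist = sorted(f for f in os.listdir(data_dir) if not f.startswith('.'))
z6paths = [os.path.join(data_dir, z6sam) for z6sam in z6samlist]
# read_data() is the image-reading helper (defined elsewhere, not shown)
arr_list = [read_data(z6path) for z6path in z6paths]
x = np.stack(arr_list, axis=0)
# X scaling: min-max normalize all values into [0, 1]
minimum, maximum = np.min(x), np.max(x)
x = (x - minimum) / (maximum - minimum)
X shape: (120, 512, 660, 16)
X size: 2.6 GB
Y data:
# Label column of the 120-sample table; its rows must follow the same order as z6paths
y = z6sample_120.iloc[:, 2].values
Y shape: (120,)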
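The 2.6 GB figure follows directly from the array shape. A back-of-the-envelope sketch, assuming the pixel values are stored as 32-bit floats (float64 would double every number); array_size_gb is a hypothetical helper. The same arithmetic applied to all 1,147 Zone 6 images gives the ~24.8 GB that comes up when scaling up.

```python
import numpy as np


def array_size_gb(n_samples, sample_shape=(512, 660, 16), dtype=np.float32):
    """Memory needed to hold n_samples images of sample_shape in one array, in GB (10**9 bytes)."""
    n_values = n_samples * int(np.prod(sample_shape))
    return n_values * np.dtype(dtype).itemsize / 1e9


print(f'{array_size_gb(120):.1f} GB')   # 2.6 GB
print(f'{array_size_gb(1147):.1f} GB')  # 24.8 GB
```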
Neural Networks Attempt to Model the Human Brain
X = independent variables for each observation
Scaling X is a must!
W = weights
ŷ = predicted value
y = actual value
[Figure: input layer (x1, x2, x3, x4) → hidden layer of Σ nodes → output layer giving the output value ŷ, with weights w1, w2, … on the connections; ŷ is compared with y to compute the cost C]
Goal: minimize C
Slide concept credit: SuperDataScience
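A minimal NumPy sketch of one forward pass through a network like the one pictured: 4 inputs, a hidden layer of Σ nodes, and one output ŷ. Assumptions: biases are omitted, the hidden layer uses ReLU and the output a sigmoid (matching the activations in the Keras models later in the talk), and C is the squared-error cost ½(ŷ − y)² averaged over observations; the Keras models themselves minimize binary cross-entropy instead. All names and the random data are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, w_out):
    """One forward pass: each hidden node sums its weighted inputs (a Σ node) and applies
    ReLU; the output node does the same with a sigmoid, giving y_hat in (0, 1)."""
    hidden = np.maximum(0.0, x @ w_hidden)
    return sigmoid(hidden @ w_out)


def cost(y_hat, y):
    """Cost C = 1/2 * (y_hat - y)^2, averaged over observations."""
    return 0.5 * np.mean((y_hat - y) ** 2)


rng = np.random.default_rng(0)
x = rng.random((8, 4))               # 8 observations, inputs x1..x4, already scaled to [0, 1]
y = np.array([0, 0, 0, 0, 0, 0, 0, 1])
w_hidden = rng.normal(size=(4, 3))   # weights from 4 inputs to 3 hidden nodes
w_out = rng.normal(size=3)           # weights from 3 hidden nodes to the output node

y_hat = forward(x, w_hidden, w_out)
print(y_hat.shape, cost(y_hat, y))
```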
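Neural Networks Learn Through Backpropagation
[Figure: observations 0, 1, 2, …, 7 of X (with y = 0, 0, 0, …, 1) are fed through the network; the cost C is computed from ŷ and y, then w1, w2, … are adjusted]
Slide concept credit: SuperDataScience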
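Backpropagation can be seen in miniature on a single sigmoid neuron, where the gradient has a closed form. This sketch is an illustration, not the talk's code: with a sigmoid output and binary cross-entropy cost, dC/dz = ŷ − y, so each epoch runs a forward pass, records C, and moves the weights w1, w2 against the gradient. train_logistic, the learning rate, and the toy 8-row dataset are all hypothetical.

```python
import numpy as np


def train_logistic(x, y, lr=0.5, epochs=200):
    """Gradient descent for one sigmoid neuron with binary cross-entropy cost.
    Each epoch: forward pass -> cost C -> gradient of C -> adjust weights."""
    w = np.zeros(x.shape[1])
    b = 0.0
    eps = 1e-12  # keeps log() away from zero
    history = []
    for _ in range(epochs):
        y_hat = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        c = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
        history.append(c)
        error = y_hat - y                    # dC/dz for sigmoid + cross-entropy
        w -= lr * (x.T @ error) / len(y)     # adjust w1, w2
        b -= lr * error.mean()
    return w, b, history


rng = np.random.default_rng(0)
x = rng.random((8, 2))        # 8 observations with two inputs
x[7] += 1.0                   # make the single threat row stand apart
y = np.array([0, 0, 0, 0, 0, 0, 0, 1])
w, b, history = train_logistic(x, y)
print(f'cost: {history[0]:.3f} -> {history[-1]:.3f}')
```

In a multi-layer network the same error term is pushed back through each layer with the chain rule; Keras does this automatically inside fit().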
Additional Info on Neural Networks: Getting Started
• Udemy course: Deep Learning A-Z™: Hands-On Artificial Neural Networks
• YouTube
• Blog: https://adeshpande3.github.io/adeshpande3.github.io/
• Online book: http://neuralnetworksanddeeplearning.com/index.html
Challenges: Getting Ready to Develop the First Model
No GPU support for TensorFlow on Mac
• Attempted solution:
– Build TensorFlow from an unsupported version compatible with OpenCL
• Lesson:
– Create a separate environment
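One way to create that separate environment using only the standard library is the venv module. A sketch under assumptions: the slides don't say which tool was used (conda is another common choice), and make_env and the directory name in the usage comment are hypothetical.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create an isolated Python environment in env_dir so that experimental builds
    (such as a custom TensorFlow) cannot break the packages used elsewhere."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / 'pyvenv.cfg'


# Usage (hypothetical name): make_env('tsa-tf-env'),
# then activate it with: source tsa-tf-env/bin/activate
```

Installing the experimental build only after activating the new environment keeps a failed build contained.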
Building the first model
One line of code = one layer
from keras.models import Sequential
from keras.layers import Dense, Flatten
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for testing: 96 train, 24 test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

classifier = Sequential()
# A Dense layer on a 4D input acts on the last (channel) axis only
classifier.add(Dense(25, input_shape=(512, 660, 16),
                     activation='relu', kernel_initializer='uniform'))
classifier.add(Flatten())
classifier.add(Dense(25, activation='relu', kernel_initializer='uniform'))
# A single sigmoid unit outputs the probability of a threat
classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
classifier.fit(x_train, y_train, batch_size=10, epochs=50)
Samples: x_train, y_train = 96; x_test, y_test = 24
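With only 18 threats among 120 samples, a plain random split can leave very few threats in the test set (the first model's test set held just 2). A stratified split keeps the threat rate the same in both sets; scikit-learn's train_test_split does this when passed stratify=y. The NumPy-only sketch below shows the idea; stratified_split is a hypothetical helper, and because it rounds per class, set sizes can differ by one from a plain 80/20 split.

```python
import numpy as np


def stratified_split(y, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion,
    so the rare threat class is represented in both sets."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


y = np.array([0] * 102 + [1] * 18)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), y[test_idx].sum())  # 96 24 4
```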
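First Model Learns on Training Set
[Figure: training observations 0, 1, 2, …, 95 of X (with y = 0, 0, 0, …, 1) are fed through the network; ŷ is compared with y to compute the cost C, then w1, w2, … are adjusted]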
First Model Validation on Test Set
Challenge: Jupyter notebook disconnects mid-run
• Alternative:
– Do long runs in a .py file
from sklearn.metrics import confusion_matrix

# Convert predicted probabilities to True/False threat calls at a 0.5 threshold
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5)
cm = confusion_matrix(y_test, y_pred)
print(cm)
Overfitting an issue! The model predicts "No" for all 24 test images.

               Predicted No   Predicted Yes
Actual No           22              0
Actual Yes           2              0

False negatives = 2
False positives = 0
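The confusion matrix shows why accuracy alone is misleading here. A minimal sketch, assuming scikit-learn's layout (rows = actual No/Yes, columns = predicted No/Yes); rates is a hypothetical helper. For the first model it gives about 92% accuracy while missing every threat, a false negative rate of 1.0.

```python
import numpy as np


def rates(cm):
    """Summary rates from a 2x2 confusion matrix laid out as scikit-learn returns it:
    rows = actual (No, Yes), columns = predicted (No, Yes)."""
    (tn, fp), (fn, tp) = cm
    return {
        'accuracy': (tn + tp) / (tn + fp + fn + tp),
        'false_positive_rate': fp / (fp + tn),  # false alarms among actual non-threats
        'false_negative_rate': fn / (fn + tp),  # missed threats among actual threats
    }


cm = np.array([[22, 0], [2, 0]])
print(rates(cm))
```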
Some Ways to Tune a Model, Address Overfitting
Increase or Decrease Epochs and batch size
Add Dropout layers
Add Hidden Layers and Increase Nodes
Test Different Activation Functions
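Dropout randomly silences units during training so the network cannot lean on any single one. Keras's Dropout(rate=…) layer zeroes inputs with probability rate and scales the survivors by 1/(1 − rate) during training, and passes inputs through unchanged at inference. The NumPy sketch below mimics that behavior; it is an illustration, not Keras's implementation, and dropout is a hypothetical name.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference, return the activations untouched."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(0)
a = np.ones((1000, 50))
out = dropout(a, rate=0.2, rng=rng)
print((out == 0).mean(), out.mean())
```

The fraction of zeros should land near 0.2 and the mean near 1.0, which is why no rescaling is needed when the trained model is used for prediction.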
Tuning with GridSearchCV
Grid of all possible combinations
Each side = 1 parameter
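Each cell of the grid is one combination of parameter values, and GridSearchCV trains a model for every combination in every cross-validation fold. A standard-library sketch of the expansion (param_grid is a hypothetical helper; scikit-learn's ParameterGrid does the same job), using the grid from this talk: 8 combinations × 10 folds = 80 fits, plus one final refit on the full training set by default (refit=True).

```python
from itertools import product


def param_grid(grid):
    """Expand {name: [values]} into one dict per combination, like the cells of the grid."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


grid = {'batch_size': [25, 32], 'nb_epoch': [50, 100], 'optimizer': ['adam', 'rmsprop']}
combos = param_grid(grid)
print(len(combos))       # 8
print(len(combos) * 10)  # 80 model fits with 10-fold cross-validation
```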
Tuning with GridSearchCV
Defining model architecture:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout

def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(25, input_shape=(512, 660, 16),
                         activation='relu', kernel_initializer='uniform'))
    classifier.add(Dropout(rate=0.1))
    classifier.add(Flatten())
    classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
    classifier.add(Dropout(rate=0.1))
    classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
    classifier.add(Dropout(rate=0.1))
    classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
    classifier.add(Dropout(rate=0.1))
    classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
    classifier.compile(optimizer=optimizer, loss='binary_crossentropy',
                       metrics=['accuracy'])
    return classifier

Defining parameters for GridSearchCV:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

classifier = KerasClassifier(build_fn=build_classifier)
# 'nb_epoch' is the older (Keras 1) name for the number of epochs; Keras 2 code uses 'epochs'
parameters = {'batch_size': [25, 32],
              'nb_epoch': [50, 100],
              'optimizer': ['adam', 'rmsprop']}
# 10-fold cross-validation over every combination in the grid
grid_search = GridSearchCV(estimator=classifier,
                           param_grid=parameters,
                           scoring='accuracy',
                           cv=10)
grid_search = grid_search.fit(x_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
print(best_parameters)
print(best_accuracy)
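Challenge: Code Terminates
Memory usage too high with GridSearchCV
print(best_parameters)
{'batch_size': 25, 'nb_epoch': 100, 'optimizer': 'adam'}
print(best_accuracy)
0.79600000000000004
Alternative: Take an iterative approach instead
from itertools import product

# Fit one parameter combination at a time to keep memory usage down
for batch_size, epochs, optimizer in product(parameters['batch_size'], parameters['nb_epoch'], parameters['optimizer']):
    model = build_classifier(optimizer)
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
    print(batch_size, epochs, optimizer, model.evaluate(x_test, y_test))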
Scaling Up: From 120 Samples to ½ of Zone 6 (573 Samples)
Best model:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout

classifier = Sequential()
classifier.add(Dense(25, input_shape=(512, 660, 16),
                     activation='relu', kernel_initializer='uniform'))
classifier.add(Dropout(rate=0.2))
classifier.add(Flatten())
# Four hidden layers of 50 nodes, each followed by 20% dropout
classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
classifier.add(Dropout(rate=0.2))
classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
classifier.add(Dropout(rate=0.2))
classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
classifier.add(Dropout(rate=0.2))
classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
classifier.add(Dropout(rate=0.2))
classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
classifier.fit(x_train, y_train, batch_size=25, epochs=50)
[Figure: confusion matrix for the best model]
Challenges Scaling Up
Next step: scale up to the entire Zone 6 sample (1,147 images)
arr_list = [read_data(z6path) for z6path in z6paths]
x = np.stack(arr_list, axis=0)
X size = ~24.8 GB: TOO BIG!
Alternative:
• Use an online learning model: iterate through the images one at a time, feeding each to a model that supports partial fitting
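The online-learning idea can be sketched as a generator that reads only a few files at a time, so the full array never has to sit in memory. Assumptions: load_fn stands in for the image reader (read_data in the talk), batch_generator and fake_load are hypothetical, and labels are assumed to be in the same order as paths. Each batch could then be passed to a Keras model's train_on_batch(x_batch, y_batch), or flattened and passed to a scikit-learn estimator that supports partial_fit (for example SGDClassifier, with classes=[0, 1] on the first call).

```python
import numpy as np


def batch_generator(paths, labels, load_fn, batch_size=8):
    """Yield (x_batch, y_batch) pairs, reading only batch_size files at a time so the
    full dataset never has to fit in memory. load_fn maps a path to one image array."""
    labels = np.asarray(labels)
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        x_batch = np.stack([load_fn(p) for p in chunk], axis=0)
        yield x_batch, labels[start:start + batch_size]


def fake_load(path):
    """Stand-in loader for illustration; the real project would read the scan file."""
    return np.zeros((4, 6, 16), dtype=np.float32)


paths = [f'scan_{i}' for i in range(20)]
labels = [0] * 18 + [1] * 2
for x_batch, y_batch in batch_generator(paths, labels, fake_load):
    print(x_batch.shape, y_batch.shape)
```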
Working with Google Cloud Platform
• Can take a while to connect
• Good tutorial: http://cs231n.github.io/gce-tutorial/
• Limited free credits ($300)
• Rate depends on power & region
– Estimate run cost
• Plan: use all credits for 1-2 runs
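Estimating run cost up front is simple arithmetic: hours × hourly rate, compared with the credit balance. A sketch with hypothetical numbers: the $2.50/hour rate and 48-hour run are placeholders, not Google Cloud prices, so look up the actual rate for the chosen machine type and region. run_cost and runs_affordable are hypothetical helpers.

```python
def run_cost(hours, hourly_rate):
    """Estimated cost in dollars of one run on a cloud VM billed per hour."""
    return hours * hourly_rate


def runs_affordable(credits, hours_per_run, hourly_rate):
    """How many complete runs a credit balance covers."""
    return int(credits // run_cost(hours_per_run, hourly_rate))


# Placeholder numbers: substitute the real rate for your machine type and region
credits = 300.0
print(run_cost(hours=48, hourly_rate=2.50))                           # 120.0
print(runs_affordable(credits, hours_per_run=48, hourly_rate=2.50))   # 2
```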
Working model to date
• High rate of correctly identifying non-threats
• Low rate of false positive threat identifications
BUT…
• Also has a high rate of false negatives
TSA’s current algorithm also has a high false negative rate
What’s Next?
One month to go
• Reduce false negative rate
• Run in Google Cloud Platform
• Productionize the model
Key Takeaways of Challenging Projects
• Big data challenges even with few samples
• Flexibility with project scope
• Don’t be intimidated
• Lots can be learned in a short time!
Thanks!
• Thanks for coming tonight
• ChiPy mentorship program
• Trunk Club for hosting

More Related Content

What's hot

Scaling Deep Learning with MXNet
Scaling Deep Learning with MXNetScaling Deep Learning with MXNet
Scaling Deep Learning with MXNetAI Frontiers
 
News from Mahout
News from MahoutNews from Mahout
News from MahoutTed Dunning
 
Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Julian Lee
 
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceAnirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceLviv Startup Club
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsPeter Solymos
 
Landmark Retrieval & Recognition
Landmark Retrieval & RecognitionLandmark Retrieval & Recognition
Landmark Retrieval & Recognitionkenluck2001
 
EKON22 Introduction to Machinelearning
EKON22 Introduction to MachinelearningEKON22 Introduction to Machinelearning
EKON22 Introduction to MachinelearningMax Kleiner
 
Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostJaroslaw Szymczak
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOWMark Chang
 
Salt Identification Challenge
Salt Identification ChallengeSalt Identification Challenge
Salt Identification Challengekenluck2001
 
VLDB_2015_Nurjahan Begum
VLDB_2015_Nurjahan BegumVLDB_2015_Nurjahan Begum
VLDB_2015_Nurjahan BegumNurjahan Begum
 
Baseball Prediction Model on Tensorflow
Baseball Prediction Model on TensorflowBaseball Prediction Model on Tensorflow
Baseball Prediction Model on TensorflowJay Ryu
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryDaniel Cuneo
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMatthias Zimmermann
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvasdeanhudson
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANWuhyun Rico Shin
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkDatabricks
 

What's hot (20)

Scaling Deep Learning with MXNet
Scaling Deep Learning with MXNetScaling Deep Learning with MXNet
Scaling Deep Learning with MXNet
 
News from Mahout
News from MahoutNews from Mahout
News from Mahout
 
Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017
 
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning PerformanceAnirudh Koul. 30 Golden Rules of Deep Learning Performance
Anirudh Koul. 30 Golden Rules of Deep Learning Performance
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Landmark Retrieval & Recognition
Landmark Retrieval & RecognitionLandmark Retrieval & Recognition
Landmark Retrieval & Recognition
 
EKON22 Introduction to Machinelearning
EKON22 Introduction to MachinelearningEKON22 Introduction to Machinelearning
EKON22 Introduction to Machinelearning
 
Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboost
 
NTU ML TENSORFLOW
NTU ML TENSORFLOWNTU ML TENSORFLOW
NTU ML TENSORFLOW
 
Xgboost
XgboostXgboost
Xgboost
 
Salt Identification Challenge
Salt Identification ChallengeSalt Identification Challenge
Salt Identification Challenge
 
VLDB_2015_Nurjahan Begum
VLDB_2015_Nurjahan BegumVLDB_2015_Nurjahan Begum
VLDB_2015_Nurjahan Begum
 
Slide tesi
Slide tesiSlide tesi
Slide tesi
 
Baseball Prediction Model on Tensorflow
Baseball Prediction Model on TensorflowBaseball Prediction Model on Tensorflow
Baseball Prediction Model on Tensorflow
 
Time Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal RecoveryTime Series Analysis:Basic Stochastic Signal Recovery
Time Series Analysis:Basic Stochastic Signal Recovery
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle Introduction
 
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
Flink Forward Berlin 2017: David Rodriguez - The Approximate Filter, Join, an...
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvas
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GAN
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
 

Similar to Learning Predictive Modeling with TSA and Kaggle

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...Dataconomy Media
 
Building a Better TSA Screening Algorithm
Building a Better TSA Screening AlgorithmBuilding a Better TSA Screening Algorithm
Building a Better TSA Screening AlgorithmYvonne K. Matos
 
Introduction to deep learning using python
Introduction to deep learning using pythonIntroduction to deep learning using python
Introduction to deep learning using pythonLino Coria
 
Session 4 start coding Tensorflow 2.0
Session 4 start coding Tensorflow 2.0Session 4 start coding Tensorflow 2.0
Session 4 start coding Tensorflow 2.0Rajagopal A
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Julien SIMON
 
Towards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoTowards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoDatabricks
 
[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare EventsTaegyun Jeon
 
maxbox starter60 machine learning
maxbox starter60 machine learningmaxbox starter60 machine learning
maxbox starter60 machine learningMax Kleiner
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkitde:code 2017
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnDebarko De
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfssuserb4d806
 
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018 Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018 Codemotion
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017StampedeCon
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!Dhiana Deva
 
Machine Learning and Go. Go!
Machine Learning and Go. Go!Machine Learning and Go. Go!
Machine Learning and Go. Go!Diana Ortega
 
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...Databricks
 

Similar to Learning Predictive Modeling with TSA and Kaggle (20)

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
Building a Better TSA Screening Algorithm
Building a Better TSA Screening AlgorithmBuilding a Better TSA Screening Algorithm
Building a Better TSA Screening Algorithm
 
Introduction to deep learning using python
Introduction to deep learning using pythonIntroduction to deep learning using python
Introduction to deep learning using python
 
Session 4 start coding Tensorflow 2.0
Session 4 start coding Tensorflow 2.0Session 4 start coding Tensorflow 2.0
Session 4 start coding Tensorflow 2.0
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
 
Towards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoTowards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei Diao
 
[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events
 
maxbox starter60 machine learning
maxbox starter60 machine learningmaxbox starter60 machine learning
maxbox starter60 machine learning
 
Captcha
CaptchaCaptcha
Captcha
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdf
 
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018 Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!
My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!
 
Machine Learning and Go. Go!
Machine Learning and Go. Go!Machine Learning and Go. Go!
Machine Learning and Go. Go!
 
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
A Distributed Deep Learning Approach for the Mitosis Detection from Big Medic...
 

Recently uploaded

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 

Recently uploaded (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf

Learning Predictive Modeling with TSA and Kaggle

  • 1. Learning Predictive Modeling with TSA and Kaggle: Tips for Beginners Yvonne K. Matos ChiPy Data Science SIG Meeting, November 15, 2017 Photo: Benoit Tessier/Reuters
  • 2. Where to start with deep learning? Activation functions, backpropagation, neural networks, output layers, weights, recurrent neural networks, sigmoid functions, loss functions, decision trees
  • 3. Define goals, pick a project, dive in!
    My goals, ChiPy Mentorship Program:
      1. Work with large datasets and cloud computing
      2. Develop deep learning algorithms
      3. Increase experience with Python
      4. Get hired as a data scientist!
    Data science pipeline: Business Question → Data Question → Data Collection → Data Loading → Data Cleaning → Preprocessing → Modeling → Validation → Data Answer → Business Answer, plus Exploratory Analysis
    Data science is 80% cleaning and preprocessing, 20% modeling.
  • 4. TSA Passenger Screening Algorithm Challenge
    Problem: high false alarm rates create bottlenecks at airport checkpoints.
    Challenge: create an algorithm with a lower false alarm rate using a dataset of scan images with simulated threats.
  • 5. Exploration: Visualizing the Images
    • 3D images
    • 3 TB dataset
    Lowest-res 10 MB image: raw data is a 4D array, (n, 512, 660, 16)
    Higher-res 330 MB image: (n, 512, 512, 660)
    Other 3D images: 5D array, (n, 128, 128, 128, 1)
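The file sizes on slide 5 follow from the array shapes if each raw value takes 2 bytes. After loading and scaling, the values are held as 32-bit floats (4 bytes), and the same arithmetic reproduces the 2.6 GB (120 scans) and ~24.8 GB (1,147 scans) figures on slides 9 and 22. Below is a minimal sketch. The 2-byte and 4-byte widths are assumptions that happen to match the quoted sizes; they were not read from the TSA file format specification.

```python
from math import prod

def array_bytes(shape, bytes_per_value):
    """Bytes needed to hold a dense array of the given shape."""
    return prod(shape) * bytes_per_value

# Per-scan shapes from the slide (sample axis n dropped).
LOW_RES = (512, 660, 16)     # lowest-res scan: 512 x 660 pixels x 16 channels
HIGH_RES = (512, 512, 660)   # higher-res 3D volume

# Assumption: raw files store 2 bytes per value.
print(array_bytes(LOW_RES, 2) / 2**20)    # ~10.3 MiB
print(array_bytes(HIGH_RES, 2) / 2**20)   # 330 MiB

# Assumption: stacked, scaled arrays are float32 (4 bytes per value).
print(array_bytes((120,) + LOW_RES, 4) / 1e9)    # ~2.6 GB
print(array_bytes((1147,) + LOW_RES, 4) / 1e9)   # ~24.8 GB
```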
  • 6. TSA 3D images vs. 2D RGB images
    2D RGB images: (n, 512, 660, 3) = (samples, dimensions, channels), 3 channels
    TSA images: (n, 512, 660, 16) = (samples, dimensions, channels), 16 channels
    Same layout. If I fits, I sits!
  • 7. Anticipated Challenges
    • First Python project
    • 10 MB per low-res file = long run times
    • Enormous scope
      – Full training in the cloud = $$$$$
    Plan of attack
    • Begin small
    • Scale up locally
    • Run in the cloud
  • 8. Start small with a data subset
    19,499 potential threats from 17 body zones: 17,628 non-threat, 1,871 threat (9.6%)
    Zone 6: 1,148 total images
    • 1,032 non-threat
    • 116 threat (10%)
    Lowest-res images: ~10 MB each. Start with 120 images.
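The class balance on slide 8 can be checked directly from the counts. A small starter subset can then be drawn with a fixed number of threat images; slide 9's 120-image set has 102 non-threat and 18 threat images. Below is a minimal sketch. The `labels` dictionary of scan IDs to 0/1 labels, the helper names `threat_rate` and `draw_subset`, and the toy IDs are hypothetical stand-ins, not the original code or the real label file.

```python
import random

def threat_rate(n_threat, n_total):
    """Fraction of labels that are threats."""
    return n_threat / n_total

def draw_subset(labels, n_threat, n_nonthreat, seed=0):
    """Pick scan IDs with exactly n_threat threat and n_nonthreat non-threat labels.

    labels: dict mapping a scan ID to 0 (non-threat) or 1 (threat).
    """
    rng = random.Random(seed)
    threats = sorted(k for k, v in labels.items() if v == 1)
    non_threats = sorted(k for k, v in labels.items() if v == 0)
    chosen = rng.sample(threats, n_threat) + rng.sample(non_threats, n_nonthreat)
    rng.shuffle(chosen)  # mix classes so threats are not all at the front
    return chosen

# Class balance from the slide's counts.
print(round(threat_rate(1871, 17628 + 1871), 3))  # all zones: 0.096
print(round(threat_rate(116, 1032 + 116), 3))     # zone 6: 0.101

# Toy labels standing in for the real Zone 6 labels: 116 threats out of 1,148.
toy = {f"scan{i:04d}": int(i < 116) for i in range(1148)}
subset = draw_subset(toy, n_threat=18, n_nonthreat=102)
print(len(subset), sum(toy[k] for k in subset))  # 120 18
```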
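  • 9. Image Preprocessing: Getting x and y data
    120 samples: 102 non-threat, 18 threat
    X data and X scaling:
      import os
      import numpy as np

      data_dir = '/Users/Yvonne/Desktop/TSA_Kaggle/Z6_n30_9.18.17'
      # Skip hidden files such as .DS_Store rather than deleting whichever entry os.listdir
      # happens to return first, and sort so the order is reproducible and matches y.
      z6samlist = sorted(f for f in os.listdir(data_dir) if not f.startswith('.'))
      z6paths = [os.path.join(data_dir, z6sam) for z6sam in z6samlist]
      # read_data: helper defined elsewhere (not shown on the slide); loads one scan as a (512, 660, 16) array.
      arr_list = [read_data(z6path) for z6path in z6paths]
      x = np.stack(arr_list, axis=0)
      # Min-max scale every value to [0, 1] using the global minimum and maximum.
      maximum = np.max(x)
      minimum = np.min(x)
      x = (x - minimum) / (maximum - minimum)
    X shape: (120, 512, 660, 16); X size: 2.6 GB
    Y data (labels must be in the same order as z6paths):
      y = z6sample_120.iloc[:, 2].values
    Y shape: (120,)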
  • 10. Neural Networks Attempt to Model the Human Brain
    Diagram: input layer (x1, x2, x3, x4) → hidden layer of neurons (Σ), connected by weights w1, w2, … → output layer → output value ŷ, compared with the actual value y
    X = independent variables for each observation (scaling X is a must!)
    W = weights; ŷ = predicted value; y = actual value
    Goal: minimize the cost C
    Slide concept credit: SuperDataScience
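Here is the idea from slide 10 in numbers. Each neuron (Σ) takes a weighted sum of its inputs and applies an activation. The cost C then measures how far the output ŷ is from the actual value y. Below is a minimal NumPy sketch with four inputs and one hidden layer. The random weights, the bias terms, the 3-neuron hidden layer, and the squared-error cost C = ½(ŷ − y)² are all illustrative assumptions; the Keras models later in the deck use binary cross-entropy instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """One forward pass: inputs -> hidden layer -> single output value y_hat."""
    hidden = relu(x @ W1 + b1)         # each hidden neuron: weighted sum of inputs, then ReLU
    y_hat = sigmoid(hidden @ W2 + b2)  # output neuron: weighted sum of hidden values, then sigmoid
    return y_hat

def squared_error_cost(y_hat, y):
    """C = 1/2 * sum((y_hat - y)^2), one common choice of cost."""
    return 0.5 * np.sum((y_hat - y) ** 2)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.5, 0.1, 0.9])  # x1..x4, already scaled to [0, 1]
W1 = rng.normal(size=(4, 3))        # input -> hidden weights (w1, w2, ...)
b1 = np.zeros(3)
W2 = rng.normal(size=(3,))          # hidden -> output weights
b2 = 0.0

y_hat = forward(x, W1, b1, W2, b2)
print(y_hat, squared_error_cost(y_hat, 1.0))
```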
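  • 11. Neural Networks Learn Through Backpropagation
    Diagram: observations (index 0, 1, 2, …, 7, with actual values y = 0, 0, 0, …, 1) are fed through the network; the cost C compares the predictions with y, and the weights w1, w2, … are adjusted to reduce it.
    Slide concept credit: SuperDataScience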
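The following is a minimal sketch of the loop on slide 11. It feeds the observations forward, measures the cost, propagates the error back through the network, and adjusts the weights by gradient descent. All of the following are illustrative assumptions, not the author's setup:

- the single ReLU hidden layer
- the sigmoid output with mean binary cross-entropy cost (the loss the later Keras models use)
- the 8-observation toy dataset
- the learning rate

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_hat, y, eps=1e-12):
    """Mean binary cross-entropy cost."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def train(X, y, n_hidden=8, lr=0.5, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    costs = []
    for _ in range(epochs):
        # Forward pass over all observations.
        z1 = X @ W1 + b1
        h = np.maximum(0.0, z1)               # ReLU hidden layer
        y_hat = sigmoid(h @ W2 + b2).ravel()  # predicted probability per observation
        costs.append(bce(y_hat, y))
        # Backward pass: gradients of the mean cross-entropy, output layer first.
        dz2 = (y_hat - y)[:, None] / n        # dC/dz at the output (sigmoid + cross-entropy)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (z1 > 0)         # back through the ReLU
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # Gradient-descent step: move each weight against its gradient.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return costs

# Toy data: 8 observations (index 0-7); label 1 when the feature sum is large.
X = np.random.default_rng(1).random((8, 4))
y = (X.sum(axis=1) > 2.0).astype(float)
costs = train(X, y)
print(costs[0], costs[-1])  # the cost should fall over training
```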
  • 12. Additional Info on Neural Networks: Getting Started
    • Udemy course: Deep Learning A-Z™: Hands-On Artificial Neural Networks
    • YouTube
    • Blog: https://adeshpande3.github.io/adeshpande3.github.io/
    • Online book: http://neuralnetworksanddeeplearning.com/index.html
  • 13. Challenges: Getting Ready to Develop the First Model
    No GPU support for TensorFlow (TF) on Mac
    • Attempted solution:
      – Build TF from an unsupported version compatible with OpenCL
    • Lesson:
      – Create a separate environment
  • 14. Building the First Model
    One line of code = one layer
      from keras.models import Sequential
      from keras.layers import Dense, Flatten
      from sklearn.model_selection import train_test_split

      # Split before training: 80/20 of the 120 samples gives 96 training and 24 test samples.
      x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

      classifier = Sequential()
      # Dense on a 4D input acts on the last axis (the 16 channels): output shape (512, 660, 25).
      classifier.add(Dense(25, input_shape=(512, 660, 16), activation='relu', kernel_initializer='uniform'))
      classifier.add(Flatten())
      classifier.add(Dense(25, activation='relu', kernel_initializer='uniform'))
      classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))  # threat probability
      classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
      classifier.fit(x_train, y_train, batch_size=10, epochs=50)
    Samples: 96 training, 24 test
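One detail of the first model on slide 14 explains a lot of the memory pressure. In Keras, a `Dense` layer applied to a 4D input acts only on the last axis (the 16 channels). The `Flatten` therefore comes after it, and the second `Dense(25)` receives 512 × 660 × 25 inputs. The sketch below counts weights per `Dense` layer as inputs × units + units (biases). It is minimal arithmetic; the helper name `dense_params` is chosen for illustration. The float32 total covers the weights alone, before gradients or the two per-weight moment estimates that Adam keeps during training.

```python
def dense_params(n_inputs, units):
    """Weights plus biases in a fully connected (Dense) layer."""
    return n_inputs * units + units

channels = 16
h, w = 512, 660

layer1 = dense_params(channels, 25)   # Dense(25) on the last axis, shared across all 512 x 660 positions
flattened = h * w * 25                # size of the Flatten output
layer2 = dense_params(flattened, 25)  # Dense(25) on the flattened vector
layer3 = dense_params(25, 1)          # sigmoid output neuron
total = layer1 + layer2 + layer3

print(layer1, flattened, layer2, layer3, total)  # 425 8448000 211200025 26 211200476
print(total * 4 / 1e9, "GB of float32 weights")  # ~0.84 GB
```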
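  • 15. First Model Learns on the Training Set
    Diagram: the 96 training observations (index 0, 1, 2, …, 95, with actual values y = 0, 0, 0, …, 1) are fed through the network; the cost C compares each prediction ŷ with y, and the weights w1, w2, … are adjusted.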
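  • 16. First Model Validation on the Test Set
    Challenge: the Jupyter notebook disconnects mid-run
    • Alternative:
      – Do long runs in a .py file
      from sklearn.metrics import confusion_matrix

      y_pred = classifier.predict(x_test)
      y_pred = (y_pred > 0.5)  # threshold the predicted probabilities at 0.5
      cm = confusion_matrix(y_test, y_pred)
      print(cm)
    Confusion matrix (rows = actual, columns = predicted):
                      Predicted No   Predicted Yes
      Actual No            22              0
      Actual Yes            2              0
    False positives (actual No, predicted Yes): 0. False negatives (actual Yes, predicted No): 2.
    Overfitting an issue! The model predicts "No" for every test image, so both threats are missed.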
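Reading slide 16's confusion matrix as rates makes the problem concrete. scikit-learn lays the matrix out with rows = actual and columns = predicted, so the test-set result is [[22, 0], [2, 0]]. Accuracy therefore looks high while every threat is missed. Below is a minimal pure-Python sketch; the helper name `rates` is illustrative.

```python
def rates(cm):
    """Summary rates from a 2x2 confusion matrix [[TN, FP], [FN, TP]] (rows = actual)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        # False alarms among actual non-threats (the rate the TSA challenge targets).
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Missed threats among actual threats.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

first_model = [[22, 0], [2, 0]]
print(rates(first_model))
# accuracy ~0.917, false_positive_rate 0.0, false_negative_rate 1.0
```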
  • 17. Some Ways to Tune a Model and Address Overfitting
    • Increase or decrease epochs and batch size
    • Add dropout layers
    • Add hidden layers and increase nodes
    • Test different activation functions
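Dropout from slide 17 is the lever used in the later models (`Dropout(rate=0.1)` and `rate=0.2`). During training it zeroes a random fraction `rate` of a layer's outputs. It scales the surviving outputs by 1/(1 − rate) so the expected activation is unchanged. At inference it passes values through untouched. Keras uses this "inverted dropout" scheme. The NumPy version below is a minimal illustrative sketch, not Keras's own code.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training, rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate  # True for units that survive
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones(100_000)
dropped = dropout(a, rate=0.2, training=True, rng=rng)
print((dropped == 0).mean())  # close to 0.2: fraction of units dropped
print(dropped.mean())         # close to 1.0: expected value preserved
print(np.array_equal(dropout(a, 0.2, training=False, rng=rng), a))  # unchanged at inference
```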
  • 18. Tuning with GridSearchCV: a grid of all possible combinations of parameter values; each side of the grid = 1 parameter
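The grid on slide 18 is the Cartesian product of the candidate values, with one axis per parameter. The parameter dictionary used with `GridSearchCV` in this deck has 2 × 2 × 2 = 8 combinations. With `cv=10`, each combination is trained on 10 folds: 80 model fits, plus one refit of the best combination under the default `refit=True`. Below is a minimal sketch using `itertools.product`; scikit-learn's `ParameterGrid` enumerates the same combinations. The helper name `grid` is illustrative.

```python
from itertools import product

parameters = {'batch_size': [25, 32],
              'nb_epoch': [50, 100],
              'optimizer': ['adam', 'rmsprop']}

def grid(params):
    """Yield every combination of parameter values as a dict."""
    keys = sorted(params)
    for values in product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))

combos = list(grid(parameters))
cv_folds = 10
print(len(combos))             # 8
print(len(combos) * cv_folds)  # 80 cross-validation fits (plus 1 refit)
for combo in combos[:2]:
    print(combo)
```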
  • 19. Tuning with GridSearchCV
      from keras.models import Sequential
      from keras.layers import Dense, Flatten, Dropout
      from keras.wrappers.scikit_learn import KerasClassifier
      from sklearn.model_selection import GridSearchCV

      # Defining model architecture
      def build_classifier(optimizer):
          classifier = Sequential()
          classifier.add(Dense(25, input_shape=(512, 660, 16), activation='relu', kernel_initializer='uniform'))
          classifier.add(Dropout(rate=0.1))
          classifier.add(Flatten())
          classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
          classifier.add(Dropout(rate=0.1))
          classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
          classifier.add(Dropout(rate=0.1))
          classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
          classifier.add(Dropout(rate=0.1))
          classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
          classifier.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
          return classifier

      # Defining parameters for GridSearchCV
      classifier = KerasClassifier(build_fn=build_classifier)
      parameters = {'batch_size': [25, 32],
                    'nb_epoch': [50, 100],  # Keras 1 name; Keras 2 calls this fit argument 'epochs'
                    'optimizer': ['adam', 'rmsprop']}
      grid_search = GridSearchCV(estimator=classifier, param_grid=parameters, scoring='accuracy', cv=10)
      grid_search = grid_search.fit(x_train, y_train)
      best_parameters = grid_search.best_params_
      best_accuracy = grid_search.best_score_
      print(best_parameters)
      print(best_accuracy)
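  • 20. Challenge: Code Terminates
    Memory usage too high with GridSearchCV (parameter sets 1, 2, 3, 4, …)
      print(best_parameters)
      {'batch_size': 25, 'nb_epoch': 100, 'optimizer': 'adam'}
      print(best_accuracy)
      0.79600000000000004
    Alternative: take an iterative approach instead, training one parameter combination at a time:
      from sklearn.model_selection import ParameterGrid

      for params in ParameterGrid(parameters):
          model = build_classifier(params['optimizer'])
          model.fit(x_train, y_train, batch_size=params['batch_size'], epochs=params['nb_epoch'])
          print(params, model.evaluate(x_test, y_test))  # [loss, accuracy]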
  • 21. Scaling Up from 120 to ½ of the Samples in Zone 6 (573)
    Best model:
      from keras.models import Sequential
      from keras.layers import Dense, Flatten, Dropout

      classifier = Sequential()
      classifier.add(Dense(25, input_shape=(512, 660, 16), activation='relu', kernel_initializer='uniform'))
      classifier.add(Dropout(rate=0.2))
      classifier.add(Flatten())
      classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
      classifier.add(Dropout(rate=0.2))
      classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
      classifier.add(Dropout(rate=0.2))
      classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
      classifier.add(Dropout(rate=0.2))
      classifier.add(Dense(50, activation='relu', kernel_initializer='uniform'))
      classifier.add(Dropout(rate=0.2))
      classifier.add(Dense(1, activation='sigmoid', kernel_initializer='uniform'))
      classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
      classifier.fit(x_train, y_train, batch_size=25, epochs=50)
    [Figure: confusion matrix for the best model]
  • 22. Challenges Scaling Up
    Next step: scale up to the entire Zone 6 sample (1,147 images)
      arr_list = [read_data(z6path) for z6path in z6paths]
      x = np.stack(arr_list, axis=0)
    X size ≈ 24.8 GB: TOO BIG!
    Alternative:
    • Use an online learning model that iterates through each image with partial fits
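The online-learning alternative on slide 22 avoids ever stacking all 1,147 scans at once. The sketch below assumes a loader like `read_data` that returns one scan as a NumPy array. A first pass computes the global minimum and maximum one file at a time, so the min-max scaling from slide 9 still uses dataset-wide values. A second pass yields scaled mini-batches. A batch-by-batch training loop can consume those batches; in Keras 2, `model.train_on_batch(x_batch, y_batch)` is one option. `global_min_max` and `scaled_batches` are hypothetical helper names, and the toy in-memory "files" stand in for real scans.

```python
import numpy as np

def global_min_max(paths, load_fn):
    """First pass: dataset-wide min and max, holding only one scan in memory at a time."""
    lo, hi = np.inf, -np.inf
    for path in paths:
        arr = load_fn(path)
        lo = min(lo, float(arr.min()))
        hi = max(hi, float(arr.max()))
    return lo, hi

def scaled_batches(paths, labels, load_fn, batch_size, lo, hi):
    """Second pass: yield (x_batch, y_batch) min-max scaled, batch_size scans at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        x = np.stack([load_fn(p) for p in chunk], axis=0).astype(np.float32)
        x = (x - lo) / (hi - lo)
        y = np.asarray(labels[start:start + batch_size])
        yield x, y

# Usage with a toy loader standing in for read_data (small arrays instead of 512 x 660 x 16).
fake_files = {f"scan{i}": np.full((4, 5, 16), float(i)) for i in range(10)}
paths = sorted(fake_files)
labels = [int(i % 3 == 0) for i in range(10)]  # labels in the same order as paths
lo, hi = global_min_max(paths, fake_files.__getitem__)
for x_batch, y_batch in scaled_batches(paths, labels, fake_files.__getitem__, 4, lo, hi):
    print(x_batch.shape, y_batch, float(x_batch.min()), float(x_batch.max()))
```

Peak memory is now one batch of scans rather than the full ~24.8 GB array. The cost is reading every file twice per preprocessing pass.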
  • 23. Working with Google Cloud Platform
    • Can take a while to connect
    • Good tutorial: http://cs231n.github.io/gce-tutorial/
    • Limited free credits ($300)
    • Rate depends on power and region
      – Estimate run cost
    • Plan: use all credits for 1-2 runs
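The "estimate run cost" step on slide 23 is simple arithmetic: hours per run times the hourly rate of the chosen machine, compared against the credit balance. Below is a minimal sketch. The $300 credit figure comes from the slide. The $4.00/hour rate and the 36-hour run length are placeholders, not actual Google Cloud prices, and `affordable_runs` is a hypothetical helper.

```python
import math

def affordable_runs(credits, hourly_rate, hours_per_run):
    """Number of full runs the credit balance covers, and the cost of one run."""
    cost_per_run = hourly_rate * hours_per_run
    return math.floor(credits / cost_per_run), cost_per_run

# Placeholder numbers: a machine billed at $4.00/hour and a 36-hour training run.
runs, cost = affordable_runs(credits=300.0, hourly_rate=4.00, hours_per_run=36)
print(runs, cost)  # 2 144.0
```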
  • 24. Working Model to Date
    • High rate of correctly identifying non-threats
    • Low rate of false-positive threat IDs
    BUT…
    • Also a high rate of false negatives
  • 25. TSA’s current algorithm also has a high false negative rate
  • 26. What’s Next? One month to go
    • Reduce the false negative rate
    • Run in Google Cloud Platform
    • Productionalize the model
  • 27. Key Takeaways of Challenging Projects
    • Big data challenges even with few samples
    • Flexibility with project scope
    • Don’t be intimidated
    • Lots can be learned in a short time!
  • 28. Thanks!
    • Thanks for coming tonight
    • ChiPy mentorship program
    • Trunk Club for hosting

Editor's Notes

  1. Focus on model building, but some image recognition and preprocessing are still required. An extension for Jupyter notebooks helps with presentations.
  2. Must… Neuron: weighted sum of inputs. Neural networks learn by adjusting weights.
  3. The cost function is the error in the prediction, and we want to minimize it.
  4. The cost function is the error in the prediction, and we want to minimize it.