This document provides an overview of troubleshooting techniques for deep learning models. It recommends starting simple: choose a simple architecture, use sensible hyperparameter defaults, normalize inputs, and simplify the problem. The key strategy is to start with a minimum viable model, implement and debug it, then gradually ramp up complexity by tuning hyperparameters and improving the model or data, repeating until performance meets requirements. Troubleshooting deep learning is hard because bugs are often invisible and results can be sensitive to small changes in hyperparameters and dataset makeup.
Lecture 7: Troubleshooting Deep Neural Networks (Full Stack Deep Learning - Spring 2021)
1. Full Stack Deep Learning - UC Berkeley Spring 2021 - Sergey Karayev, Josh Tobin, Pieter Abbeel
Troubleshooting Deep Neural
Networks
2. Full Stack Deep Learning - UC Berkeley Spring 2021
Lifecycle of an ML project
2
Planning &
project setup
Data collection
& labeling
Training &
debugging
Deploying &
testing
Team & hiring
Per-project
activities
Infra &
tooling
Cross-project
infrastructure
Troubleshooting - overview
3. Full Stack Deep Learning - UC Berkeley Spring 2021
Why talk about DL troubleshooting?
3
XKCD, https://xkcd.com/1838/
Troubleshooting - overview
4. Full Stack Deep Learning - UC Berkeley Spring 2021
Why talk about DL troubleshooting?
4
Troubleshooting - overview
5. Full Stack Deep Learning - UC Berkeley Spring 2021
Why talk about DL troubleshooting?
5
Troubleshooting - overview
Common sentiment among practitioners:
80-90% of time debugging and tuning
10-20% deriving math or implementing things
6. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is DL troubleshooting so hard?
6
Troubleshooting - overview
7. Full Stack Deep Learning - UC Berkeley Spring 2021
Suppose you can’t reproduce a result
7
He, Kaiming, et al. "Deep residual learning for image recognition."
Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Your learning curve
Troubleshooting - overview
8. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
8
Poor model
performance
Troubleshooting - overview
9. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
9
Poor model
performance
Implementation
bugs
Troubleshooting - overview
10. Full Stack Deep Learning - UC Berkeley Spring 2021
Most DL bugs are invisible
10
Troubleshooting - overview
11. Full Stack Deep Learning - UC Berkeley Spring 2021
Most DL bugs are invisible
11
Troubleshooting - overview
Labels out of order!
12. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
12
Poor model
performance
Implementation
bugs
Troubleshooting - overview
13. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
13
Poor model
performance
Implementation
bugs
Hyperparameter
choices
Troubleshooting - overview
14. Full Stack Deep Learning - UC Berkeley Spring 2021
Models are sensitive to hyperparameters
14
Andrej Karpathy, CS231n course notes
Troubleshooting - overview
15. Full Stack Deep Learning - UC Berkeley Spring 2021
Andrej Karpathy, CS231n course notes He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet
classification." Proceedings of the IEEE international conference on computer vision. 2015.
Models are sensitive to hyperparameters
15
Troubleshooting - overview
16. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
16
Poor model
performance
Implementation
bugs
Hyperparameter
choices
Troubleshooting - overview
17. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
17
Data/model fit
Poor model
performance
Implementation
bugs
Hyperparameter
choices
Troubleshooting - overview
18. Full Stack Deep Learning - UC Berkeley Spring 2021
Data / model fit
18
Yours: self-driving car images
Troubleshooting - overview
Data from the paper: ImageNet
19. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
19
Data/model fit
Poor model
performance
Implementation
bugs
Hyperparameter
choices
Troubleshooting - overview
20. Full Stack Deep Learning - UC Berkeley Spring 2021
Why is your performance worse?
20
Dataset
construction
Data/model fit
Poor model
performance
Implementation
bugs
Hyperparameter
choices
Troubleshooting - overview
21. Full Stack Deep Learning - UC Berkeley Spring 2021
Constructing good datasets is hard
21
[Chart from Andrej Karpathy’s talk “Building the Software 2.0 Stack” at TrainAI 2018, 5/10/2018: amount of lost sleep over data vs. models, during a PhD vs. at Tesla]
Troubleshooting - overview
22. Full Stack Deep Learning - UC Berkeley Spring 2021
Common dataset construction issues
• Not enough data
• Class imbalances
• Noisy labels
• Train / test from different distributions
• etc
22
Troubleshooting - overview
23. Full Stack Deep Learning - UC Berkeley Spring 2021
Takeaways: why is troubleshooting hard?
• Hard to tell if you have a bug
• Lots of possible sources for the same degradation in
performance
• Results can be sensitive to small changes in
hyperparameters and dataset makeup
23
Troubleshooting - overview
24. Full Stack Deep Learning - UC Berkeley Spring 2021
Strategy for DL troubleshooting
24
Troubleshooting - overview
25. Full Stack Deep Learning - UC Berkeley Spring 2021
Key mindset for DL troubleshooting
25
Pessimism
Troubleshooting - overview
26. Full Stack Deep Learning - UC Berkeley Spring 2021
Key idea of DL troubleshooting
26
Since it’s hard to
disambiguate errors…
…start simple and gradually
ramp up complexity
Troubleshooting - overview
27. Full Stack Deep Learning - UC Berkeley Spring 2021
Strategy for DL troubleshooting
27
Tune hyper-
parameters
Implement
& debug
Start simple Evaluate
Improve
model/data
Meets re-
quirements
Troubleshooting - overview
28. Full Stack Deep Learning - UC Berkeley Spring 2021
Quick summary
28
Start
simple
• Choose the simplest model & data possible
(e.g., LeNet on a subset of your data)
Troubleshooting - overview
29. Full Stack Deep Learning - UC Berkeley Spring 2021
Quick summary
29
Implement
& debug
Start
simple
• Choose the simplest model & data possible
(e.g., LeNet on a subset of your data)
• Once model runs, overfit a single batch &
reproduce a known result
Troubleshooting - overview
30. Full Stack Deep Learning - UC Berkeley Spring 2021
Quick summary
30
Implement
& debug
Start
simple
Evaluate
• Choose the simplest model & data possible
(e.g., LeNet on a subset of your data)
• Once model runs, overfit a single batch &
reproduce a known result
• Apply the bias-variance decomposition to
decide what to do next
Troubleshooting - overview
31. Full Stack Deep Learning - UC Berkeley Spring 2021
Quick summary
31
Tune hyperparams
Implement
& debug
Start
simple
Evaluate
• Choose the simplest model & data possible
(e.g., LeNet on a subset of your data)
• Once model runs, overfit a single batch &
reproduce a known result
• Apply the bias-variance decomposition to
decide what to do next
• Use coarse-to-fine random searches
Troubleshooting - overview
32. Full Stack Deep Learning - UC Berkeley Spring 2021
Tune hyperparams
Quick summary
32
Implement
& debug
Start
simple
Evaluate
Improve
model/data
• Choose the simplest model & data possible
(e.g., LeNet on a subset of your data)
• Once model runs, overfit a single batch &
reproduce a known result
• Apply the bias-variance decomposition to
decide what to do next
• Use coarse-to-fine random searches (a sketch follows this summary)
• Make your model bigger if you underfit; add
data or regularize if you overfit
Troubleshooting - overview
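Since coarse-to-fine random search is easy to get wrong, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the log10 ranges, the round and trial counts, and the train_and_evaluate(config) function, which is presumed to return validation error.

import math
import random

def sample_config(lr_range, reg_range):
    # Sample log-uniformly: learning rate and regularization strength
    # vary over orders of magnitude, so sample in log space.
    return {"lr": 10 ** random.uniform(*lr_range),
            "weight_decay": 10 ** random.uniform(*reg_range)}

def coarse_to_fine_search(train_and_evaluate, rounds=3, trials=20):
    lr_range, reg_range = (-6.0, -1.0), (-8.0, -2.0)  # coarse log10 ranges
    best_err, best_cfg = float("inf"), None
    for _ in range(rounds):
        for _ in range(trials):
            cfg = sample_config(lr_range, reg_range)
            err = train_and_evaluate(cfg)             # validation error
            if err < best_err:
                best_err, best_cfg = err, cfg
        # Fine stage: narrow each range around the best value so far.
        lr_c = math.log10(best_cfg["lr"])
        reg_c = math.log10(best_cfg["weight_decay"])
        lr_range = (lr_c - 0.5, lr_c + 0.5)
        reg_range = (reg_c - 0.5, reg_c + 0.5)
    return best_cfg, best_err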
33. Full Stack Deep Learning - UC Berkeley Spring 2021
We’ll assume you already have…
• Initial test set
• A single metric to improve
• Target performance based on human-level performance, published results, previous
baselines, etc
33
Troubleshooting - overview
34. Full Stack Deep Learning - UC Berkeley Spring 2021
We’ll assume you already have…
34
0 (no pedestrian) 1 (yes pedestrian)
Goal: 99% classification accuracy
Running example
Troubleshooting - overview
• Initial test set
• A single metric to improve
• Target performance based on human-level
performance, published results, previous
baselines, etc
35. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
35
Troubleshooting - overview
36. Full Stack Deep Learning - UC Berkeley Spring 2021
Tune hyper-
parameters
Implement
& debug
Start simple Evaluate
Improve
model/data
Meets re-
quirements
Strategy for DL troubleshooting
36
Troubleshooting - start simple
37. Full Stack Deep Learning - UC Berkeley Spring 2021
Starting simple
37
Normalize inputs
Choose a simple
architecture
Simplify the problem
Use sensible defaults
Steps
b
a
c
d
Troubleshooting - start simple
38. Full Stack Deep Learning - UC Berkeley Spring 2021
Demystifying architecture selection
38
            Start here                                         Consider using this later
Images      LeNet-like architecture                            ResNet
Sequences   LSTM with one hidden layer (or temporal convs)     Attention model or WaveNet-like model
Other       Fully connected neural net with one hidden layer   Problem-dependent
Troubleshooting - start simple
39. Full Stack Deep Learning - UC Berkeley Spring 2021
Dealing with multiple input modalities
39
“This”
“is”
“a”
“cat”
Input 2
Input 3
Input 1
Troubleshooting - start simple
40. Full Stack Deep Learning - UC Berkeley Spring 2021
Dealing with multiple input modalities
40
“This”
“is”
“a”
“cat”
Input 2
Input 3
Input 1
Troubleshooting - start simple
1. Map each into a lower dimensional feature space
41. Full Stack Deep Learning - UC Berkeley Spring 2021
Dealing with multiple input modalities
41
ConvNet Flatten
“This”
“is”
“a”
“cat”
LSTM
Input 2
Input 3
Input 1
(64-dim)
(72-dim)
(48-dim)
Troubleshooting - start simple
1. Map each into a lower dimensional feature space
42. Full Stack Deep Learning - UC Berkeley Spring 2021
Dealing with multiple input modalities
42
ConvNet Flatten Con“cat”
“This”
“is”
“a”
“cat”
LSTM
Input 2
Input 3
Input 1
(64-dim)
(72-dim)
(48-dim)
(184-dim)
Troubleshooting - start simple
2. Concatenate
43. Full Stack Deep Learning - UC Berkeley Spring 2021
Dealing with multiple input modalities
43
ConvNet Flatten Concat
“This”
“is”
“a”
“cat”
LSTM
FC FC Output
T/F
Input 2
Input 3
Input 1
(64-dim)
(72-dim)
(48-dim)
(184-dim)
(256-dim)
(128-dim)
Troubleshooting - start simple
3. Pass through fully connected layers to output
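Put together, the three steps above might look like the following PyTorch sketch. The 64/72/48/184/256/128 dimensions come from the slides; the specific layers, vocabulary size, and input sizes are illustrative assumptions.

import torch
import torch.nn as nn

class MultiModalNet(nn.Module):
    def __init__(self, vocab_size=1000, tabular_dim=10):
        super().__init__()
        # Input 1: image -> small ConvNet, flattened to 64 dims.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 64))
        # Input 2: token sequence ("This is a cat") -> LSTM, 72-dim state.
        self.embed = nn.Embedding(vocab_size, 32)
        self.lstm = nn.LSTM(32, 72, batch_first=True)
        # Input 3: other features -> 48 dims.
        self.tab = nn.Linear(tabular_dim, 48)
        # Concat (64 + 72 + 48 = 184 dims), then FC layers to a T/F output.
        self.head = nn.Sequential(
            nn.Linear(184, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1))   # use with a sigmoid / BCEWithLogitsLoss

    def forward(self, image, tokens, tabular):
        img_feat = self.conv(image)
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.head(torch.cat([img_feat, h[-1], self.tab(tabular)], dim=1))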
44. Full Stack Deep Learning - UC Berkeley Spring 2021
Starting simple
44
Normalize inputs
Choose a simple
architecture
Simplify the problem
Use sensible defaults
Steps
b
a
c
d
Troubleshooting - start simple
45. Full Stack Deep Learning - UC Berkeley Spring 2021
Recommended network / optimizer defaults
• Optimizer: Adam optimizer with learning rate 3e-4
• Activations: relu (FC and Conv models), tanh (LSTMs)
• Initialization: He et al. normal (relu), Glorot normal (tanh)
• Regularization: None
• Data normalization: None
45
Troubleshooting - start simple
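As a concrete sketch (PyTorch is an assumption here; Keras exposes the same defaults), wiring these up takes only a couple of lines:

import torch
import torch.nn as nn

# Fully connected net with ReLU activations. Note PyTorch's built-in init
# differs slightly from the recommendation; see the definitions below.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)   # Adam, lr 3e-4
# Regularization: none for now. Data normalization: see slide 48.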
46. Full Stack Deep Learning - UC Berkeley Spring 2021
Definitions of recommended initializers
• (n is the number of inputs, m is the number of outputs)
• He et al. normal (used for ReLU): N(0, √(2/n))
• Glorot normal (used for tanh): N(0, √(2/(n+m)))
46
Troubleshooting - start simple
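In code, the recommended initializers can be applied like this (a PyTorch sketch, which is an assumption; in Keras the equivalents are the he_normal and glorot_normal initializers):

import torch.nn as nn

def init_weights(module):
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        # He et al. normal for ReLU layers: N(0, sqrt(2/n)).
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.LSTM):
        for name, param in module.named_parameters():
            if "weight" in name:
                # Glorot normal for tanh layers: N(0, sqrt(2/(n+m))).
                nn.init.xavier_normal_(param)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
model.apply(init_weights)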
47. Full Stack Deep Learning - UC Berkeley Spring 2021
Normalize inputs
Starting simple
47
Choose a simple
architecture
Simplify the problem
Use sensible defaults
b
a
c
Steps
d
Troubleshooting - start simple
48. Full Stack Deep Learning - UC Berkeley Spring 2021
Important to normalize scale of input data
• Subtract the mean and divide by the standard deviation
• For images, fine to scale values to [0, 1] or [-0.5, 0.5]
(e.g., by dividing by 255)
[Careful, make sure your library doesn’t do it for you!]
48
Troubleshooting - start simple
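A tiny sketch of both options; the arrays here are stand-in data, and the variable names are illustrative:

import numpy as np

images = np.random.randint(0, 256, (8, 32, 32, 3), dtype=np.uint8)   # stand-in
features = np.random.randn(8, 10).astype(np.float32)                 # stand-in

images = images.astype(np.float32) / 255.0                  # pixels to [0, 1]
features = (features - features.mean(0)) / features.std(0)  # zero mean, unit std
# Careful: check whether your data loader or library already does this for you!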
49. Full Stack Deep Learning - UC Berkeley Spring 2021
Normalize inputs
Starting simple
49
Choose a simple
architecture
Simplify the problem
Use sensible defaults
b
a
c
Steps
d
Troubleshooting - start simple
50. Full Stack Deep Learning - UC Berkeley Spring 2021
Consider simplifying the problem as well
• Start with a small training set (~10,000 examples)
• Use a fixed number of objects, classes, image size, etc.
• Create a simpler synthetic training set
50
Troubleshooting - start simple
51. Full Stack Deep Learning - UC Berkeley Spring 2021
Simplest model for pedestrian detection
• Start with a subset of 10,000 images for training, 1,000 for val, and 500
for test
• Use a LeNet architecture with sigmoid cross-entropy loss
• Adam optimizer with LR 3e-4
• No regularization
51
0 (no pedestrian) 1 (yes pedestrian)
Goal: 99% classification accuracy
Running example
Troubleshooting - start simple
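As a sketch, the baseline above might look like this in PyTorch (the framework choice and the assumption of 32x32 RGB crops are illustrative):

import torch
import torch.nn as nn

lenet = nn.Sequential(                        # LeNet-like architecture
    nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 1))                         # one logit: pedestrian or not
loss_fn = nn.BCEWithLogitsLoss()              # sigmoid cross-entropy loss
optimizer = torch.optim.Adam(lenet.parameters(), lr=3e-4)   # no regularization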
52. Full Stack Deep Learning - UC Berkeley Spring 2021
Normalize inputs
Starting simple
52
Choose a simple
architecture
Simplify the problem
Use sensible defaults
b
a
c
Steps
d
Summary
• Start with a simpler
version of your problem
(e.g., smaller dataset)
• Adam optimizer & no
regularization
• Subtract mean and divide
by std, or just divide by
255 (ims)
• LeNet, LSTM, or fully
connected
Troubleshooting - start simple
53. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
53
Troubleshooting - start simple
54. Full Stack Deep Learning - UC Berkeley Spring 2021
Tune hyper-
parameters
Implement
& debug
Start simple Evaluate
Improve
model/data
Meets re-
quirements
Strategy for DL troubleshooting
54
Troubleshooting - debug
55. Full Stack Deep Learning - UC Berkeley Spring 2021
Implementing bug-free DL models
55
Get your model to
run
Compare to a
known result
Overfit a single
batch
b
a
c
Steps
Troubleshooting - debug
56. Full Stack Deep Learning - UC Berkeley Spring 2021
Preview: the five most common DL bugs
• Incorrect shapes for your tensors
Can fail silently! E.g., accidental broadcasting: x.shape = (None,), y.shape =
(None, 1), (x+y).shape = (None, None)
• Pre-processing inputs incorrectly
E.g., Forgetting to normalize, or too much pre-processing
• Incorrect input to your loss function
E.g., softmaxed outputs to a loss that expects logits
• Forgot to set up train mode for the net correctly
E.g., toggling train/eval, controlling batch norm dependencies
• Numerical instability - inf/NaN
Often stems from using an exp, log, or div operation
56
Troubleshooting - debug
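The first bug is worth seeing once. A NumPy reproduction of the accidental broadcast above (batch size 8 stands in for the slide's None dimension):

import numpy as np

x = np.zeros(8)           # shape (8,)   e.g., per-example predictions
y = np.zeros((8, 1))      # shape (8, 1) e.g., per-example labels
print((x + y).shape)      # (8, 8): a silent pairwise broadcast, not elementwise!
# Fix: make shapes agree explicitly, e.g. x[:, None] + y  ->  shape (8, 1)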
57. Full Stack Deep Learning - UC Berkeley Spring 2021
General advice for implementing your model
57
Use off-the-shelf components, e.g.,
• Keras
• tf.layers.dense(…)
instead of
tf.nn.relu(tf.matmul(W, x))
• tf.losses.cross_entropy(…)
instead of writing out the exp
Lightweight implementation
• Minimum possible new lines of
code for v1
• Rule of thumb: <200 lines
• (Tested infrastructure
components are fine)
Build complicated data pipelines later
• Start with a dataset you can load into
memory
Troubleshooting - debug
58. Full Stack Deep Learning - UC Berkeley Spring 2021
Implementing bug-free DL models
58
Get your model to
run
Compare to a
known result
Overfit a single
batch
b
a
c
Steps
Troubleshooting - debug
59. Full Stack Deep Learning - UC Berkeley Spring 2021
Get your model to
run
a
Shape
mismatch
Casting
issue
OOM
Other
Common
issues Recommended resolution
Scale back memory intensive
operations one-by-one
Standard debugging toolkit (Stack
Overflow + interactive debugger)
Implementing bug-free DL models
59
Step through model creation and
inference in a debugger
Troubleshooting - debug
60. Full Stack Deep Learning - UC Berkeley Spring 2021
Get your model to
run
a
Shape
mismatch
Casting
issue
OOM
Other
Common
issues Recommended resolution
Scale back memory intensive
operations one-by-one
Standard debugging toolkit (Stack
Overflow + interactive debugger)
Implementing bug-free DL models
60
Step through model creation and
inference in a debugger
Troubleshooting - debug
61. Full Stack Deep Learning - UC Berkeley Spring 2021
Debuggers for DL code
61
• Pytorch: easy, use ipdb
• tensorflow: trickier
Option 1: step through graph creation
Troubleshooting - debug
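For example (a sketch; ipdb is a pip-installable drop-in for the built-in pdb):

import torch

def forward_pass(x):
    import ipdb; ipdb.set_trace()    # execution pauses here with a REPL
    h = x.mean(dim=1)                # at the prompt: inspect x.shape, x.dtype, h
    return h

forward_pass(torch.randn(8, 32))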
62. Full Stack Deep Learning - UC Berkeley Spring 2021
Debuggers for DL code
62
• Pytorch: easy, use ipdb
• tensorflow: trickier
Option 2: step into training loop
Evaluate tensors using sess.run(…)
Troubleshooting - debug
63. Full Stack Deep Learning - UC Berkeley Spring 2021
Debuggers for DL code
63
• Pytorch: easy, use ipdb
• tensorflow: trickier
Option 3: use tfdb
Stops
execution at
each
sess.run(…)
and lets you
inspect
python -m tensorflow.python.debug.examples.debug_mnist --debug
Troubleshooting - debug
64. Full Stack Deep Learning - UC Berkeley Spring 2021
Get your model to
run
a
Shape
mismatch
Casting
issue
OOM
Other
Common
issues Recommended resolution
Scale back memory intensive
operations one-by-one
Standard debugging toolkit (Stack
Overflow + interactive debugger)
Implementing bug-free DL models
64
Step through model creation and
inference in a debugger
Troubleshooting - debug
65. Full Stack Deep Learning - UC Berkeley Spring 2021
Shape
mismatch Undefined
shapes
Incorrect
shapes
Common
issues Most common causes
• Flipped dimensions when using tf.reshape(…)
• Took sum, average, or softmax over wrong
dimension
• Forgot to flatten after conv layers
• Forgot to get rid of extra “1” dimensions (e.g., if
shape is (None, 1, 1, 4))
• Data stored on disk in a different dtype than
loaded (e.g., stored a float64 numpy array, and
loaded it as a float32)
Implementing bug-free DL models
65
• Confusing tensor.shape, tf.shape(tensor),
tensor.get_shape()
• Reshaping things to a shape of type Tensor (e.g.,
when loading data from a file)
Troubleshooting - debug
66. Full Stack Deep Learning - UC Berkeley Spring 2021
Casting issue
Data not in
float32
Common
issues Most common causes
Implementing bug-free DL models
66
• Forgot to cast images from uint8 to float32
• Generated data using numpy in float64, forgot to
cast to float32
Troubleshooting - debug
67. Full Stack Deep Learning - UC Berkeley Spring 2021
OOM
Too big a
tensor
Too much
data
Common
issues Most common causes
Duplicating
operations
Other
processes
• Other processes running on your GPU
• Memory leak due to creating multiple models in
the same session
• Repeatedly creating an operation (e.g., in a
function that gets called over and over again)
• Too large a batch size for your model (e.g.,
during evaluation)
• Too large fully connected layers
• Loading too large a dataset into memory, rather
than using an input queue
• Allocating too large a buffer for dataset creation
Implementing bug-free DL models
67
Troubleshooting - debug
68. Full Stack Deep Learning - UC Berkeley Spring 2021
Other common
errors Other bugs
Common
issues Most common causes
Implementing bug-free DL models
68
• Forgot to initialize variables
• Forgot to turn off bias when using batch norm
• “Fetch argument has invalid type” - usually you
overwrote one of your ops with an output during
training
Troubleshooting - debug
69. Full Stack Deep Learning - UC Berkeley Spring 2021
Get your model to
run
Compare to a
known result
Overfit a single
batch
b
a
c
Steps
Implementing bug-free DL models
69
Troubleshooting - debug
70. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
Implementing bug-free DL models
70
Troubleshooting - debug
71. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
Implementing bug-free DL models
71
Troubleshooting - debug
• Flipped the sign of the loss function / gradient
• Learning rate too high
• Softmax taken over wrong dimension
72. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
• Numerical issue. Check all exp, log, and div operations
• Learning rate too high
Implementing bug-free DL models
72
Troubleshooting - debug
73. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
• Data or labels corrupted (e.g., zeroed, incorrectly
shuffled, or preprocessed incorrectly)
• Learning rate too high
Implementing bug-free DL models
73
Troubleshooting - debug
74. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
• Learning rate too low
• Gradients not flowing through the whole model
• Too much regularization
• Incorrect input to loss function (e.g., softmax instead of
logits, accidentally add ReLU on output)
• Data or labels corrupted
Implementing bug-free DL models
74
Troubleshooting - debug
75. Full Stack Deep Learning - UC Berkeley Spring 2021
Overfit a single
batch
b
Error goes up
Error oscillates
Common
issues
Error explodes
Error plateaus
Most common causes
• Numerical issue. Check all exp, log, and div operations
• Learning rate too high
• Data or labels corrupted (e.g., zeroed or incorrectly
shuffled)
• Learning rate too high
• Learning rate too low
• Gradients not flowing through the whole model
• Too much regularization
• Incorrect input to loss function (e.g., softmax instead of
logits)
• Data or labels corrupted
Implementing bug-free DL models
75
Troubleshooting - debug
• Flipped the sign of the loss function / gradient
• Learning rate too high
• Softmax taken over wrong dimension
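The check itself is a few lines. A PyTorch sketch (the tiny model and random batch are illustrative assumptions): train on one fixed batch and watch the loss, which in a bug-free setup should fall toward zero; each failure mode in the table above shows up directly in this curve.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

x = torch.randn(16, 32)                       # one fixed batch, reused every step
y = (torch.rand(16, 1) > 0.5).float()
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())              # should decrease toward ~0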
76. Full Stack Deep Learning - UC Berkeley Spring 2021
Get your model to
run
Compare to a
known result
Overfit a single
batch
b
a
c
Steps
Implementing bug-free DL models
76
Troubleshooting - debug
77. Full Stack Deep Learning - UC Berkeley Spring 2021
Hierarchy of known results
77
More
useful
Less
useful
You can:
• Walk through code line-by-line and
ensure you have the same output
• Ensure your performance is up to par
with expectations
Troubleshooting - debug
• Official model implementation evaluated on similar dataset
to yours
78. Full Stack Deep Learning - UC Berkeley Spring 2021
Hierarchy of known results
78
More
useful
Less
useful
You can:
• Walk through code line-by-line and
ensure you have the same output
Troubleshooting - debug
• Official model implementation evaluated on similar
dataset to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
79. Full Stack Deep Learning - UC Berkeley Spring 2021
• Official model implementation evaluated on similar dataset
to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g.,
MNIST)
• Results from a similar model on a similar dataset
• Super simple baselines (e.g., average of outputs or linear
regression)
Hierarchy of known results
79
More
useful
Less
useful
You can:
• Same as before, but with lower
confidence
Troubleshooting - debug
80. Full Stack Deep Learning - UC Berkeley Spring 2021
Hierarchy of known results
80
More
useful
Less
useful
You can:
• Ensure your performance is up to par
with expectations
Troubleshooting - debug
• Official model implementation evaluated on similar
dataset to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from a paper (with no code)
81. Full Stack Deep Learning - UC Berkeley Spring 2021
• Official model implementation evaluated on similar
dataset to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g.,
MNIST)
Hierarchy of known results
81
More
useful
Less
useful
You can:
• Make sure your model performs well in a
simpler setting
Troubleshooting - debug
82. Full Stack Deep Learning - UC Berkeley Spring 2021
• Official model implementation evaluated on similar dataset
to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g.,
MNIST)
• Results from a similar model on a similar dataset
• Super simple baselines (e.g., average of outputs or linear
regression)
Hierarchy of known results
82
More
useful
Less
useful
You can:
• Get a general sense of what kind of
performance can be expected
Troubleshooting - debug
83. Full Stack Deep Learning - UC Berkeley Spring 2021
• Official model implementation evaluated on similar dataset
to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g.,
MNIST)
• Results from a similar model on a similar dataset
• Super simple baselines (e.g., average of outputs or linear
regression)
Hierarchy of known results
83
More
useful
Less
useful
You can:
• Make sure your model is learning
anything at all
Troubleshooting - debug
84. Full Stack Deep Learning - UC Berkeley Spring 2021
Hierarchy of known results
84
More
useful
Less
useful
Troubleshooting - debug
• Official model implementation evaluated on similar dataset
to yours
• Official model implementation evaluated on benchmark
(e.g., MNIST)
• Unofficial model implementation
• Results from the paper (with no code)
• Results from your model on a benchmark dataset (e.g.,
MNIST)
• Results from a similar model on a similar dataset
• Super simple baselines (e.g., average of outputs or linear
regression)
85. Full Stack Deep Learning - UC Berkeley Spring 2021
Summary: how to implement & debug
85
Get your model to
run
Compare to a
known result
Overfit a single
batch
b
a
c
Steps Summary
• Look for corrupted data, over-
regularization, broadcasting errors
• Keep iterating until model performs
up to expectations
Troubleshooting - debug
• Step through in debugger & watch out
for shape, casting, and OOM errors
86. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
86
Troubleshooting - debug
87. Full Stack Deep Learning - UC Berkeley Spring 2021
Strategy for DL troubleshooting
87
Tune hyper-
parameters
Implement
& debug
Start simple Evaluate
Improve
model/data
Meets re-
quirements
Troubleshooting - evaluate
88. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance decomposition
88
Troubleshooting - evaluate
89. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance decomposition
89
Troubleshooting - evaluate
90. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance decomposition
90
Troubleshooting - evaluate
91. Full Stack Deep Learning - UC Berkeley Spring 2021
[Bar chart, “Breakdown of test error by source”: irreducible error 25, + avoidable bias (i.e., underfitting) 2 → train error 27, + variance (i.e., overfitting) 5 → val error 32, + val set overfitting 2 → test error 34; y-axis 0-40]
Bias-variance decomposition
91
Troubleshooting - evaluate
92. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance decomposition
• Test error = irreducible error + bias + variance + val overfitting
• This assumes train, val, and test all come from the same distribution.
What if not?
92
Troubleshooting - evaluate
93. Full Stack Deep Learning - UC Berkeley Spring 2021
Handling distribution shift
93
Test data
Use two val sets: one sampled from training distribution and
one from test distribution
Train data
Troubleshooting - evaluate
94. Full Stack Deep Learning - UC Berkeley Spring 2021
The bias-variance tradeoff
94
Troubleshooting - evaluate
95. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance with distribution shift
95
Troubleshooting - evaluate
96. Full Stack Deep Learning - UC Berkeley Spring 2021
Bias-variance with distribution shift
96
[Figure: bar chart "Breakdown of test error by source" (y-axis 0-40): irreducible error 25; + avoidable bias (i.e., underfitting) 2 → train error 27; + variance 2 → train-val error 29; + distribution shift 3 → test-val error 32; + val overfitting 2 → test error 34]
Troubleshooting - evaluate
97. Full Stack Deep Learning - UC Berkeley Spring 2021
Train, val, and test error for pedestrian detection
97
Running example: goal 99% classification accuracy, i.e., 1% error (0 = no pedestrian, 1 = yes pedestrian)
Error source       Value
Goal performance   1%
Train error        20%
Validation error   27%
Test error         28%
Train - goal = 19% (under-fitting)
Val - train = 7% (over-fitting)
Test - val = 1% (looks good!)
Troubleshooting - evaluate
100. Full Stack Deep Learning - UC Berkeley Spring 2021
Summary: evaluating model performance
100
Troubleshooting - evaluate
Test error = irreducible error + bias + variance
+ distribution shift + val overfitting
101. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
101
Troubleshooting - evaluate
102. Full Stack Deep Learning - UC Berkeley Spring 2021
Strategy for DL troubleshooting
102
Start simple → Implement & debug → Evaluate → Tune hyper-parameters / Improve model/data (loop back to Evaluate) → Meets requirements
Troubleshooting - improve
103. Full Stack Deep Learning - UC Berkeley Spring 2021
Prioritizing improvements (i.e., applied bias-variance)
103
Steps:
a. Address under-fitting
b. Address over-fitting
c. Address distribution shift
d. Re-balance datasets (if applicable)
Troubleshooting - improve
104. Full Stack Deep Learning - UC Berkeley Spring 2021
Addressing under-fitting (i.e., reducing bias)
104
Try in this order, first to last (a code sketch follows the list):
A. Make your model bigger (i.e., add layers or use more units per layer)
B. Reduce regularization
C. Error analysis
D. Choose a different (closer to state-of-the-art) model architecture (e.g., move from LeNet to ResNet)
E. Tune hyper-parameters (e.g., learning rate)
F. Add features
Troubleshooting - improve
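For instance, fixes A and B can be as simple as config changes. A hedged PyTorch sketch (the architecture is an illustrative stand-in):

    import torch.nn as nn

    def make_cnn(depth=2, width=32, p_drop=0.5):
        layers, ch = [], 3
        for _ in range(depth):
            layers += [nn.Conv2d(ch, width, 3, padding=1), nn.ReLU()]
            ch = width
        layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                   nn.Dropout(p_drop), nn.Linear(ch, 2)]
        return nn.Sequential(*layers)

    baseline = make_cnn(depth=2, width=32, p_drop=0.5)   # under-fitting?
    bigger = make_cnn(depth=6, width=128, p_drop=0.0)    # A: more layers/units; B: less regularization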
105. Full Stack Deep Learning - UC Berkeley Spring 2021
Train, val, and test error for pedestrian detection
105
Running example: goal 99% classification accuracy, i.e., 1% error (0 = no pedestrian, 1 = yes pedestrian)
Error source       Baseline   +Layers   ResNet-101   +LR tuning
Goal performance   1%         1%        1%           1%
Train error        20%        7%        3%           0.8%
Validation error   27%        19%       10%          12%
Test error         28%        20%       10%          12%
Successive fixes: add more layers to the ConvNet → switch to ResNet-101 → tune learning rate
Troubleshooting - improve
108. Full Stack Deep Learning - UC Berkeley Spring 2021
Prioritizing improvements (i.e., applied bias-variance)
108
Steps: a. Address under-fitting → b. Address over-fitting (next) → c. Address distribution shift → d. Re-balance datasets (if applicable)
Troubleshooting - improve
109. Full Stack Deep Learning - UC Berkeley Spring 2021
Addressing over-fitting (i.e., reducing variance)
109
Try in this order, first to last (a code sketch follows the list; H-J are not recommended!):
A. Add more training data (if possible!)
B. Add normalization (e.g., batch norm, layer norm)
C. Add data augmentation
D. Increase regularization (e.g., dropout, L2, weight decay)
E. Error analysis
F. Choose a different (closer to state-of-the-art) model architecture
G. Tune hyperparameters
H. Early stopping (not recommended!)
I. Remove features (not recommended!)
J. Reduce model size (not recommended!)
Troubleshooting - improve
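A hedged PyTorch/torchvision sketch of fixes B, C, and D (the architecture and values are illustrative stand-ins, not the course's code):

    import torch.nn as nn
    import torch.optim as optim
    from torchvision import transforms

    augment = transforms.Compose([             # C: data augmentation
        transforms.RandomHorizontalFlip(),
        transforms.RandomCrop(32, padding=4),
        transforms.ToTensor(),
    ])

    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1),
        nn.BatchNorm2d(32),                    # B: normalization (batch norm)
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Dropout(0.5),                       # D: regularization (dropout)
        nn.Linear(32, 2),
    )
    opt = optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)   # D: weight decay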
111. Full Stack Deep Learning - UC Berkeley Spring 2021
Train, val, and test error for pedestrian detection
111
Running example: goal 99% classification accuracy, i.e., 1% error (0 = no pedestrian, 1 = yes pedestrian)
Error source       Baseline   +Data(250k)   +Weight decay   +Augmentation   +Tuning
Goal performance   1%         1%            1%              1%              1%
Train error        0.8%       1.5%          1.7%            2%              0.6%
Validation error   12%        5%            4%              2.5%            0.9%
Test error         12%        6%            4%              2.6%            1.0%
Successive fixes: increase dataset size to 250,000 → add weight decay → add data augmentation → tune num layers, optimizer params, weight initialization, kernel size, weight decay
Troubleshooting - improve
116. Full Stack Deep Learning - UC Berkeley Spring 2021
Prioritizing improvements (i.e., applied bias-variance)
116
Steps: a. Address under-fitting → b. Address over-fitting → c. Address distribution shift (next) → d. Re-balance datasets (if applicable)
Troubleshooting - improve
117. Full Stack Deep Learning - UC Berkeley Spring 2021
Addressing distribution shift
117
Try in this order, first to last:
A. Analyze test-val set errors & collect more training data to compensate
B. Analyze test-val set errors & synthesize more training data to compensate
C. Apply domain adaptation techniques to training & test distributions
Troubleshooting - improve
118. Full Stack Deep Learning - UC Berkeley Spring 2021
Error analysis
118
Compare errors on the test-val set vs. the train-val set (all cases: no pedestrian detected):
• Error type 1: hard-to-see pedestrians (both sets)
• Error type 2: reflections (both sets)
• Error type 3 (test-val only): night scenes
Troubleshooting - improve
122. Full Stack Deep Learning - UC Berkeley Spring 2021
Error analysis
122
Error type                   Error % (train-val)   Error % (test-val)   Potential solutions                                                                                                                  Priority
1. Hard-to-see pedestrians   0.1%                  0.1%                 Better sensors                                                                                                                       Low
2. Reflections               0.3%                  0.3%                 Collect more data with reflections; add synthetic reflections to train set; try to remove with pre-processing; better sensors        Medium
3. Nighttime scenes          0.1%                  1%                   Collect more data at night; synthetically darken training images; simulate night-time data; use domain adaptation                    High
Troubleshooting - improve
123. Full Stack Deep Learning - UC Berkeley Spring 2021
Domain adaptation
123
What is it? Techniques to train on a "source" distribution and generalize to another "target" distribution using only unlabeled data or limited labeled data.
When should you consider using it?
• Access to labeled data from the test distribution is limited
• Access to relatively similar data is plentiful
Troubleshooting - improve
124. Full Stack Deep Learning - UC Berkeley Spring 2021
Types of domain adaptation
124
Type           Use case                                                 Example techniques
Supervised     You have limited data from the target domain            Fine-tuning a pre-trained model; adding target data to train set
Unsupervised   You have lots of unlabeled data from the target domain  Correlation Alignment (CORAL); domain confusion; CycleGAN
(A sketch of the supervised case follows below.)
Troubleshooting - improve
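A minimal sketch of the supervised case via fine-tuning (assuming PyTorch/torchvision; the two-class head matches the running pedestrian example):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained backbone (torchvision >= 0.13 API)
    for p in model.parameters():
        p.requires_grad = False                        # freeze pre-trained weights
    model.fc = nn.Linear(model.fc.in_features, 2)      # fresh head: no pedestrian / pedestrian
    # ...then train only model.fc on the limited labeled target-domain data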
125. Full Stack Deep Learning - UC Berkeley Spring 2021
Prioritizing improvements (i.e., applied bias-variance)
125
Steps: a. Address under-fitting → b. Address over-fitting → c. Address distribution shift → d. Re-balance datasets (next, if applicable)
Troubleshooting - improve
126. Full Stack Deep Learning - UC Berkeley Spring 2021
Rebalancing datasets
• If test-val performance looks significantly better than test performance, you overfit to the val set
• This happens with small val sets or lots of hyperparameter tuning
• When it does, recollect val data
126
Troubleshooting - improve
127. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
127
Troubleshooting - improve
128. Full Stack Deep Learning - UC Berkeley Spring 2021
Strategy for DL troubleshooting
128
Start simple → Implement & debug → Evaluate → Tune hyper-parameters / Improve model/data (loop back to Evaluate) → Meets requirements
Troubleshooting - tune
129. Full Stack Deep Learning - UC Berkeley Spring 2021
Hyperparameter optimization
129
Model & optimizer choices?
Network: ResNet
- How many layers?
- Weight initialization?
- Kernel size?
- Etc.
Optimizer: Adam
- Batch size?
- Learning rate?
- beta1, beta2, epsilon?
Regularization
- ...
0 (no pedestrian) 1 (yes pedestrian)
Goal: 99% classification accuracy
Running example
Troubleshooting - tune
130. Full Stack Deep Learning - UC Berkeley Spring 2021
Which hyper-parameters to tune?
130
Choosing hyper-parameters:
• Performance is more sensitive to some hyper-parameters than others
• Depends on choice of model
• Rules of thumb (only) below
• Sensitivity is relative to default values! (e.g., if you are using all-zeros weight initialization or vanilla SGD, changing to the defaults will make a big difference)
Hyperparameter                              Approximate sensitivity
Learning rate                               High
Learning rate schedule                      High
Optimizer choice                            Low
Other optimizer params (e.g., Adam beta1)   Low
Batch size                                  Low
Weight initialization                       Medium
Loss function                               High
Model depth                                 Medium
Layer size                                  High
Layer params (e.g., kernel size)            Medium
Weight of regularization                    Medium
Nonlinearity                                Low
Troubleshooting - tune
131. Full Stack Deep Learning - UC Berkeley Spring 2021
Method 1: manual hyperparam optimization
131
How it works:
• Understand the algorithm (e.g., a higher learning rate means faster but less stable training)
• Train & evaluate the model
• Guess a better hyperparam value & re-evaluate
• Can be combined with other methods (e.g., manually select parameter ranges to optimize over)
Advantages:
• For a skilled practitioner, may require the least computation to get a good result
Disadvantages:
• Requires detailed understanding of the algorithm
• Time-consuming
Troubleshooting - tune
132. Full Stack Deep Learning - UC Berkeley Spring 2021
Method 2: grid search
132
How it works: evaluate every combination on a grid over each hyper-parameter's range (e.g., batch size × learning rate); see the sketch below.
Advantages:
• Super simple to implement
• Can produce good results
Disadvantages:
• Not very efficient: need to train on all cross-combos of hyper-parameters
• May require prior knowledge about parameters to get good results
Troubleshooting - tune
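A minimal grid-search sketch (train_and_eval is a hypothetical stub standing in for a real training run):

    import itertools

    def train_and_eval(lr, batch_size):
        # stand-in for a real training run; returns a fake validation accuracy
        return 1.0 - abs(lr - 1e-3) - abs(batch_size - 64) / 1000.0

    grid = itertools.product([1e-4, 1e-3, 1e-2], [32, 64, 128])   # all 9 cross-combos
    results = {(lr, bs): train_and_eval(lr, bs) for lr, bs in grid}
    best_lr, best_bs = max(results, key=results.get)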
133. Full Stack Deep Learning - UC Berkeley Spring 2021
Method 3: random search
133
How it works: sample each hyper-parameter independently at random from a chosen range or distribution (e.g., batch size, learning rate); see the sketch below.
Advantages:
• Easy to implement
• Often produces better results than grid search
Disadvantages:
• Not very interpretable
• May require prior knowledge about parameters to get good results
Troubleshooting - tune
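A minimal random-search sketch (again with a hypothetical train_and_eval stub):

    import random

    def train_and_eval(lr, batch_size):
        # stand-in for a real training run (same stub idea as the grid-search sketch)
        return random.random()

    def sample_config():
        return {"lr": 10 ** random.uniform(-5, -1),        # log-uniform over [1e-5, 1e-1]
                "batch_size": random.choice([32, 64, 128])}

    trials = [sample_config() for _ in range(20)]
    scores = [train_and_eval(**cfg) for cfg in trials]
    best = trials[scores.index(max(scores))]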
134. Full Stack Deep Learning - UC Berkeley Spring 2021
Method 4: coarse-to-fine
134
How it works: run a coarse random search over wide ranges (e.g., batch size, learning rate), find the best performers, narrow the search range around them, and repeat; see the sketch below.
Advantages:
• Can narrow in on very high performing hyperparameters
• Most used method in practice
Disadvantages:
• Somewhat manual process
Troubleshooting - tune
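A minimal coarse-to-fine sketch over the learning rate (the train_and_eval stub is hypothetical):

    import math
    import random

    def train_and_eval(lr):
        # stand-in for a real training run: pretends lr = 1e-3 is optimal
        return -abs(math.log10(lr) + 3)

    lo, hi = -5.0, -1.0                                   # coarse log10(lr) range
    for stage in range(3):
        trials = [10 ** random.uniform(lo, hi) for _ in range(10)]
        best = max(trials, key=train_and_eval)            # best performer this stage
        center, span = math.log10(best), (hi - lo) / 4
        lo, hi = center - span, center + span             # zoom in and repeat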
139. Full Stack Deep Learning - UC Berkeley Spring 2021
Method 5: Bayesian hyperparam opt
139
How it works (at a high level):
• Start with a prior estimate of parameter distributions
• Maintain a probabilistic model of the relationship between hyper-parameter values and model performance
• Alternate between: training with the hyper-parameter values that maximize the expected improvement, and using training results to update the probabilistic model
Advantages:
• Generally the most efficient hands-off way to choose hyperparameters
Disadvantages:
• Difficult to implement from scratch
• Can be hard to integrate with off-the-shelf tools
More on tools to do this automatically in the infrastructure & tooling lecture! To learn more, see:
https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f
Troubleshooting - tune
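A hedged sketch with Optuna as one off-the-shelf option (assumptions: optuna is installed, and its default TPE sampler is one form of model-based/Bayesian optimization, not necessarily the exact method described above):

    import optuna

    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
        wd = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
        # stand-in for a real training run returning validation accuracy
        return 1.0 - abs(lr - 1e-3) - wd

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(study.best_params)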
141. Full Stack Deep Learning - UC Berkeley Spring 2021
Summary of how to optimize hyperparams
• Coarse-to-fine random searches
• Consider Bayesian hyper-parameter optimization solutions
as your codebase matures
141
Troubleshooting - tune
142. Full Stack Deep Learning - UC Berkeley Spring 2021
Questions?
142
Troubleshooting - tune
143. Full Stack Deep Learning - UC Berkeley Spring 2021
Conclusion
143
• DL debugging is hard due to many competing sources of error
• To train bug-free DL models, we treat building the model as an iterative process
• The following steps can make the process easier and catch errors as early as possible
Troubleshooting - conclusion
144. Full Stack Deep Learning - UC Berkeley Spring 2021
How to build bug-free DL models
144
Start simple → Implement & debug → Evaluate → Tune hyper-parameters / Improve model/data (loop back to Evaluate)
• Choose the simplest model & data possible (e.g., LeNet on a subset of your data)
• Once the model runs, overfit a single batch & reproduce a known result
• Apply the bias-variance decomposition to decide what to do next
• Use coarse-to-fine random searches
• Make your model bigger if you underfit; add data or regularize if you overfit
Troubleshooting - conclusion
145. Full Stack Deep Learning - UC Berkeley Spring 2021
Where to go to learn more
• Andrew Ng’s book Machine Learning Yearning (http://www.mlyearning.org/)
• The following Twitter thread: https://twitter.com/karpathy/status/1013244313327681536
• This blog post: https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/
145
Troubleshooting - conclusion
146. Full Stack Deep Learning - UC Berkeley Spring 2021
Thank you!
146