Slide deck for the lunch talk at IBM Almaden Research Center on Oct 11, 2016.
Abstract: In this lunch talk, I will give a high-level summary of the Bay Area Deep Learning School, which was held at Stanford on Sept 24 and 25, 2016. The videos and slides of the lectures are available online at http://www.bayareadlschool.org/. I will also give a very brief introduction to deep learning.
2. Agenda
• Summary
• Why is Deep Learning gaining popularity?
• Introduction to Deep Learning
• Case study of state-of-the-art networks
• How to train them
• Tricks of the trade
• Overview of the existing deep learning stack
3. Summary
• 1,300 applicants for 500 spots (industry + academia)
• Videos are online:
• Day 1: https://www.youtube.com/watch?v=eyovmAtoUx0
• Day 2: https://www.youtube.com/watch?v=9dXiAecyJrY
• Mostly high-quality talks from different areas
• Computer Vision (Karpathy – OpenAI), Speech (Coates – Baidu), NLP (Socher – Salesforce; Quoc Le – Google), Unsupervised Learning (Salakhutdinov – CMU), Reinforcement Learning (Schulman – OpenAI)
• Tools (TensorFlow/Theano/Torch)
• Overview/vision talks (Ng, Bengio, and Larochelle)
• Networking:
• Keras contributor (working in a startup) – CNTK integration, potential for SystemML integration
• TensorFlow users at Google
• Discussion on “dynamic operator placement” described in the whitepaper
5. Why is Deep Learning gaining popularity?
• Efficacy of larger networks
Reference: Andrew Ng (Spark Summit 2016)
6. Why is Deep Learning gaining popularity?
• Efficacy of larger networks
Reference: Andrew Ng (Spark Summit 2016). Train a large network on a large amount of data; the relative ordering of models is not defined for small data.
7. Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
Example datasets: Caltech101 (by Fei-Fei Li), Google Street View House Numbers (SVHN), CIFAR-10, Flickr 30K Images
8. Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
VGG: ~2-3 weeks of training with 4 GPUs
ResNet-101: 2-3 weeks with 4 GPUs
Rocket Fuel*
9. Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/algorithms/networks to deal with training issues
• Non-linearities, batch normalization, dropout, ensembles
• Will discuss these in detail later
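Two of the training tricks named on this slide (batch normalization and dropout) can be sketched in a few lines of NumPy. This is an illustrative toy, not code from any particular framework; the function names and defaults are my own:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the mini-batch, then scale and shift.
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta

def dropout(x, p=0.5, rng=None):
    # Inverted dropout: zero units with probability p at training time,
    # rescaling the survivors so the expected activation is unchanged.
    rng = np.random.default_rng(0) if rng is None else rng
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(batch_norm(x).mean(axis=0))  # each column is now approximately zero-mean
```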
10. • Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
Why is Deep Learning gaining popularity?
11. • Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
Why is Deep Learning gaining popularity?
12. • Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• https://github.com/BVLC/caffe/wiki/Model-Zoo
• https://github.com/KaimingHe/deep-residual-networks
• https://github.com/facebook/fb.resnet.torch
• https://github.com/baidu-research/warp-ctc
• https://github.com/NervanaSystems/ModelZoo
Why is Deep Learning gaining popularity?
13. • Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• Tooling support for rapid iterations/experimentation
• Auto-differentiation, general purpose optimizer (SGD variants)
• Layered architecture
• Tensorboard
Why is Deep Learning gaining popularity?
14. • Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• Tooling support for rapid iterations/experimentation
• Auto-differentiation, general purpose optimizer (SGD variants)
• Layered architecture
• Tensorboard
Why is Deep Learning gaining popularity?
Will skip RNN, LSTM,
CTC, Parameter
server, Unsupervised
and Reinforcement
Deep Learning
15. • DL for Speech (covers CTC + Speech pipeline):
• https://youtu.be/9dXiAecyJrY?t=3h49m40s
• https://github.com/baidu-research/ba-dls-deepspeech
• DL for NLP (covers word embeddings, RNN, LSTM, seq2seq)
• https://youtu.be/eyovmAtoUx0?t=3h51m45s (Richard Socher)
• https://youtu.be/9dXiAecyJrY?t=7h4m12s (Quoc Le)
• Deep Unsupervised Learning (covers RBM, Autoencoders, …):
• https://youtu.be/eyovmAtoUx0?t=7h7m54s
• Deep Reinforcement Learning (covers Q-learning, policy gradients):
• https://youtu.be/9dXiAecyJrY?t=7m43s
• Tutorials (TensorFlow, Torch, Theano)
• https://github.com/wolffg/tf-tutorial/
• https://github.com/alexbw/bayarea-dl-summerschool
• https://github.com/lamblin/bayareadlschool
Not covered in this talk
17. Different abstractions for Deep Learning
Deep Learning pipeline Deep Learning task
Eg: CNN + classifier
=> Image captioning,
Localization, …
Deep Neural Network
Eg: CNN, AlexNet,
GoogLeNet, …
Layer
Eg: Convolution,
Pooling, …
18. Common layers
• Fully connected layer
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
19. Common layers
• Fully connected layer
• Convolution layer
• Fewer parameters compared to FC
• Useful to capture local
features (spatially)
• Output #channels = #filters
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
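A quick back-of-the-envelope sketch of the parameter-count point above (the layer sizes are illustrative, not from the slides): a fully connected layer learns one weight per input element per output neuron, while a convolution layer shares one small filter across all spatial positions.

```python
# Parameter counts for one layer on a 32x32x3 input.
def fc_params(in_h, in_w, in_c, out_neurons):
    # one weight per input element per neuron, plus one bias per neuron
    return in_h * in_w * in_c * out_neurons + out_neurons

def conv_params(k, in_c, num_filters):
    # each filter is k x k x in_c, plus one bias per filter
    return (k * k * in_c + 1) * num_filters

fc = fc_params(32, 32, 3, 100)   # 100 output neurons
conv = conv_params(3, 3, 100)    # 100 filters of size 3x3
print(fc)    # 307300
print(conv)  # 2800
```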
20. Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Useful to tolerate feature
deformation such as local shifts
• Output #channels = Input
#channels
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
22. • Squashes the neuron’s pre-activations between [0, 1]
• Historically popular
• Disadvantages:
• Saturated neurons kill the gradient as activations grow (vanishing gradient)
• Sigmoid outputs are not zero-centered
• exp() is a bit compute expensive
Sigmoid
Reference: Introduction to Feedforward Neural Networks - Larochelle.
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
23. • Squashes the neuron’s pre-activations between [-1, 1]
• Advantage:
• Zero-centered
• Disadvantages:
• Saturated neurons kill the gradient as activations grow
• exp() is compute expensive
Tanh
Reference: Introduction to Feedforward Neural Networks - Larochelle.
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
24. • Bounded below by 0 (always non-negative)
• Advantages:
• Does not saturate (in +region)
• Very computationally efficient
• Converges much faster than sigmoid/tanh in practice (e.g. 6x)
• Disadvantages:
• Tends to blow up the activations
• Alternatives:
• Leaky ReLU: max(0.001*a, a)
• Parameteric ReLU: max(alpha*a, a)
• ELU (Exponential Linear Unit): a if a > 0; else alpha*(exp(a)-1)
ReLU (Rectified Linear Units)
Reference: Introduction to Feedforward Neural Networks - Larochelle.
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
max(0, a)
25. • According to Hinton, why did deep learning not catch on earlier?
• Our labeled datasets were thousands of times too small.
• Our computers were millions of times too slow.
• We initialized the weights in a stupid way.
• We used the wrong type of non-linearity (i.e. sigmoid/tanh).
• Which non-linearity to use => ReLU according to
• LeCun: http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf
• Hinton: http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf
• Bengio: https://www.utc.fr/~bordesan/dokuwiki/_media/en/glorot10nipsworkshop.pdf
• If not satisfied with ReLU,
• Double-check the learning rates
• Then, try out Leaky ReLU / ELU
• Then, try out tanh but don’t expect much
• Don’t use sigmoid
Reference: Introduction to Feedforward Neural Networks - Larochelle.
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
26. Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Activations
• SoftMax
• Strictly positive
• Sums to 1
• Used for multi-class classification
• Other losses: Hinge, Euclidean, Sigmoid cross-entropy, …
Reference: Introduction to Feedforward Neural Networks - Larochelle. https://dl.dropboxusercontent.com/u/19557502/hugo_dlss.pdf
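The two SoftMax properties above (strictly positive, sums to 1) follow directly from the definition; a small sketch, using the standard max-subtraction trick for numerical stability (subtracting a constant from the scores does not change the result but avoids overflow):

```python
import math

def softmax(scores):
    m = max(scores)                              # stability shift
    exps = [math.exp(s - m) for s in scores]     # strictly positive
    total = sum(exps)
    return [e / total for e in exps]             # sums to 1

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # strictly positive, ordered like the scores
print(sum(probs))  # 1.0 (up to rounding)
```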
27. Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Activations
• SoftMax
• Dropout
• Idea: "cripple" the neural network by removing hidden units stochastically
• Use a random mask: a different dropout probability could be used, but 0.5 usually works well
• Beats regular backpropagation on many datasets, but is slower (~2x)
• Helps to prevent overfitting
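A minimal sketch of the random-mask idea above, in the "inverted dropout" style: zero out each hidden unit with probability p and rescale the survivors by 1/(1-p) so the expected activation is unchanged (so nothing extra is needed at test time). Function and variable names are illustrative.

```python
import random

def dropout(activations, p=0.5, rng=random):
    out = []
    for a in activations:
        if rng.random() < p:
            out.append(0.0)             # unit "crippled" for this pass
        else:
            out.append(a / (1.0 - p))   # rescale survivors
    return out

h = [0.3, 1.2, -0.7, 0.9]
print(dropout(h))  # roughly half the units zeroed, the rest scaled by 2
```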
28. Common layers
• Normalization layers
• Batch Normalization (BN)
• Networks converge faster if inputs are whitened, i.e. linearly transformed to have zero
mean and unit variance, and decorrelated
• Ioffe and Szegedy (2014) suggested also normalizing at the level of hidden layers
• BN: normalize each layer, for each mini-batch => addresses "internal covariate shift"
• Greatly accelerates training + less sensitive to initialization + improves regularization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Two popular approaches:
- Subtract the mean image (e.g. AlexNet)
- Subtract per-channel mean (e.g. VGGNet)
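The per-mini-batch normalization described above, sketched for a single feature (no running statistics, so this is the training-time forward pass only; gamma/beta are the learnable scale and shift, eps guards against division by zero):

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # normalize to zero mean / unit variance, then scale and shift
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]

batch = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(batch)
print(sum(normed) / len(normed))  # ~0: zero mean after normalization
```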
29. Common layers
• Normalization layers
• Batch Normalization (BN)
• Networks converge faster if inputs are whitened, i.e. linearly transformed to have zero
mean and unit variance, and decorrelated
• Ioffe and Szegedy (2014) suggested also normalizing at the level of hidden layers
• BN: normalize each layer, for each mini-batch => addresses "internal covariate shift"
• Greatly accelerates training + less sensitive to initialization + improves regularization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
30. Common layers
• Normalization layers
• Batch Normalization (BN)
• BN: normalizing each layer, for each mini-batch
• Greatly accelerate training + Less sensitive to initialization + Improve regularization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Variants in the figure:
- Inception: trained with initial learning rate 0.0015
- BN-Baseline: same as Inception with BN before each nonlinearity
- BN-x5 / BN-x30: initial learning rate increased by 5x (0.0075) and 30x (0.045)
- BN-x5-Sigmoid: same as BN-x5, but with sigmoid instead of ReLU
31. Common layers
• Normalization layers
• Batch Normalization (BN)
• Local Response Normalization (LRN)
• Used in the AlexNet paper with k=2, alpha=1e-4, beta=0.75, n=5
• Not common anymore
• b^i = a^i / (k + alpha * sum_{j=max(0, i-n/2)}^{min(N-1, i+n/2)} (a^j)^2)^beta, where i indexes the channel and N is the total number of channels
32. Different abstractions for Deep Learning
Deep Learning pipeline Deep Learning task
Eg: CNN + classifier
=> Image captioning,
Localization, …
Deep Neural Network
Eg: CNN, AlexNet,
GoogLeNet, …
Layer
Eg: Convolution,
Pooling, …
34. Convolutional Neural networks
LeNet for OCR (90s)
AlexNet
Compared to LeCun 1998, AlexNet used:
•More data: 10^6 vs. 10^3
•GPU (~20x speedup) => Almost 1B FLOPs for single image
•Deeper: More layers (8 weight layers)
•Fancy regularization (dropout 0.5)
•Fancy non-linearity (first use of ReLU according to Karpathy)
•Top-5 error on ImageNet (ILSVRC 2012 winner): 16.4%
•Using ensembles (7 CNNs), top-5 error 15.4%
35. Convolutional Neural networks
ZFNet [Zeiler and Fergus, 2013]
•It was an improvement on AlexNet by tweaking the architecture
hyperparameters,
• In particular by expanding the size of the middle convolutional layers
• CONV 3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
• And making the stride and filter size on the first layer smaller.
• CONV 1: change from (11x11 stride 4) to (7x7 stride 2)
•Top-5 error on ImageNet (ILSVRC 2013 winner): 16.4% -> 14.8%
Reference: http://cs231n.github.io/convolutional-networks/
36. Convolutional Neural networks
• Homogeneous architecture
• All convolution layers use small 3x3 filters
(compared to AlexNet that uses 11x11, 5x5 and
3x3 filters) with stride 1 (compared to AlexNet
that uses 4 and 1 strides)
• Depth of the network is the critical component (19 layers)
• Other details:
• 5 maxpool layers (x2 reduction)
• No normalization
• 3 FC layers (instead of 2) => contain most of the
parameters (102,760,448; 16,777,216; 409,600)
• ImageNet top 5 error (ILSVRC 2014 runner-up):
• 14.8% -> 7.3% (top 5 error)
Reference: https://arxiv.org/pdf/1509.07627.pdf, https://arxiv.org/pdf/1409.1556v6.pdf, https://www.youtube.com/watch?v=j1jIoHN3m0s
Number of filters per stage: 64, 128, 256, 512, 512
• Why 3x3 layers?
• Stacked convolution layers have a large receptive field
• two 3x3 layers => 5x5 receptive field
• three 3x3 layers => 7x7 receptive field
• More non-linearity
• Fewer parameters to learn
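The receptive-field arithmetic above follows a simple recurrence: with stride 1, each extra k x k layer adds (k - 1) to the receptive field. A small sketch:

```python
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1     # stride 1 assumed throughout
    return rf

print(receptive_field([3, 3]))     # 5 -> two 3x3 layers see a 5x5 patch
print(receptive_field([3, 3, 3]))  # 7 -> three 3x3 layers see a 7x7 patch
```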
37. New Lego brick or mini-network
(Inception module)
For Inception v4, see https://arxiv.org/abs/1602.07261
38. Convolutional Neural networks
GoogLeNet [Szegedy et al., 2014]
- 9 inception modules
- ILSVRC 2014 winner
(6.7% top 5 error )
- Only 5 million params!
(Uses Avg pooling instead of FC layers)
39. Convolutional Neural networks
Speed with Torch7 (using GeForce GTX TITAN X and cuDNN), all times in milliseconds:

                   GoogLeNet  VGG_model_A  AlexNet
updateOutput          130.76       162.74    27.65
updateGradInput       197.86       167.05    24.32
accGradParameters     142.15       199.49    28.99
Forward               130.76       162.74    27.65
Backward              340.01       366.54    53.31
TOTAL                 470.77       529.29    80.96
Compared to AlexNet, GoogLeNet has
- 12x fewer params
- 2x more compute
- 6.67% top-5 error (vs. 16.4%)
Compared to VGGNet, GoogLeNet has
- 36x fewer params
- 22 layers (vs. 19)
- 6.67% top-5 error (vs. 7.3%)
Reference: https://arxiv.org/pdf/1512.00567.pdf, https://github.com/soumith/convnet-benchmarks/blob/master/torch7/imagenet_winners/output.log
40. Analysis of errors on GoogLeNet vs
human on ImageNet dataset
• Types of errors that both GoogLeNet and humans are susceptible to:
• Multiple objects (24% of GoogLeNet errors and 16% of human errors)
• Incorrect annotations
• Types of errors that GoogLeNet is more susceptible to than humans:
• Small or thin objects (21% of GoogLeNet errors)
• Image filters, e.g. distorted contrast/color distribution (13% of GoogLeNet errors and only 1 human error)
• Abstract representations, e.g. a shadow on the ground, of a child on a swing (6% of GoogLeNet errors)
• Types of errors that humans are more susceptible to than GoogLeNet:
• Fine-grained recognition, e.g. species of dogs (7% of GoogLeNet errors and 37% of human errors)
• Insufficient training data
Reference: http://arxiv.org/abs/1409.0575
42. New Lego brick (Residual block)
Reference: http://torch.ch/blog/2016/02/04/resnets.html
Shortcut to address underfitting
due to vanishing gradients
- Occurs even with batch
normalization
43. Convolutional Neural networks
• ResNet Architecture
• VGG style design => just deep
• All 3x3 convolution
• #Filter x2
• Other remarks:
• no max pooling (almost)
• no FC
• no dropout
• See https://github.com/facebook/fb.resnet.torch
Reference: http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
44. Different abstractions for Deep Learning
Deep Learning pipeline Deep Learning task
Eg: CNN + classifier
=> Image captioning,
Localization, …
Deep Neural Network
Eg: CNN, AlexNet,
GoogLeNet, …
Layer
Eg: Convolution,
Pooling, …
45. Addressing other tasks …
Reference: https://docs.google.com/presentation/d/1Q1CmVVnjVJM_9CDk3B8Y6MWCavZOtiKmOLQ0XB7s9Vg/edit#slide=id.g17e6880c10_0_926
SKIP THIS !!
49. Training a Deep Neural Network
“Forward propagation”
Compute a function via composition of linear
transformations followed by element-wise non-linearities
“Backward propagation”
Propagates errors backwards and updates weights according
to how much they contributed to the output
Reference: “You Should Be Using Automatic Differentiation” by Ryan Adams (Twitter)
Special case of “automatic
differentiation” discussed
in next slides
50. Training a Deep Neural Network
Training features:
Training label:
Goal: learn the weights
Define a loss function:
For numerical stability and mathematical simplicity, we use negative log-likelihood
(often referred to as cross-entropy):
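A minimal sketch of the negative log-likelihood just described (function and variable names are illustrative): take the probability the model assigns to the true class and return -log of it, so confident correct predictions give a small loss and confident wrong ones a large loss.

```python
import math

def nll(class_probs, true_class):
    # negative log-likelihood / cross-entropy for one example
    return -math.log(class_probs[true_class])

probs = [0.7, 0.2, 0.1]  # model's predicted distribution over 3 classes
print(nll(probs, 0))      # small loss: confident and correct
print(nll(probs, 2))      # large loss: true class got low probability
```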
51. • Using the loss function: , we learn weights by
Training a Deep Neural Network
• Learning is cast as optimization
• Popular algorithm: Stochastic Gradient Descent
• Needs to compute the gradients:
• And initialization of weights (covered later):
52. • Evaluate derivative of f(x) = sin(x – 3/x) at x = 0.01
• Symbolic differentiation
• Symbolically differentiate the function as an expression, and evaluate it at
the required point
• Low speed + difficult to convert DNN into expressions
• Symbolically, f'(x) = cos(x - 3/x) * (1 + 3/x^2) … at x = 0.01 => -962.8192798
• Numerical differentiation
• Use finite differences:
• Generally bad numerical stability
Methods for differentiating functions
Reference: http://homes.cs.washington.edu/~naveenks/files/2009_Cranfield_PPT.pdf
• Automatic/Algorithmic Differentiation (AD)
• Mechanically calculates derivatives of functions expressed as computer
programs, at machine precision, and with complexity guarantees - Barak
Pearlmutter
• Reverse-mode automatic differentiation used in practice
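The slide's running example can be checked numerically. A sketch comparing the symbolic derivative of f(x) = sin(x - 3/x) against a central difference; note that near x = 0.01 the function oscillates very rapidly, so a carelessly large step h gives a wildly wrong answer, illustrating the "bad numerical stability" point.

```python
import math

def f(x):
    return math.sin(x - 3.0 / x)

def f_prime(x):
    # symbolic derivative from the slide
    return math.cos(x - 3.0 / x) * (1.0 + 3.0 / x ** 2)

def central_diff(func, x, h):
    return (func(x + h) - func(x - h)) / (2.0 * h)

x = 0.01
print(f_prime(x))                 # ~ -962.82, matching the slide
print(central_diff(f, x, 1e-7))   # close to the symbolic value
print(central_diff(f, x, 1e-3))   # way off: step too big for the oscillation
```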
53. Examples of AD in practice
https://github.com/HIPS/autograd
For Python and NumPy:
See http://www.autodiff.org/ for more details
For Torch (developed by Twitter cortex):
https://github.com/twitter/torch-autograd/
54. • Convert the algorithm into a sequence of assignments of basic operations:
Reverse-mode AD (how it works)
https://justindomke.wordpress.com/2009/03/24/a-simple-explanation-of-reverse-mode-automatic-differentiation/
• Apply the chain rule: bar(x_i) = sum over j such that x_i is among the parents of x_j of bar(x_j) * dx_j/dx_i
• Differentiate each basic operation f in reverse order:
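The two steps above can be sketched as a toy reverse-mode AD: record every basic operation on a tape in execution order, then replay the tape in reverse, applying the chain rule at each step. Only the ops needed for the running example f(x) = sin(x - 3/x) are implemented; names and structure are illustrative, not any particular library's API.

```python
import math

TAPE = []  # basic operations in execution order

class Var:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # Vars this node was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0
        TAPE.append(self)

def sub(a, b):
    return Var(a.value - b.value, (a, b), (1.0, -1.0))

def div(a, b):
    return Var(a.value / b.value, (a, b),
               (1.0 / b.value, -a.value / b.value ** 2))

def sin(a):
    return Var(math.sin(a.value), (a,), (math.cos(a.value),))

def backward(out):
    out.grad = 1.0
    for node in reversed(TAPE):         # reverse execution order
        for parent, g in zip(node.parents, node.local_grads):
            parent.grad += node.grad * g   # chain rule

x = Var(0.01)
y = sin(sub(x, div(Var(3.0), x)))       # f(x) = sin(x - 3/x)
backward(y)
print(x.grad)                            # ~ -962.819, matching the symbolic result
```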
55. Reverse-mode AD (how it works – NN)
From Neural Network with Torch - Alex Wiltschko
56. Reverse-mode AD (how it works – NN)
From Neural Network with Torch - Alex Wiltschko
57. • Normalize your data
• Mini-batch SGD instead of single-example SGD (leverage matrix-matrix operations)
• Use momentum
• Use adaptive learning rates:
• Adagrad: learning rates are scaled by the square root of the cumulative sum
of squared gradients
• RMSProp: instead of cumulative sum, use exponential moving average
• Adam: essentially combines RMSProp with momentum
• Debug your gradient using finite difference method
Tricks of the Trade
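Two of the adaptive-learning-rate rules above, sketched for a single parameter minimizing the toy objective f(w) = (w - 3)^2 (an arbitrary choice; hyperparameter values are common defaults, not prescriptions). Adam's v line is the RMSProp exponential moving average, and its m line is the momentum-like term.

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of (w - 3)^2

def adam(steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # momentum-like first moment
        v = b2 * v + (1 - b2) * g * g    # RMSProp-like second moment
        m_hat = m / (1 - b1 ** t)        # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def adagrad(steps=200, lr=1.0, eps=1e-8):
    w, cache = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        cache += g * g                   # cumulative sum of squared grads
        w -= lr * g / (math.sqrt(cache) + eps)
    return w

print(adam())     # close to 3, the minimizer
print(adagrad())  # close to 3, the minimizer
```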
58. • Use momentum
• Use adaptive learning rates:
• Adagrad: learning rates are scaled by the square root of the cumulative sum
of squared gradients
• RMSProp: instead of cumulative sum, use exponential moving average
• Adam: essentially combines RMSProp with momentum
Tricks of the Trade
59. • Initialization matters
• Assume 10-layer FC network with tanh non-linearity
Tricks of the Trade
- Initialize with zero mean & 0.01 std dev:
does not work for deep networks (per-layer activation mean and std dev collapse toward zero)
- Initialize with zero mean & unit std dev:
almost all neurons completely saturated, at either -1 or 1; gradients will be all zero
(Plots: layer mean and layer std dev vs. layer number)
60. • Initialization matters
• Assume 10-layer FC network with tanh non-linearity
Tricks of the Trade
Xavier initialization [Glorot et al., 2010]:
- Use zero mean and 1/fan_in variance
- Works well for tanh
- But not for ReLU
He et al. proposed replacing 1/fan_in by 2/fan_in for ReLU (note the additional factor of 2)
(Plots: layer mean and layer std dev vs. layer number)
61. • Initialization matters
• Assume 10-layer FC network with tanh non-linearity
• Batch normalization reduces the strong dependence on initialization
Tricks of the Trade
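The 10-layer tanh experiment above can be repeated in miniature: propagate a batch through 10 fully connected tanh layers and look at the activation std dev under the three schemes. Layer width, batch size, and seed are arbitrary small choices for illustration.

```python
import math
import random

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def run(weight_std, layers=10, fan=50, batch=50, seed=0):
    rng = random.Random(seed)
    # standard-normal input batch
    h = [[rng.gauss(0, 1) for _ in range(fan)] for _ in range(batch)]
    for _ in range(layers):
        W = [[rng.gauss(0, weight_std) for _ in range(fan)] for _ in range(fan)]
        h = [[math.tanh(sum(x[i] * W[i][j] for i in range(fan)))
              for j in range(fan)] for x in h]
    return std([v for row in h for v in row])  # std of final activations

print(run(0.01))                 # tiny: activations collapse toward zero
print(run(1.0))                  # ~1: neurons saturated at -1/+1
print(run(1.0 / math.sqrt(50)))  # Xavier-style 1/sqrt(fan_in): stays healthy
```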
63. Existing Deep Learning Stack
Framework: Caffe, Theano, Torch7, TensorFlow, DeepLearning4J, SystemML*
Library with commonly used building blocks: cuDNN; Aparapi (converts bytecode to OpenCL); GPU counterparts of the CPU's BLAS/LAPACK: cuBLAS, MAGMA, CULA, cuSPARSE, cuSOLVER, cuRAND, etc.
Driver/Toolkit: CUDA (preferred if Nvidia GPUs) or OpenCL (portable)
Hardware:
- CPU: multicore, task parallelism, minimize latency (e.g. Unsafe/DirectBuf/GC pauses/NIO)
- GPU: data parallelism (single task), cost of moving data from CPU to GPU (kernel fusion?), maximize throughput
Rule of Thumb: Always use libraries !! Caffe (GPU) gives an 11x speedup, but Caffe (cuDNN) 14x, on AlexNet training (5 convolution + 3 fully connected layers)
*Conditions apply: unified memory model since CUDA 6
64. Comparison of existing frameworks

Framework | Core Lang | Bindings | CPU | Single GPU | Multi GPU | Distributed | Comments
Caffe | C++ | Python, MatLab | Yes | Yes | Yes | See com.yahoo.ml.CaffeOnSpark | Mostly for image classification; models/layers expressed in proto format
Theano / PyLearn2 | Python | - | Yes | Yes | In progress | No | Transparent use of GPU; auto-diff; general purpose; computation as DAG
Torch7 | Lua | - | Yes | Yes | Yes | See Twitter's torch-distlearn | CTC impl. of Baidu's Deep Speech open-sourced on Torch7; very efficient
TensorFlow | C++ | Python | Yes | Yes | Up to 4 GPUs | Not open-sourced | Slower than Theano/Torch; TensorBoard useful; computation as DAG
DL4J | Java | - | Yes | Yes | Most likely | Yes | Supports GPUs via CUDA; support for Hadoop/Spark
SystemML | Java | Python, Scala | Yes | In progress | Not yet | Yes | -
Minerva/CXXNet (Smola) | C++ | Python | Yes | Yes | Yes | Yes | https://github.com/dmlc; Minerva ~ Theano and CXXNet ~ Caffe