Highway Networks
Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
Prepared by Adarsha Dhakal
Introduction
The depth of a neural network is crucial to its success. However, training becomes more difficult as depth increases.
The highway network is a new architecture designed to ease gradient-based training of very deep networks.
Highway networks with hundreds of layers can be trained directly using stochastic gradient descent (SGD).
They use skip connections modulated by learned gating mechanisms to regulate information flow, an idea inspired by Long Short-Term Memory (LSTM) recurrent neural networks.
Highway networks have been used as part of text sequence labelling and speech recognition tasks.
Gradient Descent
Gradient descent is one of the most commonly used iterative optimization algorithms for training machine learning and deep learning models. It finds a local minimum of a function by repeatedly taking steps in the direction of the negative gradient.
The main objective of a gradient descent algorithm is to minimize the cost function iteratively.
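Here is a minimal sketch of the update rule in Python. It assumes a hypothetical one-dimensional cost f(w) = (w - 3)^2, which is not from the slides and has its minimum at w = 3. The learning rate and step count are arbitrary illustrative choices.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical cost f(w) = (w - 3)^2 with gradient f'(w) = 2 (w - 3); minimum at w = 3.
def grad_f(w):
    return 2.0 * (w - 3.0)

w_min = gradient_descent(grad_f, w0=0.0)
print(round(w_min, 4))  # 3.0
```

Each step shrinks the distance to the minimum by a factor of 1 - 2 · 0.1 = 0.8. A learning rate that is too large, above 1.0 for this cost, would make the iterates diverge instead.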
Loss/Cost Function
A function that compares the target and predicted output values.
It measures how well the neural network models the training data.
During training, we aim to minimize this loss between the predicted and target outputs.
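Mean squared error (MSE) is one common loss for regression. The slides do not name a specific loss, so treat this choice as an illustrative assumption:

```python
def mse_loss(predicted, target):
    """Mean squared error between predicted and target values (lower is better)."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

print(mse_loss([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # ≈ 0.1667
print(mse_loss([1.0, 2.0], [1.0, 2.0]))             # 0.0 for a perfect prediction
```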
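➔ SGD (stochastic gradient descent) is well suited to problems with large amounts of data and many parameters, because each update uses the gradient of only a single example or a small random minibatch of the training data.
➔ In these situations, regular (full-batch) GD, which computes the gradient over the entire training set for every update, may not be computationally feasible.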
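Below is a rough sketch of minibatch SGD. It assumes a hypothetical linear regression problem: fit y ≈ w·x + b with an MSE loss on noise-free data generated from y = 2x + 1. Each update uses the gradient of only a few shuffled samples rather than the whole dataset. The batch size, learning rate and epoch count are illustrative assumptions.

```python
import random

def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ≈ w * x + b with minibatch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the MSE estimated from this minibatch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data from y = 2x + 1
xs = [i / 10 for i in range(-20, 21)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # expected to be close to w = 2, b = 1
```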
LSTM Recurrent Neural Network
Standard Recurrent Neural Networks (RNNs) suffer from short-term
memory due to a vanishing gradient problem that emerges when
working with longer data sequences.
Luckily, we have more advanced versions of RNNs that can preserve
important information from earlier parts of the sequence and carry it
forward.
The two best-known versions are Long Short-Term Memory (LSTM)
and the Gated Recurrent Unit (GRU).
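The following sketch shows how gating lets an LSTM carry information forward. It implements one step of a standard LSTM cell (no peephole connections) with a hidden size of 1. The parameter values are hypothetical and set by hand: the forget gate is biased towards 1 and the input gate towards 0. As a result, the cell state barely changes over 40 steps even though the inputs vary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (input size 1, hidden size 1).

    p maps each gate name to a hypothetical (w_x, w_h, b) triple.
    """
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre("input"))        # fraction of the candidate to write
    o = sigmoid(pre("output"))       # fraction of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate values for the cell state
    c = f * c_prev + i * g           # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Hand-picked parameters: forget gate ≈ 1 and input gate ≈ 0, so the cell "remembers"
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.5, -2.0, 3.0, 1.0] * 10:  # 40 varying inputs
    h, c = lstm_step(x, h, c, params)
print(round(c, 3))  # stays close to the initial cell state of 1.0
```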
LSTM vs GRU
➔ GRU has two gates (reset and update), while LSTM has three gates (input, output and forget).
➔ GRU is less complex than LSTM because it has fewer gates. A common rule of thumb is to prefer GRU for smaller datasets and LSTM for larger ones.
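One way to make "less complex" concrete is to count parameters. An LSTM has four weight blocks (three gates plus the candidate cell update), while a GRU has three (two gates plus the candidate hidden state). The sketch below assumes a single bias vector per block. Some frameworks, for example PyTorch, keep two bias vectors per block, which changes the absolute numbers but not the 3:4 ratio between GRU and LSTM.

```python
def rnn_cell_params(input_size, hidden_size, num_weight_blocks):
    """Parameters of a gated RNN cell, assuming one bias vector per block.

    Each block has an input matrix (hidden x input), a recurrent matrix
    (hidden x hidden) and a bias vector (hidden).
    """
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return num_weight_blocks * per_block

def lstm_params(input_size, hidden_size):
    # input, forget and output gates plus the candidate cell update
    return rnn_cell_params(input_size, hidden_size, 4)

def gru_params(input_size, hidden_size):
    # reset and update gates plus the candidate hidden state
    return rnn_cell_params(input_size, hidden_size, 3)

print(lstm_params(10, 20), gru_params(10, 20))  # 2480 1860
```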
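Highway networks vs plain networks
➔ The optimization performance of highway networks is virtually independent of depth, while that of plain networks degrades significantly as depth increases.
➔ In plain networks, SGD stalls at the beginning of training unless a specific weight initialization scheme is used.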
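The following sketch does not reproduce the paper's training experiments. It only illustrates, at initialization, why depth hurts a plain stack more than a highway stack. It pushes one input through 50 untrained layers of each kind and reports how much of the input's norm survives. Several settings are illustrative assumptions: the random weight scale (std 0.1), the width of 8 and the transform-gate bias of -5. The original paper recommends initializing the transform-gate bias to a negative value so that layers start out biased towards carrying their input.

```python
import math
import random

def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def random_matrix(n, rng, std):
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def run_stacks(depth=50, width=8, std=0.1, gate_bias=-5.0, seed=0):
    """Return ||output|| / ||input|| for a plain tanh stack and a highway stack.

    Weights are random and untrained; both stacks share the same H weights.
    """
    rng = random.Random(seed)
    x0 = [1.0] * width
    plain, highway = list(x0), list(x0)
    for _ in range(depth):
        W_H = random_matrix(width, rng, std)
        W_T = random_matrix(width, rng, std)
        # Plain layer: y = H(x)
        plain = [math.tanh(z) for z in matvec(W_H, plain)]
        # Highway layer: y = H(x) * T(x) + x * (1 - T(x)), element-wise
        h = [math.tanh(z) for z in matvec(W_H, highway)]
        t = [sigmoid(z + gate_bias) for z in matvec(W_T, highway)]
        highway = [h_i * t_i + x_i * (1.0 - t_i) for h_i, t_i, x_i in zip(h, t, highway)]
    return norm(plain) / norm(x0), norm(highway) / norm(x0)

plain_ratio, highway_ratio = run_stacks()
print(f"plain: {plain_ratio:.2e}  highway: {highway_ratio:.2f}")
```

The plain stack's signal shrinks towards zero, while the highway stack still carries most of the input after 50 layers.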
Model
In addition to the plain layer transform y = H(W_H, x), the model has two gates:
The transform gate T(W_T, x)
The carry gate C(W_C, x)
Both gates use a non-linear sigmoid transfer function, so their outputs lie between 0 and 1.
The H(W_H, x) function can be any desired transfer function.
The carry gate is defined as:
C(W_C, x) = 1 - T(W_T, x)
The transform gate itself is simply a layer with a sigmoid transfer function.
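Below is a minimal single-layer sketch in plain Python of the highway layer y = H(W_H, x) · T(W_T, x) + x · (1 - T(W_T, x)), which uses the coupled carry gate C = 1 - T. It assumes H is tanh of an affine map and adds bias vectors b_H and b_T. Those choices, and the helper names dense and highway_layer, are illustrative and are not taken from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, x):
    """Affine map W x + b for plain Python lists."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer with coupled gates (carry gate C = 1 - T).

    H is assumed to be tanh(W_H x + b_H); any transfer function could be used.
    W_H and W_T must be square so that x, H, T and y all have the same size.
    """
    h = [math.tanh(z) for z in dense(W_H, b_H, x)]  # H(W_H, x)
    t = [sigmoid(z) for z in dense(W_T, b_T, x)]    # transform gate T(W_T, x)
    c = [1.0 - ti for ti in t]                      # carry gate C = 1 - T
    return [hi * ti + xi * ci for hi, ti, xi, ci in zip(h, t, x, c)]

# Hypothetical 2-unit example
x = [0.5, -1.0]
W_H = [[1.0, 0.0], [0.0, 1.0]]
b_H = [0.0, 0.0]
W_T = [[0.0, 0.0], [0.0, 0.0]]
b_T = [0.0, 0.0]
y = highway_layer(x, W_H, b_H, W_T, b_T)
print(y)  # ≈ [0.481, -0.881]
```

With the transform-gate weights and bias set to zero, T(W_T, x) = 0.5 everywhere, so each output is the average of H(W_H, x) and x.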
Structure
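The structure of a hidden layer follows the equation:
y = H(W_H, x) · T(W_T, x) + x · C(W_C, x) = H(W_H, x) · T(W_T, x) + x · (1 - T(W_T, x))
where · denotes element-wise multiplication, so x, y, H(W_H, x) and T(W_T, x) must have the same dimensionality.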
Structure cont.
Depending on the output of the transform gate, a highway layer can smoothly vary its behavior between that of a plain layer and a layer that simply passes its inputs through.
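Substituting the two extreme gate values into the layer equation y = H(W_H, x) · T(W_T, x) + x · (1 - T(W_T, x)) makes the two limiting behaviors explicit. The corresponding layer Jacobians follow directly; these match the limiting cases given in the original paper:

```latex
\mathbf{y} =
\begin{cases}
\mathbf{x}, & \text{if } T(W_T, \mathbf{x}) = \mathbf{0},\\
H(W_H, \mathbf{x}), & \text{if } T(W_T, \mathbf{x}) = \mathbf{1},
\end{cases}
\qquad
\frac{d\mathbf{y}}{d\mathbf{x}} =
\begin{cases}
\mathbf{I}, & \text{if } T(W_T, \mathbf{x}) = \mathbf{0},\\
H'(W_H, \mathbf{x}), & \text{if } T(W_T, \mathbf{x}) = \mathbf{1}.
\end{cases}
```

When T is 0, the identity Jacobian passes gradients through the layer unchanged, which helps gradient-based training of deep stacks.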
Conclusion
Training very deep networks is difficult without increasing the total network size.
Highway networks are novel neural network architectures that enable the training of extremely deep networks using simple SGD.
The optimization of highway networks is not hampered even as network depth increases to a hundred layers.
Conclusion Cont.
The ability to train extremely deep networks opens up the possibility of studying the impact of depth on complex problems without restrictions.
Various activation functions, including those for which robust initialization schemes are unavailable, can be used in deep highway networks.
Thank you!