This document provides a summary of topics covered in a deep neural networks tutorial, including:
- A brief introduction to artificial intelligence, machine learning, and artificial neural networks.
- An overview of common deep neural network architectures like convolutional neural networks, recurrent neural networks, autoencoders, and their applications in areas like computer vision and natural language processing.
- Advanced techniques for training deep neural networks, such as greedy layer-wise training, regularization methods like dropout, and unsupervised pre-training.
- Applications of deep learning beyond traditional discriminative models, including image synthesis, style transfer, and generative adversarial networks.
Deep Learning Architectures for NLP (Hungarian NLP Meetup, 2016-09-07) - Márton Miháltz
A brief survey of deep learning/neural network methods currently used in NLP: recurrent networks (LSTM, GRU), recursive networks, convolutional networks, hybrid architectures, and attention models. We look at specific papers in the literature targeting sentiment analysis, text classification and other tasks.
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning (BigDataCloud)
A tutorial given at NAACL HLT 2013.
Richard Socher and Christopher Manning
http://nlp.stanford.edu/courses/NAACL2013/
Machine learning is everywhere in today's NLP, but by and large machine learning amounts to numerical optimization of weights for human-designed representations and features. The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. Despite these advantages, many researchers in NLP are not familiar with these methods. Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. The goal of the tutorial is to make the inner workings of these techniques transparent, intuitive and their results interpretable, rather than black boxes labeled "magic here". The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows, and the math and algorithms of training via backpropagation. In this section, applications include language modeling and POS tagging. In the second section we present recursive neural networks, which can learn structured tree outputs as well as vector representations for phrases and sentences. We cover both the equations and their applications. We show how training can be achieved by a modified version of the backpropagation algorithm introduced before; these modifications allow the algorithm to work on tree structures. Applications include sentiment analysis and paraphrase detection.
We also draw connections to recent work in semantic compositionality in vector spaces. The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. By this point in the tutorial, the audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks. The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag-of-words models. We will provide a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization.
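The "math and algorithms of training via backpropagation" covered in the tutorial's first part can be sketched with a minimal one-hidden-layer network in NumPy. The toy task (XOR), sizes and learning rate below are illustrative choices, not taken from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn XOR, which a linear model cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error derivative layer by layer
    dz2 = (p - y) / len(X)           # dLoss/dlogits (cross-entropy + sigmoid)
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)        # tanh derivative
    dW1 = X.T @ dz1; db1 = dz1.sum(0)
    # Gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(float)
print(pred.ravel())  # should approach the XOR labels [0, 1, 1, 0]
```

The same forward/backward pattern generalizes to the window-based models the tutorial describes; only the input construction changes.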
Word embeddings, applications of sequence modelling, recurrent neural networks, drawbacks of recurrent neural networks, the gated recurrent unit, the long short-term memory unit, and the attention mechanism.
Natural language processing techniques transition from machine learning to de... - Divya Gera
Natural language processing: its need, business applications, NLP with machine learning, text data preprocessing for machine learning, and NLP with deep learning.
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems, where the number of time steps is the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction in both time and space.
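The vanishing gradient problem mentioned above can be observed numerically in a vanilla RNN. The dimensions and weight scale below are illustrative assumptions; with small recurrent weights, the gradient flowing back in time is repeatedly multiplied by the recurrent matrix and the tanh derivative, so it shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hid, T = 4, 8, 20  # arbitrary illustrative sizes

# Small recurrent weights (spectral radius < 1) make gradients vanish.
W_xh = rng.normal(0, 0.3, (d_hid, d_in))
W_hh = rng.normal(0, 0.3, (d_hid, d_hid))

def rnn_forward(xs):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    h = np.zeros(d_hid)
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        hs.append(h)
    return hs

xs = rng.normal(size=(T, d_in))
hs = rnn_forward(xs)

# Backpropagate a unit gradient from the last hidden state to earlier steps:
# dh_t/dh_{t-1} = diag(1 - h_t^2) W_hh, applied once per time step.
g = np.ones(d_hid)
norms = []
for t in reversed(range(1, T)):
    g = W_hh.T @ (g * (1.0 - hs[t] ** 2))
    norms.append(np.linalg.norm(g))

print(norms[0], norms[-1])  # gradient norm one step back vs. far in the past
```

LSTM and GRU cells replace this repeated matrix multiplication with additive, gated updates, which is precisely how they mitigate the effect shown here.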
“Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled”
Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
(I have referred to many articles and experimental results provided by Stanford University.)
Alberto Massidda - Images and words: mechanics of automated captioning with n... (Codemotion)
Image captioning is the process of generating a textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions. As in the proverbial "finger pointing to the moon", automated image captioning requires the ability to discern what is really going on in a scene and to generate a fluent description of the act taking place. In this talk we present the underlying mechanics of object detection and language generation using Convolutional and Recurrent Neural Networks.
Learning to understand phrases by embedding the dictionary - Roelof Pieters
A review of "Learning to Understand Phrases by Embedding the Dictionary" by Felix Hill, Kyunghyun Cho, Anna Korhonen and Yoshua Bengio,
at KTH's Deep Learning reading group:
www.csc.kth.se/cvap/cvg/rg/
Deep learning algorithms have drawn the attention of researchers working in the fields of computer vision, speech recognition, malware detection, pattern recognition and natural language processing. In this paper, we present an overview of deep learning techniques: the convolutional neural network, the deep belief network, the autoencoder, the restricted Boltzmann machine and the recurrent neural network. We then survey current work applying deep learning algorithms to malware detection, give justified suggestions for future research, and present an experimental analysis demonstrating the importance of deep learning techniques.
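As an illustrative sketch of one of the architectures named above (not code from the paper), here is a minimal linear autoencoder trained by gradient descent in NumPy; the data, sizes and learning rate are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 points that actually live on a 2-D subspace of R^8,
# so a 2-unit bottleneck can reconstruct them well.
Z_true = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 8))
X = Z_true @ A

d, k, lr = 8, 2, 0.05
W_enc = rng.normal(0, 0.1, (d, k))  # encoder weights
W_dec = rng.normal(0, 0.1, (k, d))  # decoder weights

def loss(X):
    Z = X @ W_enc       # encode: project to the bottleneck
    X_hat = Z @ W_dec   # decode: reconstruct the input
    return np.mean((X - X_hat) ** 2)

initial = loss(X)
for _ in range(500):
    Z = X @ W_enc
    X_hat = Z @ W_dec
    E = X_hat - X                       # reconstruction error
    grad_dec = Z.T @ E / len(X)         # dL/dW_dec (constants folded into lr)
    grad_enc = X.T @ (E @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = loss(X)
print(initial, final)  # reconstruction error before and after training
```

A practical autoencoder adds non-linearities and deeper stacks, but the training loop (encode, decode, penalize reconstruction error) is the same.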
Roughly 30 years ago, AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. The over-inflated expectations ended in a crash, followed by a period of absent funding and interest: the so-called AI winter. The last three years, however, have changed everything again. Deep learning, a machine learning technique inspired by the human brain, has crushed one benchmark after another, and tech companies like Google, Facebook and Microsoft have started to invest billions in AI research. "The pace of progress in artificial general intelligence is incredibly fast" (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that "would be either the best or the worst thing ever to happen to humanity" (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let's look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO of Google) recently announced that "machine learning is a core transformative way by which Google is rethinking everything they are doing" and explain why "deep learning is probably one of the most exciting things that is happening in the computer industry" (Jen-Hsun Huang, CEO of NVIDIA).
Either a new AI "winter is coming" (Ned Stark, House Stark), or this new wave of innovation might turn out to be the "last invention humans ever need to make" (Nick Bostrom, AI philosopher). Or maybe it's just another great technology helping humans achieve more.
This covers end-to-end material on neural networks, CNN internals, TensorFlow and Keras basics, intuition on object detection and face recognition, and AI on Android x86.
Machine learning in science and industry - day 4 (arogozhnikov)
- tabular data approach to machine learning and when it didn't work
- convolutional neural networks and their application
- deep learning: history and today
- generative adversarial networks
- finding optimal hyperparameters
- joint embeddings
UNSUPERVISED LEARNING MODELS OF INVARIANT FEATURES IN IMAGES: RECENT DEVELOPM... - ijscai
Object detection and recognition are important problems in the computer vision and pattern recognition domain. Human beings are able to detect and classify objects effortlessly, but replicating this ability in computer-based systems has proved to be a non-trivial task. In particular, despite significant research efforts focused on meta-heuristic object detection and recognition, robust and reliable real-time object recognition systems remain elusive. Here we present a survey of one particular approach that has proved very promising for invariant feature recognition and which is a key initial stage of multi-stage network architecture methods for the high-level task of object recognition.
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with question bank - Asst. Prof. M. Gokilavani
UNIT I INTRODUCTION
Neural Networks; Application Scope of Neural Networks; Artificial Neural Network: An Introduction; Evolution of Neural Networks; Basic Models of Artificial Neural Network; Important Terminologies of ANNs; Supervised Learning Network.
Deep learning is now bringing artificial intelligence close to human capability. Machine learning and deep artificial neural networks model aspects of the human brain. This success is due to large-scale storage and computation, combined with efficient algorithms that can handle more behavioral and cognitive problems.
Deep Learning - The Past, Present and Future of Artificial Intelligence - Lukas Masuch
In the last couple of years, deep learning techniques have transformed the world of artificial intelligence. One by one, the abilities and techniques that humans once imagined were uniquely our own have begun to fall to the onslaught of ever more powerful machines. Deep neural networks are now better than humans at tasks such as face recognition and object recognition. They've mastered the ancient game of Go and thrashed the best human players. "The pace of progress in artificial general intelligence is incredibly fast" (Elon Musk, CEO of Tesla & SpaceX), leading to an AI that "would be either the best or the worst thing ever to happen to humanity" (Stephen Hawking, physicist).
What sparked this new hype? How is deep learning different from previous approaches? Let's look behind the curtain and unravel the reality. This talk will introduce the core concepts of deep learning, explore why Sundar Pichai (CEO of Google) recently announced that "machine learning is a core transformative way by which Google is rethinking everything they are doing", and explain why "deep learning is probably one of the most exciting things that is happening in the computer industry" (Jen-Hsun Huang, CEO of NVIDIA).
1. Deep Neural Networks Tutorial
Andrey Filchenkov
Computer Technology Chair
Computer Technologies Lab
ITMO University
afilchenkov@corp.ifmo.ru
AINL FRUCT'16, St. Petersburg, Russia
2. Tutorial topics
Very brief introduction to AI, ML and ANN
What is ANN and how to learn it
DNN and standard DNN architectures
Beyond discriminative models
3. Next topic
Very brief introduction to AI, ML and ANN
What is ANN and how to learn it
DNN and standard DNN architectures
Beyond discriminative models
4. Artificial intelligence
Strong AI (Artificial General Intelligence): functionality is similar to the human
brain or better.
Weak AI: good at solving certain well-formulated tasks.
Machine learning is a part of Weak AI.
Many people have long thought that artificial neural
networks are a path to Strong AI.
Many people now think that deep learning networks are a
path to Strong AI.
5. Neural networks as a machine learning algorithm
The neural paradigm is not only about machine learning
(computer architecture, computations, etc.).
Machine learning is about creating algorithms that can learn
patterns, regularities and rules from given data.
The biggest part of machine learning is supervised learning:
we are given a set of objects, each with a label, and we want to
learn how to find these labels for objects we have never seen.
7. Brief early history of artificial neural networks
1943 Artificial neuron by McCulloch and Pitts
1949 Neuron learning rule by Hebb
1957 Perceptron by Rosenblatt
1960 Perceptron learning rule by Widrow and Hoff
1968 Group Method of Data Handling to learn multilayered
networks by Ivakhnenko
1969 Perceptrons by Minsky and Papert
1974 Backpropagation algorithm by Werbos and by Galushkin
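Rosenblatt's perceptron and its mistake-driven weight update from the timeline above can be sketched in a few lines of NumPy; the toy data and hyperparameters below are illustrative assumptions:

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Perceptron learning rule: on each mistake, nudge the weights
    toward the misclassified example. Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data: class +1 lies above the line x1 + x2 = 0,
# with points too close to the separator removed to guarantee a margin.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X = X[np.abs(X.sum(axis=1)) > 0.3]
y = np.where(X.sum(axis=1) > 0, 1, -1)

w, b = train_perceptron(X, y)
pred = np.where(X @ w + b > 0, 1, -1)
print((pred == y).mean())  # prints 1.0 on this separable data
```

The perceptron convergence theorem guarantees a finite number of mistakes on linearly separable data, which is why the margin filter above matters: without a margin, convergence can be very slow.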
8. Brief modern history of ANN
1980 Convolutional NN by Fukushima
1982 Recurrent NN by Hopfield
1991 Vanishing gradient problem identified by Hochreiter
1997 Long short term memory network by Hochreiter and
Schmidhuber
1998 Gradient descent for convolutional NN by LeCun et al.
2006 Deep model by Hinton, Osindero and Teh
2012 DNN started to become mainstream in ML and AI
9. Next topic
Very brief introduction to AI, ML and ANN
What is ANN and how to learn it
DNN and standard DNN architectures
Beyond discriminative models
9 / 64
10. Two sources of knowledge
Experts
we need to ask them wisely and process the answers
Data
we need to process and apply machine learning algorithms
How do we obtain knowledge?
10 / 64
11. Algorithms whose performance grows with experience
The most popular task is prediction
Algorithms require data and labels (for prediction)
Learning in these algorithms means minimizing the error rate in
prediction, or maximizing the similarity to the known answers
Machine learning
11 / 64
12. Each object is represented
as a feature vector. Each
object thus is a point in a
multidimensional space.
Vector representation of objects
12 / 64
16. Next topic
Very brief introduction to AI, ML and ANN
What is ANN and how to learn it
DNN and standard DNN architectures
• Deep learning introduction and best practices
• Deep Boltzmann Machines (DBM) and Deep Belief Network
(DBN)
• Convolutional Neural Network (CNN)
• Autoencoders
• Recurrent Neural Network (RNN) and Long Short-Term
Memory (LSTM)
Beyond discriminative models
16 / 64
17. Deep architecture
Definition: Deep architectures are composed of multiple levels of
non-linear operations, such as neural nets with many hidden
layers
Most machine learning algorithms have shallow (1–3 layers)
architecture (SVM, PCA, kNN, Logistic Regression, etc.)
Goal: Deep learning methods aim at:
Learning feature hierarchies, no more feature engineering!
Where features from higher levels of the hierarchy are formed
by lower level features.
17 / 64
18. Why go deep?
Some functions cannot be efficiently represented (in terms of
number of tunable elements) by architectures that are too shallow
Functions that can be compactly represented by a depth k
architecture might require an exponential number of computational
elements to be represented by a depth k−1 architecture
Deep Representations might allow non-local generalization and
comprehensibility
Deep learning gets state of the art results in many fields (vision,
audio, NLP, etc.)!
18 / 64
19. DNN best practices: ReLU, PReLU
ReLU
19 / 64
PReLU
Sigmoid and hyperbolic tangent activation functions suffer from
vanishing gradients and tend to overfit
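Both activations can be sketched directly (a minimal NumPy sketch; in a real network the PReLU slope `alpha` is learned per channel, here it is fixed for illustration):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x); the gradient is 1 for x > 0, so it does not vanish
    return np.maximum(0.0, x)

def prelu(x, alpha=0.25):
    # PReLU: like ReLU, but the negative side has a (learnable) slope alpha
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
r, p = relu(x), prelu(x)
```

Unlike sigmoid or tanh, neither function saturates for large positive inputs, which is what keeps gradients flowing in deep stacks.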
20. DNN best practices: Data augmentation
The easiest and most common method to reduce overfitting on
image data is to artificially enlarge the dataset using label-
preserving transformations.
Types of data augmentation:
Image translation
Horizontal/vertical reflections + cropping
Changing RGB intensities
20 / 64
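The three augmentation types above can be sketched with plain array operations (a minimal NumPy sketch on a dummy image; sizes, offsets and jitter ranges are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))          # dummy 32x32 RGB image

flipped = img[:, ::-1, :]                              # horizontal reflection
top, left = 2, 3
crop = img[top:top + 24, left:left + 24, :]            # translated 24x24 crop
shift = rng.integers(-10, 11, size=3)                  # per-channel RGB jitter
shifted = np.clip(img + shift, 0, 255)                 # keep valid pixel range
```

Each transform keeps the label unchanged while producing a new training example, which is why the dataset is "artificially enlarged".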
21. DNN best practices: Dropout
Dropout: set the output of each hidden neuron to zero with probability 0.5
The neurons which are "dropped out" in this way do not contribute
to the forward pass and do not participate in backpropagation
So every time an input is presented, the neural network samples a
different architecture, but all these architectures share weights
This technique reduces complex co-adaptations of neurons, since a
neuron cannot rely on the presence of particular other neurons
It is, therefore, forced to learn more robust features that are useful
in conjunction with many different random subsets of the other
neurons
Without dropout, a network exhibits substantial overfitting
Dropout roughly doubles the number of iterations required to
converge
21 / 64
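The random masking above can be sketched in a few lines (a minimal NumPy sketch of the now-common "inverted" variant, which rescales at train time; the original formulation instead halves the outputs at test time, and the two are equivalent in expectation):

```python
import numpy as np

def dropout(h, p=0.5, train=True, rng=np.random.default_rng(0)):
    # Train time: zero each activation independently with probability p,
    # and scale the survivors by 1/(1-p) so the expected activation
    # matches test time, when the layer is simply the identity.
    if not train:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

h = np.ones(1000)
out = dropout(h)   # roughly half the entries are zeroed, the rest become 2.0
```

Because a fresh mask is drawn for every input, the network effectively samples a different architecture each time, with all architectures sharing weights.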
22. Greedy Layer-Wise Training (1/2)
1. Train first layer using your data without the labels (unsupervised)
• Since there are no targets at this level, labels don't help. Could
also use the more abundant unlabeled data which is not part of
the training set (i.e. self-taught learning).
2. Then freeze the first layer parameters and start training the second
layer using the output of the first layer as the unsupervised input
to the second layer
3. Repeat this for as many layers as desired
• This builds our set of robust features
4. Use the outputs of the final layer as inputs to a supervised
layer/model and train the last supervised layer(s) (leave early
weights frozen)
5. Unfreeze all weights and fine tune the full network by training with
a supervised approach, given the pre-processed weight settings
22 / 64
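The five steps can be sketched as a loop over layers (a minimal NumPy sketch; the unsupervised step is stood in for by PCA via SVD, a simplifying assumption — a linear autoencoder converges to the same subspace — not the tutorial's prescribed method):

```python
import numpy as np

def train_layer_unsupervised(X, n_hidden):
    # Stand-in for unsupervised layer training: take the top principal
    # directions of X as the layer's weights.
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_hidden].T                      # weights to freeze

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                  # unlabeled training data

# Steps 1-3: train each layer on the output of the previous (frozen) one
weights, H = [], X
for n_hidden in (10, 5):
    W = train_layer_unsupervised(H, n_hidden)
    weights.append(W)                           # freeze this layer
    H = np.tanh(H @ W)                          # feed activations forward

# Step 4: H would now feed a supervised layer/model;
# step 5 would unfreeze all weights and fine-tune the whole stack.
```

The key point the sketch preserves is that labels are never touched until step 4, so abundant unlabeled data can drive steps 1–3.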
23. Greedy Layer-Wise Training (2/2)
Greedy layer-wise training avoids many of the problems of trying to
train a deep net in a supervised fashion:
• Each layer gets full learning focus in its turn since it is the only
current "top" layer
• Can take advantage of unlabeled data
• When you finally tune the entire network with supervised training
the network weights have already been adjusted so that you are in
a good error basin and just need fine tuning. This helps with
problems of:
• Ineffective early layer learning
• Deep network local minima
23 / 64
24. Restricted Boltzmann machine
Two types of nodes: hidden and visible.
Training minimizes the system energy by updating the
weights as new objects are propagated through the network.
The probability distribution over the visible and hidden
layers is a Gibbs distribution.
24 / 64
25. Deep Belief Network
25 / 64
First train, unsupervised, several (here two)
levels of RBMs (or autoencoders).
Then train the next layers, supervised,
one after another.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504-507.
27. Convolutional Neural Network (CNN)
Core concepts:
Local perception – each neuron sees a small part of the
object. Use kernels (filters) to capture 1-D or 2-D structure of
objects. For instance, capture all pixel neighbors for an image.
Weight sharing – use the same small set of kernels for all
objects; this reduces the number of adjustable
parameters compared to an MLP
Subsampling/pooling – use dimensionality reduction for
images in order to provide invariance to scale
27 / 64
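All three core concepts can be sketched without a framework (a minimal NumPy sketch; the kernel and image sizes are illustrative assumptions):

```python
import numpy as np

def conv2d(img, kernel):
    # Weight sharing: the SAME small kernel is slid over every position
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Local perception: each output sees only a kh x kw patch
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, k=2):
    # Subsampling/pooling: keep the max of each k x k block
    H, W = x.shape
    return x[:H - H % k, :W - W % k].reshape(H // k, k, W // k, k).max(axis=(1, 3))

img = np.arange(36.0).reshape(6, 6)
edge = np.array([[1.0, -1.0]])       # tiny horizontal-difference kernel
fmap = conv2d(img, edge)             # feature map, shape (6, 5)
pooled = max_pool(fmap)              # downsampled map, shape (3, 2)
```

One kernel here uses 2 parameters regardless of image size; a fully connected layer over the same image would need one weight per pixel per output.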
31. What do trained kernels look like?
31 / 64
(Figure: low-level features → mid-level features → high-level features)
Each kernel composes a local patch of lower-level features
into high level representation
32. Levels of abstraction
Hierarchical Learning:
Natural progression from low
level to high level structure
as seen in natural complexity
Easier to monitor what is
being learnt and to guide the
machine to better subspaces
A good lower level
representation can be used
for many distinct tasks
32 / 64
33. LeNet
33 / 64
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the
IEEE 86.11 (1998): 2278-2324.
34. GoogLeNet
34 / 64
Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint
arXiv:1409.4842 (2014).
39. Autoencoders (1/3)
Autoencoder: a feed-forward neural
network trained to reproduce its input
at the output layer
Does non-linear dimensionality
reduction
Trained via backpropagation
A 1-layer autoencoder gets results
similar to PCA
39 / 64
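The PCA connection can be demonstrated with a minimal tied-weight linear autoencoder trained by backpropagation (a sketch under simplifying assumptions: linear units, tied encoder/decoder weights, plain gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 200 objects, 8 features

# One hidden layer of 3 units: encode with W, decode with W.T (tied weights)
W = rng.normal(scale=0.1, size=(8, 3))
lr = 0.01
for _ in range(500):
    err = X @ W @ W.T - X                # reconstruction error at the output
    # Gradient of the squared reconstruction loss w.r.t. the tied weights
    grad = X.T @ err @ W + err.T @ X @ W
    W -= lr * grad / len(X)

loss = np.mean((X @ W @ W.T - X) ** 2)   # falls toward the PCA residual
```

With linear units the network can do no better than projecting onto the top 3 principal directions, which is exactly the "similar results to PCA" claim; non-linear hidden layers are what let deep autoencoders beat PCA.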
42. Autoencoders in bioinformatics
42 / 64
Fakoor, Rasool, et al. "Using deep learning to enhance cancer diagnosis and classification." Proceedings of the International
Conference on Machine Learning. 2013.
43. Deep autoencoders: document processing
We can use an autoencoder to find low-dimensional codes for
documents that allow fast and accurate retrieval of similar
documents from a large set.
We start by converting each document into a "bag of
words". This is a 2000-dimensional vector that contains the
counts for each of the 2000 commonest words.
43 / 64
44. Deep autoencoders: document retrieval
We train the neural network to
reproduce its input vector as
its output
This forces it to compress as
much information as possible
into the 10 numbers in the
central bottleneck.
These 10 numbers are then a
good way to compare
documents.
44 / 64
(Architecture: 2000 word counts → 500 neurons → 250 neurons → 10-unit bottleneck → 250 neurons → 500 neurons → 2000 reconstructed counts)
45. Deep autoencoders: document visualization (1/2)
Instead of using codes to
retrieve documents, we can
use 2-D codes to visualize sets
of documents.
This works much better than
2-D PCA
45 / 64
(Architecture: 2000 word counts → 500 neurons → 250 neurons → 2-unit bottleneck → 250 neurons → 500 neurons → 2000 reconstructed counts)
49. Long Short-Term Memory (LSTM)
LSTM: a special case of RNN capable of learning long-term
dependencies
There are four neural network layers in repeating module
49 / 64
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
50. LSTM: Cell state
Cell state: runs straight down the entire chain, with only
some minor linear interactions
LSTM does have the ability to remove or add information
to the cell state, carefully regulated by structures called
gates. Gates are a way to optionally let information
through.
The sigmoid layer outputs numbers
between zero and one, describing
how much of each component
should be let through. LSTM has
3 gates.
50 / 64
51. LSTM: Forget gate layer
It looks at h_{t−1} and x_t, and outputs a number between 0 and
1 for each number in the cell state C_{t−1}. 1 represents
"completely keep this" while 0 represents "completely get rid
of this".
51 / 64
52. LSTM: Input gate layer (1/2)
How to decide what new information to store in the cell state?
First, a sigmoid layer called the "input gate layer" decides
which values should be updated.
Next, a tanh layer creates a vector of new candidate values, C̃_t,
that could be added to the state. In the next step, we'll
combine these two to create an update to the state
52 / 64
53. LSTM: Input gate layer (2/2)
It's now time to update the old cell state, C_{t−1}, into the new
cell state C_t. The previous steps already decided what to do,
we just need to actually do it
We multiply the old state by f_t, forgetting the things we
decided to forget earlier. Then we add i_t ⋅ C̃_t. This is the
new candidate values, scaled by how much we decided to
update each state value
53 / 64
54. LSTM: Output gate layer
The output will be based on cell state, but will be a filtered
version. First, we run a sigmoid layer which decides what
parts of the cell state we're going to output. Then, we put the
cell state through tanh (to push the values to be between −1
and 1) and multiply it by the output of the sigmoid gate, so
that we only output the parts we decided to.
54 / 64
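The three gates described above can be combined into one step function (a minimal NumPy sketch; packing all four layers into a single weight matrix `W`, and the variable names, are illustrative conventions, not the paper's notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # One repeating module: W maps [h_{t-1}, x_t] to all four layers at once
    z = np.concatenate([h_prev, x_t]) @ W + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to keep of C_{t-1}
    i = sigmoid(i)                    # input gate: which candidates to admit
    o = sigmoid(o)                    # output gate: what part of the state to emit
    C_tilde = np.tanh(g)              # candidate values
    C_t = f * C_prev + i * C_tilde    # new cell state
    h_t = o * np.tanh(C_t)            # output: filtered, squashed cell state
    return h_t, C_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(n_hid + n_in, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.normal(size=n_in), h, C, W, b)
```

Note how the cell state update `f * C_prev + i * C_tilde` contains only elementwise operations, which is the "straight line with minor linear interactions" that lets gradients survive long sequences.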
56. Deep learning analysis: advantages
An extremely powerful model, which can potentially solve any
machine learning problem.
An already trained model can be reused: multi-task support
Many good models are already known which are state-of-the-
art for many tasks:
• image recognition;
• speech recognition;
• natural language processing;
• etc.
56 / 64
57. Deep learning analysis: disadvantages
The deeper the net is
• the more data you need;
• the more time you need;
• the more powerful processors you need.
Usually no intuition how it works exactly;
Usually you work with DNN as a black box;
Prone to overfitting: regularization must be used.
57 / 64
58. Next topic
Very brief introduction to AI, ML and ANN
What is ANN and how to learn it
DNN and standard DNN architectures
Beyond discriminative models
58 / 64
59. Reverse the network and make it predict images given labels
Image synthesis
Dosovitskiy, A., Tobias Springenberg, J., & Brox, T. (2015). Learning to generate chairs with convolutional neural networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (pp. 1538-1546). 59 / 64
60. Keep the inner representation of an image
(the Gram matrices G_l of the convolutional layers)
Then start from a new random image and
optimize it to have an inner representation
similar to the one we have kept.
Texture synthesis
Gatys, L., Ecker, A. S., & Bethge, M. (2015). Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing
Systems (pp. 262-270). 60 / 64
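The kept "inner representation" per layer is a Gram matrix of channel correlations; computing it is short (a minimal NumPy sketch on random activations; the channel/spatial sizes and the 1/(H·W) normalization are illustrative assumptions):

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, height, width) activations of one conv layer.
    # G[i, j] correlates channels i and j with all spatial positions
    # summed out -- discarding layout is what makes G encode "texture".
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

F = np.random.default_rng(0).normal(size=(16, 8, 8))
G = gram_matrix(F)    # (16, 16), symmetric
```

Texture synthesis then minimizes the difference between the Gram matrices of the random image and of the kept image, layer by layer.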
61. Style = texture.
Content = the image,
represented by the
last convolutional
layer.
We will learn an
image that is similar
both to the style and
to the content.
Style transmission
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
61 / 64
62. DeepArt was created in 2015:
https://deepart.io/
They implemented the algorithm
described before.
DeepArt and Prisma
62 / 64
Prisma was created in June
2016.
They optimized the algorithm,
brought it to mobile, and offer
preselected filters (instead of
arbitrary styles)
63. Materials
Presentation was prepared using:
1. http://avisingh599.github.io/deeplearning/visual-qa/
2. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
3. https://class.coursera.org/ml-003/lecture
4. K. Vorontsov's machine learning course (in Russian)
63 / 64