This document discusses the history and recent developments in artificial intelligence and deep learning. It covers early work in neural networks from the 1950s through the 1990s, including perceptrons, autoencoders, and connectionism. More recent progress is attributed to greater computing power, larger datasets, and the development of automatic differentiation techniques. Applications discussed include computer vision, natural language processing using word embeddings, and recurrent neural networks for tasks like handwriting generation.
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Ian Morgan
Professor Steve Roberts, Machine learning research group and Oxford-Man Institute + Alan Turing Institute. Steve gave this talk on the 24th January at the London Bayes Nets meetup.
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
The document discusses a paper that argues traditional theories of generalization may not fully explain why large neural networks generalize well in practice. It summarizes the paper's key points:
1) The paper shows neural networks can easily fit random labels, calling into question traditional measures of complexity.
2) Regularization helps but is not the fundamental reason for generalization. Neural networks have sufficient capacity to memorize data.
3) Implicit biases in algorithms like SGD may better explain generalization by driving solutions toward minimum norm.
4) The paper suggests rethinking generalization as the effective capacity of neural networks may differ from theoretical measures. Understanding finite sample expressivity is important.
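The random-label experiment summarized above is easy to reproduce in miniature: any model with enough capacity to memorize, here a 1-nearest-neighbour classifier standing in for an over-parameterized network, reaches perfect training accuracy on labels that are pure noise, so training error alone cannot explain generalization. A minimal sketch (the toy data and the choice of 1-NN are illustrative, not from the paper):

```python
import random

def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the label of the closest training point (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

random.seed(0)
# Distinct inputs with completely random labels -- there is no signal to learn.
train_x = [float(i) for i in range(100)]
train_y = [random.randint(0, 1) for _ in train_x]

# A memorizing model still achieves perfect *training* accuracy ...
train_acc = sum(
    nearest_neighbour_predict(train_x, train_y, x) == y
    for x, y in zip(train_x, train_y)
) / len(train_x)

# ... while accuracy against fresh random labels is only chance level.
test_y = [random.randint(0, 1) for _ in train_x]
test_acc = sum(
    nearest_neighbour_predict(train_x, train_y, x) == y
    for x, y in zip(train_x, test_y)
) / len(train_x)
```

The gap between the two numbers is the point: fitting the training set tells us nothing here, which is exactly why the paper argues complexity measures based on fitting capacity need rethinking.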
This paper proposes AmbientGAN, which trains a generative adversarial network using partial or noisy observations rather than fully observed samples. AmbientGAN trains the discriminator on the measurement domain rather than the raw data domain, allowing the generator to be trained without needing large amounts of good training data. The paper proves it is theoretically possible to recover the original data distribution even when the measurement process is not invertible. It presents experimental results showing AmbientGAN can generate high quality samples and recover the underlying data distribution from various types of lossy and noisy measurements.
Introduction to Interpretable Machine LearningNguyen Giang
This document discusses interpretable machine learning and explainable AI. It begins with definitions of key terms and an overview of interpretable methods. Deep learning models are often treated as "black boxes" that are difficult to interpret. Interpretability can be achieved by using inherently interpretable models like linear models or decision trees, adding attention mechanisms, or interpreting models before, during or after building them. Later sections discuss specific interpretable techniques like understanding data through examples, MMD-Critic for learning prototypes and criticisms, and visualizing convolutional neural networks to understand predictions. The document emphasizes the importance of interpretability and explains several approaches to make machine learning models more transparent to humans.
A curated list of GAN variants that provided insights to the community (GANs, Improved GANs, DCGAN, Unrolled GAN, InfoGAN, f-GAN, EBGAN, WGAN).
After a short introduction to GANs, we look through the remaining difficulties of standard GANs and their temporary solutions (Improved GANs). Following the slides, we see other solutions that tried to resolve these problems in various ways, e.g. careful architecture selection (DCGAN), a slight change in the update (Unrolled GAN), an additional constraint (InfoGAN), generalization of the loss function using various divergences (f-GAN), a new framework based on energy-based models (EBGAN), and another step in generalizing the loss function (WGAN).
Deep generative models can generate synthetic images, speech, text and other data types. There are three popular types: autoregressive models which generate data step-by-step; variational autoencoders which learn the distribution of latent variables to generate data; and generative adversarial networks which train a generator and discriminator in an adversarial game to generate high quality samples. Generative models have applications in image generation, translation between domains, and simulation.
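The autoregressive idea mentioned above, generating data one step at a time with each step conditioned on what came before, can be illustrated with a toy character-level bigram model; the tiny corpus and start/end markers are illustrative choices, not any particular published model:

```python
import random
from collections import defaultdict

def fit_bigram(corpus):
    """Count next-character frequencies, i.e. estimate p(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        token = "^" + word + "$"          # "^" = start marker, "$" = end marker
        for a, b in zip(token, token[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, rng, max_len=20):
    """Autoregressive sampling: each character is drawn conditioned on the previous one."""
    out, cur = [], "^"
    for _ in range(max_len):
        choices = counts[cur]
        chars = list(choices)
        weights = [choices[c] for c in chars]
        cur = rng.choices(chars, weights=weights)[0]
        if cur == "$":                    # end-of-word marker sampled: stop
            break
        out.append(cur)
    return "".join(out)

corpus = ["deep", "learning", "generative", "model", "data"]
counts = fit_bigram(corpus)
rng = random.Random(0)
word = sample(counts, rng)
```

Deep autoregressive models replace the count table with a neural network, but the generation loop is the same step-by-step conditioning shown here.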
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
Talk given at the 8th Forum for Information Retrieval Evaluation (FIRE, http://fire.irsi.res.in/fire/2016/), December 10, 2016, and at the Qatar Computing Research Institute (QCRI), December 15, 2016.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Recommendation system using collaborative deep learningRitesh Sawant
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach, tightly coupling the two components that learn from the two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
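CDL couples a deep content model with the collaborative-filtering component; the CF side alone can be illustrated by plain matrix factorization on a sparse ratings matrix, the baseline whose sparsity problem the abstract describes. This is a sketch of that baseline, not of CDL itself, and the tiny ratings data and hyperparameters are made up for illustration:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Matrix factorization via SGD: approximate r_ui by the dot product p_u . q_i.
    `ratings` is a sparse list of observed (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Tiny sparse ratings matrix: 4 users x 3 items, only 8 of 12 entries observed.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 4), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q = factorize(ratings, n_users=4, n_items=3)
rmse = (sum((r - sum(P[u][f] * Q[i][f] for f in range(2))) ** 2
            for u, i, r in ratings) / len(ratings)) ** 0.5
```

CDL's contribution is to tie the item factors `Q` to a deep representation of item content, so that items with few or no ratings still get informative factors.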
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, and reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. Over the last year, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
Machine Learning: Generative and Discriminative Modelsbutest
The document discusses machine learning models, specifically generative and discriminative models. It provides examples of generative models like Naive Bayes classifiers and hidden Markov models. Discriminative models discussed include logistic regression and conditional random fields. The document contrasts how generative models estimate class-conditional probabilities while discriminative models directly estimate posterior probabilities. It also compares how hidden Markov models model sequential data generatively while conditional random fields model sequential data discriminatively.
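The generative/discriminative contrast above fits in one formula: a generative model estimates the class prior p(y) and the class-conditional p(x|y) and applies Bayes' rule, whereas a discriminative model fits p(y|x) directly. A minimal Bernoulli Naive Bayes sketch of the generative route, with made-up toy features for illustration:

```python
from collections import defaultdict

def train_naive_bayes(examples):
    """Estimate class priors p(y) and per-feature likelihoods p(x_j = 1 | y)
    with Laplace smoothing -- the generative quantities."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for x, y in examples:
        class_counts[y] += 1
        for j, v in enumerate(x):
            feat_counts[y][j] += v
    n = len(examples)
    priors = {y: c / n for y, c in class_counts.items()}
    likelihoods = {
        y: [(feat_counts[y][j] + 1) / (class_counts[y] + 2)
            for j in range(len(examples[0][0]))]
        for y in class_counts
    }
    return priors, likelihoods

def posterior(priors, likelihoods, x):
    """Bayes' rule: p(y | x) is proportional to p(y) * prod_j p(x_j | y)."""
    scores = {}
    for y in priors:
        p = priors[y]
        for j, v in enumerate(x):
            pj = likelihoods[y][j]
            p *= pj if v else (1 - pj)
        scores[y] = p
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# Toy binary features: [contains "free", contains "meeting"]
data = [([1, 0], "spam"), ([1, 0], "spam"), ([1, 1], "spam"),
        ([0, 1], "ham"), ([0, 1], "ham"), ([0, 0], "ham")]
priors, likelihoods = train_naive_bayes(data)
post = posterior(priors, likelihoods, [1, 0])
```

A discriminative model such as logistic regression would skip the class-conditional estimates entirely and optimize the posterior directly, which is the trade-off the document contrasts.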
Deep Learning and Reinforcement Learning summer schools summary
26th June-6th July 2017, Montreal, Quebec
Things I learned. What was your favourite lesson?
Deep Learning: concepts and use cases (October 2018)Julien SIMON
An introduction to Deep Learning theory
Neurons & Neural Networks
The Training Process
Backpropagation
Optimizers
Common network architectures and use cases
Convolutional Neural Networks
Recurrent Neural Networks
Long Short Term Memory Networks
Generative Adversarial Networks
Getting started
https://telecombcn-dl.github.io/2018-dlai/
https://mcv-m6-video.github.io/deepvideo-2019/
This lecture provides an overview of how the temporal information encoded in video sequences can be exploited to learn visual features from a self-supervised perspective. Self-supervised learning is a type of unsupervised learning in which the data itself provides the necessary supervision to estimate the parameters of a machine learning algorithm.
Master in Computer Vision Barcelona 2019.
http://pagines.uab.cat/mcv/
These slides summarize the main trends in deep neural networks for video encoding, including single-frame models, spatiotemporal convolutions, long-term sequence modeling with RNNs, and their combination with optical flow.
Model-based reinforcement learning techniques were presented that use learned models to improve upon model-free deep reinforcement learning. Several papers augmented deep networks with model-based components like planners or simulators to leverage predictions and reduce sample complexity. Techniques included using model rollouts to augment state representations, learning abstract state representations to simplify value prediction, and optimizing policies on ensemble models. While model-based methods show promise in addressing deep RL limitations, challenges remain in learning accurate models and developing policies robust to model errors.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence of RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
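The self-attention mechanism described above is compact enough to write out: each output token is a weighted average of all value vectors, with weights given by softmax(QK^T / sqrt(d)). A minimal sketch with toy 2-d token vectors (real transformers add learned projections, multiple heads, and positional information, all omitted here):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of d-dimensional token vectors."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Each output token is a convex combination of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Self-attention: queries, keys and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
```

Cross-attention uses the same function but takes `Q` from one sequence (e.g. the decoder) and `K`, `V` from another (the encoder).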
"You Can Do It" by Louis Monier (Altavista Co-Founder & CTO) & Gregory Renard (CTO & Artificial Intelligence Lead Architect at Xbrain) for Deep Learning keynote #0 at Holberton School (http://www.meetup.com/Holberton-School/events/228364522/)
If you want to attend similar keynotes for free, check out http://www.meetup.com/Holberton-School/
[DL Reading Group] Physion: Evaluating Physical Prediction from Vision in Humans and Mach...Deep Learning JP
This document summarizes a research paper that proposes a new dataset called Physion for evaluating how well machine learning models can predict physical interactions from vision, similar to humans. The dataset contains videos of common physical phenomena. Several state-of-the-art models were evaluated on the dataset, including particle-based simulators and vision-based models. Particle-based simulators achieved performance on par with humans, while vision-based models performed poorly. The document provides background on the motivation for the dataset and describes the different models and their approaches.
“Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled”
Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
(I have referred to many articles and experimental results provided by Stanford University.)
https://mcv-m6-video.github.io/deepvideo-2019/
Overview of deep learning solutions for video processing. Part of a series of slides covering topics like action recognition, action detection, object tracking, object detection, scene segmentation, language and learning from videos.
Artificial Intelligence is back, Deep Learning Networks and Quantum possibili...John Mathon
AI has gone through a number of mini boom-and-bust periods. The current one may be short-lived as well, but I have reasons to think AI is finally making sustained progress that will see its way into mainstream technology.
Scene Description From Images To SentencesIRJET Journal
This document presents an approach for generating sentences to describe images using distributed intelligence. It involves detecting objects in images using YOLO detection, finding relative positions of objects, labeling background scenes, generating tuples of objects/scenes/relations, extracting candidate sentences from Wikipedia containing tuple elements, searching images for each sentence and selecting the sentence whose images most closely match the input image. The approach is compared to the Babytalk model using BLEU and ROUGE scores, showing comparable performance. Future work to improve object detection and use larger knowledge sources is discussed.
The document discusses image captioning using deep neural networks. It begins by providing examples of how humans can easily describe images but generating image captions with a computer program was previously very difficult. Recent advances in deep learning, specifically using convolutional neural networks (CNNs) to recognize objects in images and recurrent neural networks (RNNs) to generate captions, have enabled automated image captioning. The document discusses CNN and RNN architectures for image captioning and provides examples of pre-trained models that can be used, such as VGG-16.
Chatbots are growing in popularity as developers face the limitations of the mobile app. User interfaces that simulate a human conversation have a history going back to the late 18th century. I'll take you on a tour of that history with an eye on finding insights into what is possible today and in the near future with chatbots. Issues covered: Amazon Alexa, Facebook Messenger chatbots, Alan Turing, and much more.
The document discusses generative models and their applications in artificial intelligence. Generative adversarial networks (GANs) use two neural networks, a generator and discriminator, that compete against each other. The generator learns to generate new data that looks real by fooling the discriminator, while the discriminator learns to better identify real from fake data. GANs have been used for tasks like image generation and neural style transfer. They show potential to generate art, music and other creative forms through machine learning.
2. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
6. Logical Intelligence
1968 Risch’s algorithm for integration in calculus
1972 Prolog for general logical reasoning
1997 Deep Blue defeats Kasparov
7. Other forms of intelligence
But is this getting us to where we’d like to be?
Selfridge-Shannon film clip
Speech Recognition
Visual Processing
Natural Language modelling
Planning and decision in uncertain environments
Perhaps a different approach would be useful.
8. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
9. Astonishing Hypothesis: Crick
“A person’s mental activities are entirely due to the behaviour of nerve
cells and the molecules that make them up and influence them.”
12. Information Processing in Brains
[Figure: neurons map the real world through feature layers (Layer 1, Layer 2) up to high-level concepts]
Hierarchical; Modular; Binary; Parallel; Noisy
13. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
17. Connectionism
1960 Realised a perceptron can only solve simple tasks.
1970 Decline in interest.
1980 New computing power made training multilayer networks feasible.
[Figure: feed-forward network from inputs to output]
Each node (or ‘neuron’) computes a function of a weighted combination of its
parental nodes: h_j = σ(Σ_i w_ij h_i)
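As a sketch of this node computation (the weights, layer sizes and sigmoid choice here are illustrative, not from the talk):

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic activation: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def layer(h_parents, W):
    """Each unit j computes h_j = sigma(sum_i W[i, j] * h_i)."""
    return sigmoid(h_parents @ W)

# Two-layer feed-forward pass on a toy input (illustrative values).
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])
W1 = rng.standard_normal((3, 4))   # inputs -> hidden
W2 = rng.standard_normal((4, 1))   # hidden -> output
output = layer(layer(x, W1), W2)
print(output.shape)  # (1,)
```

Stacking `layer` calls like this is all a multilayer feed-forward net is; training is then a matter of choosing the weight matrices.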
18. Neural Networks and Deep Learning
Historical Problems with Neural Nets (1990s)
NNs are difficult to train (many local optima?).
Particularly difficult to train a NN with a large number of layers (say larger
than around 10).
‘Gradient Diffusion Problem’ – difficult to assign responsibility of errors to
individual ‘neurons’.
Machine Learning (up to 2006)
A large section of the machine learning community abandoned NNs.
More principled and computationally better understood techniques (SVMs
and related convex methods) replaced them.
Bayesian AI (1990s onwards)
From mid 1990s there was a realisation that pattern recognition is not
sufficient for all AI purposes.
Uncertainty and reasoning are not naturally representable using standard
feed-forward nets.
Explosion in more ‘symbolic’ Bayesian AI.
19. Deep Learning
NNs have resurged in interest in the last few years (Hinton, Bengio, . . . )
Also called ‘deep learning’.
Sense that very complex tasks (object recognition, learning complex structure
in data) requires going beyond simple (convex) statistical techniques.
The brain uses hierarchical distributed processing and it is likely to be for a
good reason.
Many problems have a hierarchical structure: images are made of parts;
language is hierarchical, etc.
Why now?
New computing resources (GPU processing)
Availability of large amounts of data means that we can train nets with many
parameters (10^10).
Recent evidence suggests local optima are not particularly problematic.
20. Autoencoder
[Figure: autoencoder with inputs y1–y5, hidden bottleneck layers, and reconstructed outputs y1–y5]
The bottleneck forces the network to try to find a low dimensional
representation of the data.
Useful for unsupervised learning.
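A minimal sketch of the bottleneck idea, assuming toy data and a linear autoencoder trained by plain gradient descent (all sizes, data and the learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 five-dimensional points that really lie on a 2-D subspace.
Y = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 5))

# Linear autoencoder: 5 inputs -> 3-unit bottleneck -> 5 outputs.
W_enc = rng.standard_normal((5, 3)) * 0.1
W_dec = rng.standard_normal((3, 5)) * 0.1

def loss():
    """Mean squared reconstruction error through the bottleneck."""
    return np.mean(((Y @ W_enc) @ W_dec - Y) ** 2)

initial = loss()
lr = 0.01
for _ in range(500):
    H = Y @ W_enc                    # low-dimensional code
    G = 2 * (H @ W_dec - Y) / Y.size # d(loss)/d(reconstruction)
    g_dec = H.T @ G                  # gradient w.r.t. decoder weights
    g_enc = Y.T @ (G @ W_dec.T)      # gradient w.r.t. encoder weights
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
final = loss()
print(initial, final)                # reconstruction error falls with training
```

Because the data lie near a 2-D subspace, the 3-unit bottleneck can represent them almost losslessly; with fewer bottleneck units than the data's intrinsic dimension, the error would stay high.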
21. Autoencoder on MNIST digits (Hinton 2006 Science)
Figure : Reconstructions using H = 30 components. From the Top: Original image,
Autoencoder1, Autoencoder2, PCA
60,000 training images (28 × 28 = 784 pixels).
Use a form of autoencoder to find a lower (30) dimensional representation.
At the time, the special layerwise training procedure was considered
fundamental to the success of this approach. Now not deemed necessary,
provided we use a sensible initialisation.
22. Google Cats
10 million YouTube video frames (200×200 pixel images).
Use a specialised autoencoder with 9 layers (1 billion weights).
2000 computers + two weeks of computing.
Examine units to see what images they most respond to.
24. Convolutional NNs
CNNs are particularly popular in image processing
Often the feature maps correspond not to macro features (such as bicycles)
but to micro features.
For example, in handwritten digit recognition they correspond to small
constituent parts of the digits.
These are used then to process the image into a representation that is better
for recognition.
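The feature-map computation itself is just a small filter slid over the image; a minimal sketch with a hypothetical vertical-edge ‘micro feature’ detector:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation behind CNN feature maps."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A vertical-edge filter on a toy image: left half dark, right half bright.
image = np.zeros((5, 5))
image[:, 2:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])  # responds where brightness jumps
fmap = conv2d(image, edge_kernel)
print(fmap)  # each row reads [0, 1, 0, 0]: the filter fires only at the edge
```

A CNN learns many such small filters and reuses each one across the whole image, which is why its feature maps tend to pick out small constituent parts rather than whole objects.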
25. NNs in NLP
Bag of Words
We have D words in a dictionary, aardvark,. . .,zorro so that we can relate
each word with its dictionary index.
We can also think of this as a Euclidean embedding e:
aardvark → e_aardvark = (1, 0, . . . , 0)ᵀ, zorro → e_zorro = (0, 0, . . . , 1)ᵀ
Word Embeddings
Idea is to replace the Euclidean embeddings e with embeddings (vectors) v
that are learned.
Objective is, for example, next word prediction accuracy.
These are often called ‘neural language models’.
26. NNs in NLP
Each word w in the dictionary has an associated embedding vector vw.
Usually around 200 dimensional vectors are used.
Consider the sentence:
the cat sat on the mat
and that we wish to predict the word on given the two preceding words cat sat
and the two succeeding words the mat
We can use a network that has inputs vcat, vsat, vthe, vmat
The output of the network is a probability over all words in the dictionary
p(w| {vinputs}).
We want p(w = on|vcat, vsat, vthe, vmat) to be high.
The overall objective is then to learn all the word embeddings and network
parameters subject to predicting the word correctly based on the context.
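The training objective above can be sketched end-to-end on a toy vocabulary. Everything here (vocabulary, dimensions, learning rate) is illustrative, and the single-layer softmax network is a simplification of the neural language models the slides describe:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                      # toy sizes; the slides mention ~200 dims

E = rng.standard_normal((V, D)) * 0.1     # word embeddings v_w (to be learned)
W = rng.standard_normal((4 * D, V)) * 0.1 # concatenated context -> word scores

context = [idx[w] for w in ["cat", "sat", "the", "mat"]]
target = idx["on"]                        # want p(on | cat, sat, the, mat) high

def probs():
    h = E[context].reshape(-1)            # concatenate the four context vectors
    s = h @ W
    e = np.exp(s - s.max())
    return e / e.sum()                    # p(w | context) over the vocabulary

p0 = probs()[target]
lr = 0.5
for _ in range(100):
    h = E[context].reshape(-1)
    g = probs()
    g[target] -= 1.0                      # gradient of -log p(target) w.r.t. scores
    gh = W @ g                            # gradient w.r.t. the concatenated context
    W -= lr * np.outer(h, g)
    for k, wi in enumerate(context):      # update each context word's embedding
        E[wi] -= lr * gh[k * D:(k + 1) * D]
p1 = probs()[target]
print(p0, p1)                             # p(on | context) rises with training
```

Training over a large corpus rather than one sentence is what gives the learned embeddings their semantic structure.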
27. Word Embeddings
Given a word (France, for example) we can find which words w have embedding
vectors closest to vFrance. From Ronan Collobert (2011).
28. Word Embeddings
There appears to be a natural ‘geometry’ to the embeddings. For example, there
are directions that correspond to gender.
vwoman − vman ≈ vaunt − vuncle
vwoman − vman ≈ vqueen − vking
From Mikolov (2013).
29. Word Embeddings: Analogies
Given a relationship, France-Paris, we get the ‘relationship’ embedding
v = vParis − vFrance
Given Italy we can calculate vItaly + v and find the word in the dictionary which
has closest embedding to this (it turns out to be Rome!). From Mikolov (2013).
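The analogy lookup can be illustrated with hand-built toy vectors, laid out so that country → capital is an exact fixed offset; real embeddings (Mikolov 2013) only approximately satisfy these relations:

```python
import numpy as np

# Toy 2-D embeddings (hand-built for illustration).
emb = {
    "France": np.array([1.0, 0.0]), "Paris":  np.array([1.0, 1.0]),
    "Italy":  np.array([2.0, 0.0]), "Rome":   np.array([2.0, 1.0]),
    "Spain":  np.array([3.0, 0.0]), "Madrid": np.array([3.0, 1.0]),
}

def nearest(v, exclude):
    """Word whose embedding is closest (Euclidean) to v."""
    words = [w for w in emb if w not in exclude]
    return min(words, key=lambda w: np.linalg.norm(emb[w] - v))

rel = emb["Paris"] - emb["France"]        # the 'capital-of' direction
answer = nearest(emb["Italy"] + rel, exclude={"Italy"})
print(answer)  # Rome
```

Excluding the query word itself from the search is the usual convention, since it is often the nearest vector to the shifted point.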
30. Word Embeddings: Constrained Embeddings
We can learn embeddings for English words and embeddings for Chinese
words.
However, when we know that a Chinese and English word have a similar
meaning, we add a constraint that the word embeddings vChineseWord and
vEnglishWord should be close.
We have only a small amount of labelled ‘similar’ Chinese-English words
(these are the green border boxes in the above; they are standard translations
of the corresponding Chinese character).
We can visualise the embedding vectors in 2D (using t-SNE). See Socher
(2013).
32. Recursive Nets and Embeddings
Stanford Sentiment Treebank. Consists of parsed sentences with sentiment labels
(−−, −, 0, +, ++) for each node (phrase) in the tree. 215,000 labelled phrases
(obtained from three humans).
33. Recursive Nets and Embeddings
Idea is to recursively combine embeddings such that they accurately predict
the sentiment at each node.
34. Recursive Nets and Embeddings
Training
We have a softmax classifier for each node in the tree, to predict the
sentiment of the phrase beneath this node in the tree.
The weights of this classifier are shared across all nodes.
At the leaf nodes at the bottom of the tree, the inputs to the classifiers are
the word embeddings.
The embeddings are combined by another network g with common
parameters, which forms the input to the sentiment classifier.
We then learn all the embeddings, shared classifier parameters and shared
combination parameters to maximise the classification accuracy.
Prediction
For a new movie review, the review is first parsed using a standard grammar
tree parser.
This forms the tree which can be used to recursively form the sentiment class
label for the review.
Currently the best sentiment classifier. Socher (2013)
36. Recurrent Nets
[Figure: RNN unrolled through time; inputs x1–x3, hidden units h1–h3, outputs y1–y3, with shared weight matrices A, B, C]
RNNs are used in timeseries applications
The basic idea is that the hidden units at time ht (and possibly output yt)
depend on the previous state of the network ht−1, xt−1, yt−1 for inputs xt and
outputs yt.
In the above network, I ‘unrolled the net through time’ to give a standard NN
diagram.
I omitted the potential links from xt−1, yt−1 to ht.
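A minimal sketch of the unrolled computation, with shared matrices named A, B, C after the diagram (their roles and all sizes here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h, D_out, T = 3, 4, 2, 5             # illustrative sizes

A = rng.standard_normal((D_in, D_h)) * 0.5   # input  -> hidden (shared)
B = rng.standard_normal((D_h, D_h)) * 0.5    # hidden -> hidden (shared)
C = rng.standard_normal((D_h, D_out)) * 0.5  # hidden -> output (shared)

def rnn_forward(xs):
    """Unroll through time: h_t depends on x_t and the previous h_{t-1}."""
    h = np.zeros(D_h)
    ys = []
    for x_t in xs:
        h = np.tanh(x_t @ A + h @ B)   # new hidden state carries the memory
        ys.append(h @ C)               # output at time t
    return np.array(ys)

xs = rng.standard_normal((T, D_in))
ys = rnn_forward(xs)
print(ys.shape)  # (5, 2)
```

Because h_t is fed forward, perturbing an early input changes every later output: that recurrence is the network's memory.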
38. Handwriting Generation using a RNN
Some generated examples. Top line is real handwriting, for comparison. See Alex
Graves’ work.
39. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
40. Reasons research in deep learning has exploded
Much greater compute power. (GPU)
Much larger datasets.
AutoDiff.
What is AutoDiff?
AutoDiff takes a function f(x) and returns an exact value (up to machine
accuracy) for the gradient
g_i(x) ≡ ∂f(x)/∂x_i
Note that this is not the same as a numerical approximation (such as central
differences) for the gradient.
One can show that, if done efficiently, one can always calculate the gradient in
less than 5 times the time it takes to compute f(x).
41. Reverse Differentiation
A useful graphical representation is that the total derivative of f with respect to x
is given by the sum over all path values from x to f, where each path value is the
product of the partial derivatives of the functions on the edges:
df/dx = ∂f/∂x + (∂f/∂g)(dg/dx)

[Figure: graph with nodes x, g, f and edges labelled ∂f/∂x, dg/dx, ∂f/∂g]
Example
For f(x) = x^2 + xgh, where g = x^2 and h = xg^2:

[Figure: graph with nodes x, g, h, f; edges labelled 2x + gh, 2x, xh, 2gx, xg, g^2]

f′(x) = (2x + gh) + (g^2 · xg) + (2x · 2gx · xg) + (2x · xh) = 2x + 8x^7
42. Reverse Differentiation
Consider
f(x1, x2) = cos (sin(x1x2))
We can represent this computationally using an Abstract Syntax Tree (AST):
[Figure: Abstract Syntax Tree with leaves x1, x2 feeding f1, then f2, then f3]
f1(x1, x2) = x1x2
f2(x) = sin(x)
f3(x) = cos(x)
Given values for x1, x2, we first run forwards through the tree so that we can
associate each node with an actual function value.
44. Reverse Differentiation
[Figure: the same AST, now annotated with local derivatives]
∂f1/∂x1 = x2,  ∂f1/∂x2 = x1,  ∂f2/∂f1 = cos(f1),  ∂f3/∂f2 = − sin(f2)
1. Find the reverse ancestral (backwards) schedule of nodes (f3, f2, f1, x1, x2).
2. Start with the first node n1 in the reverse schedule and define t_{n1} = 1.
3. For the next node n in the reverse schedule, find the child nodes ch(n).
   Then define
   t_n = Σ_{c ∈ ch(n)} (∂f_c/∂f_n) t_c
4. The total derivatives of f with respect to the root nodes of the tree (here
   x1 and x2) are given by the values of t at those nodes.
This is a general procedure that can be used to automatically define a subroutine
to efficiently compute the gradient. It is efficient because information is collected
at nodes in the tree and split between parents only when required.
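Applied to f(x1, x2) = cos(sin(x1·x2)), the forward sweep and the reverse schedule above can be written out by hand and checked against the closed-form derivative:

```python
import math

def grad_f(x1, x2):
    """Reverse-mode differentiation of f(x1, x2) = cos(sin(x1 * x2))."""
    # Forward sweep: evaluate every node in the AST.
    f1 = x1 * x2
    f2 = math.sin(f1)
    f3 = math.cos(f2)
    # Backward sweep over the reverse schedule (f3, f2, f1, x1, x2):
    t_f3 = 1.0                         # seed the output node with t = 1
    t_f2 = -math.sin(f2) * t_f3        # df3/df2 = -sin(f2)
    t_f1 = math.cos(f1) * t_f2         # df2/df1 =  cos(f1)
    t_x1 = x2 * t_f1                   # df1/dx1 = x2
    t_x2 = x1 * t_f1                   # df1/dx2 = x1
    return f3, (t_x1, t_x2)

x1, x2 = 0.7, 1.9
val, (g1, g2) = grad_f(x1, x2)
# Closed form: df/dx1 = -sin(sin(x1*x2)) * cos(x1*x2) * x2
exact = -math.sin(math.sin(x1 * x2)) * math.cos(x1 * x2) * x2
print(g1, exact)                       # identical up to rounding
```

Note that one forward sweep and one backward sweep yield the derivatives with respect to both inputs at once, which is the source of reverse mode's efficiency.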
45. Limitations of forward reasoning
World Representation
Recognising patterns (perceptron style) is only one form of intelligence.
Solving chess problems is another and requires complex reasoning using some
form of internal model.
The world is noisy and information may be conflicting.
Recognised that new approaches are required.
46. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
47. Limitations of forward reasoning
World Representation
Models help us to fantasise about the world.
58. Stubby Fingers: errors
[Figure: 26×26 heatmap over letters a–z; colour scale 0.05–0.55]
59. Stubby Fingers: language
[Figure: 26×26 heatmap over letters a–z; colour scale 0–0.9]
60. Stubby Fingers
Given the typed sequence cwsykcak what is the most likely word that this
corresponds to?
List the 200 most likely hidden sequences
Discard those that are not in a standard English dictionary
Take the most likely proper English word as the intended typed word
61. Speech Recognition: raw signal
[Figure: raw speech signal, amplitude (−0.2 to 0.3) against time (0–0.9 s)]
64. Medical Diagnosis
[Figure: belief network with diseases (tumour, flu, meningitis) as parents of
symptoms/tests (headache, fever, appetite, x-ray)]
Combine known medical knowledge with patient specific information.
65. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
66. Probability
Why Probability?
Probability is a logical calculus of uncertainty.
Natural framework to use in models of physical systems, such as the Ising
Model (1920) and in AI applications, such as the HMM (Baum 1966,
Stratonovich 1960).
The need for structure
We often want to make a probabilistic description of many objects (electron
spins, neurons, customers, etc. ).
Typically the representational and computational cost of probabilistic models
grows exponentially with the number of objects represented.
Without introducing strong structural limitations about how these objects can
interact, probability is a non-starter.
For this reason, computationally ‘simpler’ alternatives (such as fuzzy logic)
were introduced to try to avoid some of these difficulties – however, these are
typically frowned upon by purists.
67. Graphical Models
We can use graphs to represent how objects can probabilistically interact with
each other.
Graphical Models are then a marriage between Graph and Probability theory.
Many of the quantities that we would like to compute in a probability
distribution can then be related to operations on the graph.
The computational complexity of operations can often be related to the
structure of the graph.
Graphical Models are now used as a standard framework in Engineering,
Statistics and Computer Science.
Graphical Models are used to perform reasoning under uncertainty and are
therefore widely applicable.
68. Uses in Industry
Microsoft: used to estimate the skill distribution of players in online games
(the world’s largest graphical model?!).
Hospitals use Belief Nets to encode knowledge about diseases and symptoms
to aid medical diagnosis.
Google, Microsoft, Facebook: used in many places, including advertising,
video game prediction, speech recognition.
Used to estimate inherent desirability of products in consumer retail.
Microsoft and others: Attempt to go beyond simple A/B testing by using
Graphical Models to model the whole company/user relationship.
69. Conditional Probability and Bayes’ Rule
The probability of event x conditioned on knowing event y (or more shortly, the
probability of x given y) is defined as
p(x|y) ≡ p(x, y)/p(y) = p(y|x)p(x)/p(y)   (Bayes’ rule)
Throwing darts
p(region 5|not region 20) = p(region 5, not region 20)/p(not region 20)
= p(region 5)/p(not region 20) = (1/20)/(19/20) = 1/19
Interpretation
p(A = a|B = b) should not be interpreted as ‘Given the event B = b has occurred,
p(A = a|B = b) is the probability of the event A = a occurring’. The correct
interpretation should be ‘p(A = a|B = b) is the probability of A being in state a
under the constraint that B is in state b’.
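The darts calculation as code, using exact rational arithmetic (the twenty equally likely regions are the slide's implicit assumption):

```python
from fractions import Fraction

# Twenty equally likely dart regions.
p_region = {r: Fraction(1, 20) for r in range(1, 21)}

# p(region 5 | not region 20) = p(region 5, not region 20) / p(not region 20)
p_joint = p_region[5]           # region 5 already excludes region 20
p_not20 = 1 - p_region[20]
posterior = p_joint / p_not20
print(posterior)  # 1/19
```

Using `Fraction` keeps the arithmetic exact, matching the 1/19 on the slide with no rounding.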
70. Battleships
Assume there are 2 ships, 1 vertical (ship 1) and 1 horizontal (ship 2), of 5
pixels each.
Can be placed anywhere on the 10×10 grid, but cannot overlap.
Let s1 be the origin of ship 1 and s2 the origin of ship 2.
Data D is a collection of query ‘hit’ or ‘miss’ responses.
p(s1, s2|D) = p(D|s1, s2)p(s1, s2)/p(D)
Let X be the matrix of pixel occupancy
p(X|D) = Σ_{s1,s2} p(X, s1, s2|D) = Σ_{s1,s2} p(X|s1, s2)p(s1, s2|D)
demoBattleships.m
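demoBattleships.m is MATLAB and is not reproduced here; the same posterior can be sketched in Python by brute-force enumeration over placements (the query data below is invented for illustration):

```python
import numpy as np

N, L = 10, 5                      # 10x10 grid, ships of length 5

def occupancy(s1, s2):
    """Pixel occupancy X for vertical ship at s1 and horizontal ship at s2."""
    X = np.zeros((N, N), dtype=bool)
    r1, c1 = s1
    X[r1:r1 + L, c1] = True       # vertical ship
    r2, c2 = s2
    X[r2, c2:c2 + L] = True       # horizontal ship
    return X

# All non-overlapping placements under a uniform prior p(s1, s2).
placements = []
for s1 in [(r, c) for r in range(N - L + 1) for c in range(N)]:
    for s2 in [(r, c) for r in range(N) for c in range(N - L + 1)]:
        X = occupancy(s1, s2)
        if X.sum() == 2 * L:      # exactly 10 occupied pixels => no overlap
            placements.append(X)

# Data D: invented query responses ('miss' at (0,0), 'hit' at (4,4)).
D = [((0, 0), False), ((4, 4), True)]

# The likelihood of a placement is 0 or 1, so the posterior is uniform over
# the placements consistent with D; p(X|D) is their average occupancy.
consistent = [X for X in placements if all(X[q] == hit for q, hit in D)]
pX = np.mean(consistent, axis=0)  # marginal pixel occupancy p(pixel occupied|D)
print(pX[4, 4], pX[0, 0])         # 1.0 and 0.0 by construction of D
```

Enumeration is feasible here because there are only a few thousand placements; with more ships one would sum over placements more cleverly or sample.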
71. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
72. Belief Networks (Bayesian Networks)
A belief network is a directed acyclic graph in which each node has associated the
conditional probability of the node given its parents.
The joint distribution is obtained by taking the product of the conditional
probabilities:
p(A, B, C, D, E) = p(A)p(B)p(C|A, B)p(D|C)p(E|B, C)
[Graph: A → C, B → C, C → D, B → E, C → E]
73. Example – Part I
Sally’s burglar Alarm is sounding. Has she been Burgled, or was the alarm
triggered by an Earthquake? She turns the car Radio on for news of earthquakes.
Choosing an ordering
Without loss of generality, we can write
p(A, R, E, B) = p(A|R, E, B)p(R, E, B)
= p(A|R, E, B)p(R|E, B)p(E, B)
= p(A|R, E, B)p(R|E, B)p(E|B)p(B)
Assumptions:
The alarm is not directly influenced by any report on the radio,
p(A|R, E, B) = p(A|E, B)
The radio broadcast is not directly influenced by the burglar variable,
p(R|E, B) = p(R|E)
Burglaries don’t directly ‘cause’ earthquakes, p(E|B) = p(E)
Therefore
p(A, R, E, B) = p(A|E, B)p(R|E)p(E)p(B)
74. Example – Part II: Specifying the Tables
[Graph: B → A, E → A, E → R]
p(A|B, E):
  Alarm = 1  Burglar  Earthquake
  0.9999     1        1
  0.99       1        0
  0.99       0        1
  0.0001     0        0

p(R|E):
  Radio = 1  Earthquake
  1          1
  0          0
The remaining tables are p(B = 1) = 0.01 and p(E = 1) = 0.000001. The tables
and graphical structure fully specify the distribution.
75. Example Part III: Inference
Initial Evidence: The alarm is sounding
p(B = 1|A = 1) = Σ_{E,R} p(B = 1, E, A = 1, R) / Σ_{B,E,R} p(B, E, A = 1, R)
= Σ_{E,R} p(A = 1|B = 1, E)p(B = 1)p(E)p(R|E) / Σ_{B,E,R} p(A = 1|B, E)p(B)p(E)p(R|E)
≈ 0.99
Additional Evidence: The radio broadcasts an earthquake warning:
A similar calculation gives p(B = 1|A = 1, R = 1) ≈ 0.01.
Initially, because the alarm sounds, Sally thinks that she’s been burgled.
However, this probability drops dramatically when she hears that there has
been an earthquake.
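Both numbers can be reproduced by brute-force enumeration over the joint distribution; the tables below are transcribed directly from Part II:

```python
# Burglar-alarm network inference by enumeration.
import itertools

p_B = {1: 0.01, 0: 0.99}
p_E = {1: 0.000001, 0: 0.999999}
p_A1_BE = {(1, 1): 0.9999, (1, 0): 0.99, (0, 1): 0.99, (0, 0): 0.0001}
p_R1_E = {1: 1.0, 0: 0.0}

def joint(b, e, a, r):
    # p(A, R, E, B) = p(A|E, B) p(R|E) p(E) p(B)
    pa = p_A1_BE[(b, e)] if a == 1 else 1 - p_A1_BE[(b, e)]
    pr = p_R1_E[e] if r == 1 else 1 - p_R1_E[e]
    return pa * pr * p_B[b] * p_E[e]

# p(B = 1|A = 1): sum out E and R.
num = sum(joint(1, e, 1, r) for e, r in itertools.product([0, 1], repeat=2))
den = sum(joint(b, e, 1, r) for b, e, r in itertools.product([0, 1], repeat=3))
print(num / den)  # ≈ 0.99

# p(B = 1|A = 1, R = 1): the earthquake report 'explains away' the alarm.
num2 = sum(joint(1, e, 1, 1) for e in [0, 1])
den2 = sum(joint(b, e, 1, 1) for b, e in itertools.product([0, 1], repeat=2))
print(num2 / den2)  # ≈ 0.01
```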
76. Markov Models
For timeseries data v1, . . . , vT , we need a model p(v1:T ). For causal consistency, it
is meaningful to consider the decomposition
p(v1:T) = Π_{t=1}^{T} p(vt|v1:t−1)
with the convention p(vt|v1:t−1) = p(v1) for t = 1.
[Graph: cascade v1 → v2 → v3 → v4, with arcs from every earlier variable to each vt]
Independence assumptions
It is often natural to assume that the influence of the immediate past is more
relevant than the remote past and in Markov models only a limited number of
previous observations are required to predict the future.
77. Markov Chain
Only the recent past is relevant:
p(vt|v1, . . . , vt−1) = p(vt|vt−L, . . . , vt−1)
where L ≥ 1 is the order of the Markov chain
p(v1:T ) = p(v1)p(v2|v1)p(v3|v2) . . . p(vT |vT −1)
For a stationary Markov chain the transitions p(vt = s′|vt−1 = s) = f(s′, s) are
time-independent (‘homogeneous’).
Figure : (a): First order Markov chain, v1 → v2 → v3 → v4. (b): Second order
Markov chain, with additional arcs from vt−2 to vt.
78. Markov Chains
[Graph: v1 → v2 → v3 → v4]
p(v1, . . . , vT ) = p(v1) Π_{t=2}^{T} p(vt|vt−1)
where p(v1) is the initial distribution and p(vt|vt−1) the transition.
State transition diagram
Nodes represent states of the variable v and arcs non-zero elements of the
transition p(vt|vt−1)
[State transition diagram over states 1, . . . , 9]
79. Most probable and shortest paths
[State transition diagram over states 1, . . . , 9]
The shortest (unweighted) path from state 1 to state 7 is 1 − 2 − 7.
The most probable path from state 1 to state 7 is 1 − 8 − 9 − 7 (assuming
uniform transition probabilities). The latter path is longer but more probable
since for the path 1 − 2 − 7, the probability of exiting state 2 into state 7 is
1/5.
80. Equilibrium distribution
It is interesting to know how the marginal p(xt) evolves through time:
p(xt = i) = Σ_j p(xt = i|xt−1 = j) p(xt−1 = j), with Mij ≡ p(xt = i|xt−1 = j)
p(xt = i) is the frequency that we visit state i at time t, given we started
from p(x1) and randomly drew samples from the transition p(xτ |xτ−1).
As we repeatedly sample a new state from the chain, the distribution at time
t, for an initial distribution p1(i) is
pt = M^{t−1} p1
If, for t → ∞, p∞ is independent of the initial distribution p1, then p∞ is
called the equilibrium distribution of the chain:
p∞ = Mp∞
The equilibrium distribution is proportional to the eigenvector of the
transition matrix with unit eigenvalue.
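A minimal numerical sketch of both characterisations, using a hypothetical 3-state transition matrix (columns sum to 1, Mij = p(xt = i|xt−1 = j)):

```python
# Equilibrium distribution by power iteration and by eigendecomposition.
import numpy as np

M = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

p = np.ones(3) / 3          # any initial distribution p1
for _ in range(200):
    p = M @ p               # p_t = M^{t-1} p_1

# At convergence p satisfies M p = p: the equilibrium distribution.
print(p)

# Equivalently: the eigenvector of M with unit eigenvalue, normalised.
vals, vecs = np.linalg.eig(M)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
v = v / v.sum()
```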
81. PageRank
Define the matrix
Aij = 1 if website j has a hyperlink to website i, and 0 otherwise.
From this we can define a Markov transition matrix with elements
Mij = Aij / Σ_{i′} Ai′j
If we jump from website to website, the equilibrium distribution component
p∞(i) is the relative number of times we will visit website i. This has a
natural interpretation as the ‘importance’ of website i.
For each website i a list of words associated with that website is collected.
After doing this for all websites, one can make an ‘inverse’ list of which
websites contain word w. When a user searches for word w, the list of
websites that contain word w is then returned, ranked according to the
importance of the site.
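A sketch of this construction for a hypothetical three-site web (the link structure is invented for illustration):

```python
# PageRank as the equilibrium distribution of the hyperlink chain.
import numpy as np

# A[i][j] = 1 if site j links to site i:
# site 0 links to 1; site 1 links to 0 and 2; site 2 links to 0.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

M = A / A.sum(axis=0)       # M_ij = A_ij / sum_i' A_i'j (column-stochastic)

p = np.ones(3) / 3
for _ in range(500):
    p = M @ p               # power iteration to the equilibrium

print(p)                    # [0.4, 0.4, 0.2]: site 2 is least 'important'
```

Note this assumes the chain is irreducible and aperiodic; real PageRank adds a damping/teleportation term to guarantee that.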
82. Hidden Markov Models
The HMM defines a Markov chain on hidden (or ‘latent’) variables h1:T . The
observed (or ‘visible’) variables are dependent on the hidden variables through an
emission p(vt|ht). This defines a joint distribution
p(h1:T, v1:T) = p(v1|h1)p(h1) Π_{t=2}^{T} p(vt|ht)p(ht|ht−1)
For a stationary HMM the transition p(ht|ht−1) and emission p(vt|ht) distributions
are constant through time.
Figure : A first order hidden Markov model, h1 → h2 → h3 → h4, with emissions
ht → vt. The ‘hidden’ variables have dom(ht) = {1, . . . , H}, t = 1 : T; the
‘visible’ variables vt can be either discrete or continuous.
83. The classical inference problems
Filtering (Inferring the present) p(ht|v1:t)
Prediction (Inferring the future) p(ht|v1:s) t > s
Smoothing (Inferring the past) p(ht|v1:u) t < u
Likelihood p(v1:T )
Most likely path (Viterbi alignment) argmax_{h1:T} p(h1:T|v1:T)
For prediction, one is also often interested in p(vt|v1:s) for t > s.
84. Inference in Hidden Markov Models
Belief network representation of an HMM:
[Graph: h1 → h2 → h3 → h4, with emissions ht → vt]
Filtering, Smoothing and Viterbi are all computationally efficient, scaling
linearly with the length of the timeseries (but quadratically with the number
of hidden states).
The algorithms are variants of ‘message passing on factor graphs’
The algorithms are guaranteed to be exact if the graph is singly connected.
Huge research effort in the last 15 years to apply message passing for
approximate inference in multiply-connected graphs (eg low-density
parity-check codes).
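The filtering recursion can be sketched with the standard forward algorithm; the transition, emission and prior tables below are hypothetical. Each step costs O(H²), so the whole pass is linear in T and quadratic in the number of hidden states:

```python
# Forward (filtering) recursion for a small 2-state HMM:
# alpha_t(h) ∝ p(v_t|h) * sum_h' p(h|h') alpha_{t-1}(h').
import numpy as np

trans = np.array([[0.8, 0.3],   # trans[i, j] = p(h_t = i | h_{t-1} = j)
                  [0.2, 0.7]])
emit = np.array([[0.9, 0.2],    # emit[v, h] = p(v_t = v | h_t = h)
                 [0.1, 0.8]])
prior = np.array([0.5, 0.5])    # p(h_1)

obs = [0, 0, 1, 0]              # observed sequence v_{1:T}

log_lik = 0.0
alpha = emit[obs[0]] * prior
filtered = []                   # p(h_t | v_{1:t}) for each t
for t, v in enumerate(obs):
    if t > 0:
        alpha = emit[v] * (trans @ alpha)
    norm = alpha.sum()          # p(v_t | v_{1:t-1})
    log_lik += np.log(norm)     # accumulates log p(v_{1:T})
    alpha = alpha / norm
    filtered.append(alpha.copy())
```

Normalising at each step keeps the recursion numerically stable and yields the likelihood as a by-product.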
85. HMMs for speech recognition
ht is the phoneme at time t. p(ht|ht−1) – language model. p(vt|ht) – speech
signal model.
86. Deep Nets and HMMs
[Graph: h1 → h2 → h3 → h4, with emissions ht → vt]
Recently companies including Google have made big advances in speech
recognition.
The breakthrough is to model p(vt|ht) as a Gaussian whose mean is some
function of the phoneme µ(ht; θ).
This function is a deep neural network, trained on a large amount of data.
Goldrush at the moment to find similar breakthrough applications of deep
networks in reasoning systems.
87. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
88. Generative Model
[Graph: latent variables h1, h2, each with arcs to v1, v2, v3, v4]
It is natural to consider that objects (images for example) can be constructed
on the basis of a low dimensional representation.
Note that this is a Graphical Model, not a Function
The latent variables h can be sampled using p(h), and an image then sampled
from p(v|h).
One cannot use an autoencoder to generate new images.
The bad news
Inference (computing p(h|v) and parameter learning) is intractable in these
models.
Statisticians typically use sampling as an approximation.
Very popular in ML to use a variational method – much faster for inference.
89. Variational Inference
Consider a distribution
p(v|θ) = Σ_h p(v|h, θ)p(h)
and that we wish to learn θ to maximise the probability that this model
assigns to the observed data.
log p(v|θ) ≥ −Σ_h q(h|v, φ) log q(h|v, φ) + Σ_h q(h|v, φ) log p(v|h, θ) + const.
The idea is to choose a ‘variational’ distribution q(h|v, φ) such that we can
either calculate the bound analytically, or sample it efficiently.
We then jointly maximise the bound wrt φ and θ.
We can parameterise p(v|h, θ) using a deep network.
Very popular approach – see ‘variational autoencoder’ and also attention
mechanisms.
Extension to semi-supervised learning using p(v) = Σ_{h,c} p(v|h, c)p(c)p(h)
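The bound can be checked numerically on a tiny discrete model (the tables below are hypothetical, and the p(h) term is kept explicitly rather than absorbed into the constant): any q gives a lower bound, and the bound is tight when q equals the exact posterior p(h|v).

```python
# Variational lower bound on log p(v) for a 2-state latent variable model.
import numpy as np

p_h = np.array([0.6, 0.4])          # prior p(h)
p_v_given_h = np.array([0.9, 0.2])  # p(v = 1 | h), for the observed v = 1

log_p_v = np.log((p_v_given_h * p_h).sum())  # exact log p(v) by enumeration

def elbo(q):
    # sum_h q(h) [log p(v|h) + log p(h) - log q(h)]
    return (q * (np.log(p_v_given_h) + np.log(p_h) - np.log(q))).sum()

# An arbitrary q gives a strict lower bound ...
assert elbo(np.array([0.5, 0.5])) < log_p_v

# ... while the exact posterior q(h) = p(h|v) makes the bound tight.
posterior = p_v_given_h * p_h / (p_v_given_h * p_h).sum()
assert abs(elbo(posterior) - log_p_v) < 1e-9
```

In the variational autoencoder, q(h|v, φ) and p(v|h, θ) are parameterised by neural networks and the same bound is maximised jointly in φ and θ.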
91. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
93. Deep Reinforcement Learning
Given a state of the world W and a set of possible actions A, we need to
decide which action to take in each state of W so as to best serve our
long-term goals.
Problem is that the number of pixel states is enormous.
Need to learn a low dimensional representation of the screen (use a deep
generative model).
Learn then which action to take given the low dimensional representation.
Tetris
Google
94. Table of Contents
History of the AI dream
How do brains work?
Connectionism
AutoDiff
Fantasy Machines
Probability
Directed Graphical Models
Variational Generative Models
Reinforcement Learning
Outlook
95. Outlook
Machine Learning is in a boom period.
Renewed interest and hope in creating AI.
Combine new computational power with suitable hierarchical representations.
Impressive state of the art results in Speech Recognition, Image Analysis,
Game Playing.
Challenges
Improve understanding of optimisation for deep learning.
Learn how to more efficiently exploit computational resources.
Learn how to exploit massive databases.
Improve interaction between reinforcement learning and representation
learning.
Marry non-symbolic (neural) with symbolic (Bayesian reasoning)
Emphasis is on scalability.
Feel free to contact me at UCL or at my AI company reinfer
https://reinfer.io