The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
What Deep Learning Means for Artificial Intelligence (Jonathan Mugan)
Describes deep learning as applied to natural language processing, computer vision, and robot actions. Also discusses what deep learning still can't do.
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-... (Greg Makowski)
This talk covers four configurations of deep learning for solving different types of application needs. Strategies for speed-up and real-time scoring are also discussed.
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
Zaikun Xu from the Università della Svizzera Italiana presented this deck at the 2016 Switzerland HPC Conference.
“In the past decade, deep learning, as a life-changing technology, has achieved huge success on various tasks, including image recognition, speech recognition, and machine translation. Pioneered by several research groups, those of Geoffrey Hinton (U Toronto), Yoshua Bengio (U Montreal), Yann LeCun (NYU), and Juergen Schmidhuber (IDSIA, Switzerland), deep learning is a renaissance of neural networks in the big-data era.
A neural network is a learning algorithm consisting of an input layer, hidden layers, and an output layer, where each circle represents a neuron and each arrow connection carries a weight. The network learns from the difference between the output layer's prediction and the ground truth: it computes the gradients of this discrepancy with respect to the weights and adjusts the weights accordingly. Ideally, it finds weights that map input X to target y with as low an error as possible.”
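To make the quoted description concrete, here is a minimal NumPy sketch of that training loop: forward pass, measure the discrepancy against the ground truth, compute gradients with respect to the weights, and adjust them. The tiny two-layer network, sigmoid activations, synthetic data, and learning rate are all illustrative assumptions, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # ground truth

W1 = rng.normal(scale=0.5, size=(3, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights
lr = 0.1

for step in range(1000):
    # Forward pass through one hidden layer (sigmoid activations).
    h = 1.0 / (1.0 + np.exp(-X @ W1))
    out = 1.0 / (1.0 + np.exp(-h @ W2))

    # Discrepancy between the output layer and the ground truth.
    err = out - y

    # Backward pass: gradients of the error w.r.t. each weight matrix.
    d_out = err * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)

    # Adjust the weights against the gradient.
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h
```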
Watch the video presentation: http://insidehpc.com/2016/03/deep-learning/
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Talk given at PYCON Stockholm 2015
Intro to Deep Learning, plus taking a pretrained ImageNet network, extracting features, and training an RBM on top = 97% accuracy after one hour (!) of training (top 10% of the Kaggle cat-vs-dog competition).
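As a rough illustration of the pipeline that deck describes (frozen pretrained features with an RBM on top), here is a sketch using scikit-learn's BernoulliRBM. The random `features` array is a stand-in for activations extracted from a pretrained ImageNet network, and all sizes and hyperparameters are made up for the example.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
features = rng.random((500, 512))        # stand-in for extracted CNN features, in [0, 1]
labels = rng.integers(0, 2, size=500)    # cat = 0, dog = 1 (illustrative)

# RBM on top of the frozen features, with a simple classifier head.
model = Pipeline([
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(features, labels)
print(model.score(features, labels))
```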
Deep learning is an area of machine learning and one of the most talked-about trends in business and computer science today.
In this talk, I will give a review of deep learning, explaining what it is, what kinds of tasks it can do today, and what it probably could do in the future.
About 30 years ago, AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. The over-inflated expectations ended in a crash, followed by a period of absent funding and interest – the so-called AI winter. The last three years, however, changed everything – again. Deep learning, a machine learning technique inspired by the human brain, crushed one benchmark after another, and tech companies like Google, Facebook, and Microsoft started to invest billions in AI research. “The pace of progress in artificial general intelligence is incredibly fast” (Elon Musk – CEO, Tesla & SpaceX), leading to an AI that “would be either the best or the worst thing ever to happen to humanity” (Stephen Hawking – physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let's look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO, Google) recently announced that “machine learning is a core transformative way by which Google is rethinking everything they are doing” and explain why "Deep Learning is probably one of the most exciting things that is happening in the computer industry” (Jen-Hsun Huang – CEO, NVIDIA).
Either a new AI “winter is coming” (Ned Stark – House Stark) or this new wave of innovation might turn out to be the “last invention humans ever need to make” (Nick Bostrom – AI philosopher). Or maybe it's just another great technology helping humans achieve more.
The Unreasonable Benefits of Deep Learning (indico data)
Dan Kuster led a talk at Sentiment Analysis Symposium discussing why businesses should consider adopting deep learning solutions. Key takeaways include simplicity, accuracy, flexibility, and some hacks for working with the tech.
About the Session:
Machine learning is becoming the tool of choice for analyzing text and image data. While traditional text processing solutions rely on the ability of experts to encode domain knowledge, machine learning models learn this directly from the data. Deep learning is a branch of machine learning that, like the human brain, quickly learns hierarchical representations of concepts, and it has been key to unlocking state-of-the-art results on a range of text and image classification tasks such as sentiment analysis and beyond.
In this session, we will show the impact of a deep learning based approach over NLP and traditional machine learning based methods for text analysis across key dimensions such as accuracy, flexibility, and the amount of required training data. Specifically, we will discuss how deep learning models are now setting the records for state-of-the-art accuracy in sentiment analysis. We will also demonstrate the flexibility of this approach by showing how the features learned by one model can be easily reused in different domains (e.g., handling additional languages, or predicting new categories) to drastically reduce the time to deployment, as sketched below. Finally, we will touch on the ability of this method to handle additional types of data beyond text, e.g., images, for maximum insight.
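To illustrate the feature-reuse idea from the abstract, here is a toy sketch: one frozen feature extractor feeds several small task-specific classifiers, so only a tiny model per new domain needs training. The `embed` function is a hypothetical stand-in (a bag-of-words hash) for a pretrained deep model's features, and the two-example datasets are obviously illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for a pretrained model's frozen feature extractor.
def embed(texts, dim=64):
    out = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            out[i, hash(w) % dim] += 1.0   # crude bag-of-words hashing
    return out

# The same frozen features feed two different downstream tasks.
sentiment_clf = LogisticRegression().fit(
    embed(["great service", "terrible delay"]), [1, 0])
topic_clf = LogisticRegression().fit(
    embed(["ship my parcel", "refund my card"]), [0, 1])
```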
Tijmen Blankenvoort, co-founder of Scyfer BV, presentation at the Artificial Intelligence Meetup, 15-1-2014. An introduction to Neural Networks and Deep Learning.
"You Can Do It" by Louis Monier (Altavista Co-Founder & CTO) & Gregory Renard (CTO & Artificial Intelligence Lead Architect at Xbrain) for Deep Learning keynote #0 at Holberton School (http://www.meetup.com/Holberton-School/events/228364522/)
If you want to attend a similar keynote for free, check out http://www.meetup.com/Holberton-School/
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take deep learning to task with real-world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Slides from Portland Machine Learning meetup, April 13th.
Abstract: You've heard all the cool tech companies are using them, but what are Convolutional Neural Networks (CNNs) good for, and what is convolution anyway? For that matter, what is a neural network? This talk will include a look at some applications of CNNs, an explanation of how CNNs work, and what the different layers in a CNN do. There's no explicit background required, so if you have no idea what a neural network is, that's OK.
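For readers wondering "what is convolution anyway", here is a minimal NumPy sketch of the sliding-window operation a convolutional layer performs. Strictly speaking, CNN layers compute cross-correlation, and they learn their kernels from data rather than using a fixed one like the edge filter below; everything here is illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output pixel is the
    weighted sum of the patch under the kernel (valid padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

edge_kernel = np.array([[-1.0, 0.0, 1.0],   # a classic vertical-edge filter;
                        [-1.0, 0.0, 1.0],   # a CNN learns kernels like this
                        [-1.0, 0.0, 1.0]])  # from data instead of by hand
image = np.random.rand(8, 8)
print(conv2d(image, edge_kernel).shape)     # (6, 6)
```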
What is "deep learning" and why is it suddenly so popular? In this talk I explore how Deep Learning provides a convenient framework for expressing learning problems and using GPUs to solve them efficiently.
An introduction to Machine Learning (and a little bit of Deep Learning) – Thomas da Silva Paula
A 25-minute talk about machine learning and a little bit of deep learning. It starts with some basic definitions (supervised and unsupervised learning); then the basic functionality of neural networks is explained, ending with deep learning and convolutional neural networks.
From a Machine Learning Meetup held in Porto Alegre, Brazil.
Deep Neural Networks that talk (Back)… with style (Roelof Pieters)
Talk at Nuclai 2016 in Vienna
Can neural networks sing, dance, remix and rhyme? And most importantly, can they talk back? This talk will introduce deep neural nets with textual and auditory understanding and some of the recent breakthroughs made in these fields. It will then show some of the exciting possibilities these technologies hold for "creative" use and explorations of human-machine interaction, where the main theme is "augmentation, not automation".
http://events.nucl.ai/track/cognitive/#deep-neural-networks-that-talk-back-with-style
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem... (StampedeCon)
In this session, we'll discuss approaches for applying convolutional neural networks to novel computer vision problems, even without having millions of images of your own. Pretrained models and generic image datasets from Google, Kaggle, universities, and other places can be leveraged and adapted to solve industry- and business-specific problems. We'll discuss the approaches of transfer learning and fine-tuning (sketched below) to help anyone get started on using deep learning to get cutting-edge results on their computer vision problems.
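A minimal sketch of the fine-tuning recipe the session describes, assuming TensorFlow/Keras and a binary classification task. The choice of VGG16, the head layers, and the hyperparameters are illustrative, not from the talk.

```python
import tensorflow as tf

# Load a pretrained ImageNet backbone without its classification head.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional features

# Attach a small task-specific head and train only that on your data.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # your small dataset
```

Unfreezing a few of the top convolutional blocks afterwards, with a low learning rate, is the usual second fine-tuning step.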
Facial emotion detection on babies' faces using Deep Learning (Takrim Ul Islam Laskar)
Phase 1: face detection; facial landmark detection.
Phase 2: neural network training and testing; validation and implementation.
Phase 1 has been completed successfully.
Artificial Neural Network Seminar - Google Brain (Rawan Al-Omari)
Our seminar for the artificial neural network course at F.I.T.E, AI Dept. It is about the Google Brain project and how neural networks were used in building it - a very interesting project.
For more information about this project:
http://nyti.ms/T5E71e
Chen Sagiv, co-founder and co-CEO of SagivTech, gave an introductory talk on computer vision at the She Codes branch at Google Campus TLV.
The talk gave an overview of what computer vision is, where it is used, some basic notions and algorithms, and the AI revolution.
Deep Representation: Building a Semantic Image Search Engine (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2PokOPm.
Emmanuel Ameisen gives a step by step tutorial on how to build a semantic search engine for text and images, with code included. The approaches presented extend naturally to other applications such as image and video captioning, reading text from videos, selecting optimal thumbnails and generating code from sketches of websites and more. Filmed at qconsf.com.
Emmanuel Ameisen is the Head of AI at Insight Data Science. He has years of experience going from product ideation to effective implementations. At Insight, he has led over a hundred AI projects from ideation to finished product in a variety of domains including Computer Vision, Natural Language Processing, and Speech Processing.
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori... (Simplilearn)
This Deep Learning presentation will help you understand what deep learning is, why we need it, and its applications, along with a detailed explanation of neural networks and how they work. Deep learning is inspired by the structure and function of the human brain, realized through artificial neural networks. These networks, which mirror the decision-making process of the brain, use complex algorithms that process data in a non-linear way, learning in an unsupervised manner to make choices based on the input. This Deep Learning tutorial is ideal for professionals with beginner to intermediate levels of experience. Now, let us dive deep into this topic and understand what deep learning actually is.
Below topics are explained in this Deep Learning Presentation:
1. What is Deep Learning?
2. Why do we need Deep Learning?
3. Applications of Deep Learning
4. What is a Neural Network?
5. Activation Functions (see the sketch after this list)
6. Working of Neural Network
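As a small taste of topics 5 and 6 above, here is a sketch of three common activation functions; the sample inputs are arbitrary, and real networks apply these element-wise to each layer's weighted sums.

```python
import numpy as np

# Each activation squashes a neuron's weighted-sum input non-linearly,
# which is what lets stacked layers model non-linear decision boundaries.
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def relu(x):    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, relu):
    print(fn.__name__, fn(x))
```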
Simplilearn's Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning and deep neural network research. With our deep learning course, you'll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks, and traverse layers of data abstraction to understand the power of data, preparing you for your new role as a deep learning scientist.
Why TensorFlow?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you'll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks, and interpret the results.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms.
There is booming demand for skilled deep learning engineers across a wide range of industries, making this deep learning course with TensorFlow training well-suited for professionals at the intermediate to advanced level of experience. We recommend this deep learning online course particularly for the following professionals:
1. Software engineers
2. Data scientists
3. Data analysts
4. Statisticians with an interest in deep learning
The Frontier of Deep Learning in 2020 and Beyond (NUS-ISS)
This talk will be a summary of the recent advances in deep learning research, current trends in the industry, and the opportunities that lie ahead.
We will discuss topics in research such as:
Transformers, GPT-3, BERT
Neural Architecture Search, Evolutionary Search
Distillation, self-learning
NeRF
Self-Attention
Also shifting industry trends such as:
The move to free data
Rising importance of 3D vision
Using synthetic data (Sim2Real)
Mobile vision & Federated Learning
Deep Learning and Reinforcement Learning
1. Deep Learning & Reinforcement Learning
Renārs Liepiņš
Lead Researcher, LUMII & LETA
renars.liepins@lumii.lv
At “Riga AI, Machine Learning and Bots”, February 16, 2017
17. Before Deep Learning (slide credit: Andrew Ng)
Features for machine learning:
Images -> vision features -> detection
Audio -> audio features -> speaker ID
Text -> text features -> web search
Source
18. With Deep Learning (slide credit: Andrew Ng)
Features for machine learning:
Images -> vision features -> detection
Audio -> audio features -> speaker ID
Text -> text features -> web search
[Recurring diagram: "Deep Learning: Neural network" - layers of artificial neurons, loosely modeled on neurons in the brain (Andrew Ng)]
20. Universal Learning Algorithm - Speech Recognition
"_ q u i c k …"
[Neural network diagram (Andrew Ng)]
Source
21. Universal Learning Algorithm - Translation
"Dzeltens autobuss brauc pa ceļu…" -> "A yellow bus driving down…" (Latvian to English)
[Neural network diagram (Andrew Ng)]
Source
22. Universal Learning Algorithm - Self-driving cars
[Neural network diagram (Andrew Ng)]
Source
23. Universal Learning Algorithm
"A yellow bus driving down…"
[Neural network diagram (Andrew Ng)]
24. Universal Learning Algorithm - Image captions
Data (image) -> "A yellow bus driving down…"
[Neural network diagram (Andrew Ng)]
Source
25. Chinese captions (slide credit: Andrew Ng)
(A baseball player getting ready to bat.)
(A person surfing on the ocean.)
(A double-decker bus driving on a street.)
26. Universal Learning Algorithm - X-ray reports
[Neural network diagram (Andrew Ng)]
Source
27. Universal Learning Algorithm - Photo localisation
Deep Learning in Computer Vision: Image Localization. PlaNet is able to determine the location of almost any image with superhuman ability.
[Neural network diagram (Andrew Ng)]
Source
28. Universal Learning Algorithm - Style Transfer
[Neural network diagram (Andrew Ng)]
Source
29.
30. Universal Learning Algorithm - Semantic Face Transforms
[Figure residue from the Deep Feature Interpolation paper; the legible captions are reproduced under slide 31 below.]
[Neural network diagram (Andrew Ng)]
Source
31. [Figure captions from the Deep Feature Interpolation paper: Figure 1 - aging a 400x400 face with Deep Feature Interpolation, before and after the artifact removal step, showcasing the quality of the method; in this figure (and no other) a mask was applied to preserve the background, and although the input image was 400x400, all source and target images used in the transformation were only 100x100. Figure 2 - an example Deep Feature Interpolation transformation of a test image (Silvio Berlusconi) towards six categories: older, mouth open, eyes open, smiling, facial hair, spectacles; each transformation was performed via linear interpolation in a deep feature space composed of pre-trained VGG features. The method requires sample images with and without the desired attribute that are otherwise similar to the target image (e.g. images of other caucasian males).]
32. Universal Learning Algorithm - Lipreading
Deep Learning in Computer Vision: LipNet - Sentence-level Lipreading. LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.
[Neural network diagram (Andrew Ng)]
Source
33. Universal Learning Algorithm - Sketch Vectorisation
[Neural network diagram (Andrew Ng)]
Source
34. Universal Learning Algorithm - Handwriting Generation
"A yellow bus driving down…"
[Neural network diagram (Andrew Ng)]
Source
35. Deep Learning in Computer Vision - Image Generation: Handwriting
This LSTM recurrent neural network is able to generate highly realistic
cursive handwriting in a wide variety of styles, simply by predicting one data
point at a time.
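A minimal sketch of that predict-one-point-at-a-time sampling loop, using an untrained PyTorch LSTM; Graves' actual handwriting network outputs mixture-density parameters over pen offsets, which this simplifies to a direct point prediction.

```python
import torch
import torch.nn as nn

class PointPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # next (dx, dy) pen offset

    def forward(self, point, state):
        out, state = self.lstm(point, state)
        return self.head(out), state

model = PointPredictor()                   # untrained, for illustration
point = torch.zeros(1, 1, 2)               # start of the stroke
state = None
stroke = []
with torch.no_grad():
    for _ in range(50):                    # generate 50 pen offsets,
        point, state = model(point, state) # feeding each prediction
        stroke.append(point.squeeze().tolist())  # back in as the next input
```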
36. Universal Learning Algorithm - Image Upscaling
[Neural network diagram (Andrew Ng)]
Source
37. Google – Saving you bandwidth through machine learning
Source
39. Not Magic
• Simply downloading and "applying" open-source software won't work.
• Needs to be customised to your business context and data.
• Needs lots of examples and computing power for training.
Source
51. What is Deep Learning?
"cat"
• Loosely based on (what little) we know about the brain
WHAT MAKES DEEP LEARNING DEEP?
Today's Largest Networks: ~10 layers, 1B parameters, 10M images, ~30 exaflops, ~30 GPU days.
Human brain has trillions of parameters – on…
75. Pong Example
π( [game screen] ) -> [action]
[DQN value-function figure residue; the full caption is reproduced under slide 82.]
76. Pong Example (same slide repeated)
77. Pong Example
States (S): game screens …
Actions (A), Rewards (R): +1 / −1 / 0
Agent <-> Environment
Goal: Maximize Accumulated Rewards
π(S) -> A
[Fragments: "…out of 49 Atari games", "…ithin Google"]
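A minimal sketch of the agent-environment loop this slide describes; the toy `step` dynamics and the random policy are stand-ins for Pong's screens and paddle moves, and all numbers are illustrative.

```python
import random

ACTIONS = ["up", "down", "stay"]

def policy(state):            # pi(S) -> A, here just random play
    return random.choice(ACTIONS)

def step(state, action):      # hypothetical environment dynamics
    next_state = state + 1
    reward = random.choice([0, 0, 0, +1, -1])   # mostly 0, sometimes +/-1
    done = next_state >= 100                    # episode ends: game over
    return next_state, reward, done

state, total_reward, done = 0, 0, False
while not done:               # one episode of trial and error
    action = policy(state)
    state, reward, done = step(state, action)
    total_reward += reward    # the quantity the agent learns to maximize
print(total_reward)
```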
82. Episode
R1=0   R2=0   R3=+1 😁   Game Over   👍 👍 👍
∑ᵢ Rᵢ = +1
[Background figure, Extended Data Figure 2 from the DQN paper | Visualization of learned value functions on two games, Breakout and Pong. a, A visualization of the learned value function on the game Breakout. At time points 1 and 2, the state value is predicted to be ~17 and the agent is clearing the bricks at the lowest level. Each of the peaks in the value function curve corresponds to a reward obtained by clearing a brick. At time point 3, the agent is about to break through to the top level of bricks and the value increases to ~21 in anticipation of breaking out and clearing a large set of bricks. At point 4, the value is above 23 and the agent has broken through. After this point, the ball will bounce at the upper part of the bricks, clearing many of them by itself. b, A visualization of the learned action-value function on the game Pong. At time point 1, the ball is moving towards the paddle controlled by the agent on the right side of the screen and the values of all actions are around 0.7, reflecting the expected value of this state based on previous experience. At time point 2, the agent starts moving the paddle towards the ball and the value of the 'up' action stays high while the value of the 'down' action falls to −0.9. This reflects the fact that pressing 'down' would lead to the agent losing the ball and incurring a reward of −1. At time point 3, the agent hits the ball by pressing 'up' and the expected reward keeps increasing until time point 4, when the ball reaches the left edge of the screen and the value of all actions reflects that the agent is about to receive a reward of 1. Note, the dashed line shows the past trajectory of the ball purely for illustrative purposes (that is, not shown during the game). With permission from Atari Interactive, Inc.]
83. 😭
Extended Data Figure 2 | Visualization of learned value functions on two
games, Breakout and Pong. a, A visualization of the learned value function on
thegame Breakout.At time points 1 and 2, thestate value is predicted to be ,17
and the agent is clearing the bricks at the lowest level. Each of the peaks in
the value function curve corresponds to a reward obtained by clearing a brick.
At time point 3, the agent is about to break through to the top level of bricks and
the value increases to ,21 in anticipation of breaking out and clearing a
large set of bricks. At point 4, the value is above 23 and the agent has broken
through. After this point, the ball will bounce at the upper part of the bricks
clearing many of them by itself. b, A visualization of the learned action-value
function on the game Pong. At time point 1, the ball is moving towards the
all actions are around 0.7, reflecting the expected value of this state based on
previous experience. At time point 2, the agent starts moving the paddle
towards the ball and the value of the ‘up’ action stays high while the value of the
‘down’ action falls to 20.9. This reflects the fact that pressing ‘down’ would lead
to the agent losing the ball and incurring a reward of 21. At time point 3,
the agent hits the ball by pressing ‘up’ and the expected reward keeps increasing
until time point 4, when the ball reaches the left edge of the screen and the value
of all actions reflects that the agent is about to receive a reward of 1. Note,
the dashed line shows the past trajectory of the ball purely for illustrative
purposes (that is, not shown during the game). With permission from Atari
Interactive, Inc.
Extended Data Figure 2 | Visualization of learned value functions on two
games, Breakout and Pong. a, A visualization of the learned value function on
thegame Breakout.At time points 1 and 2, thestate value is predicted to be ,17
and the agent is clearing the bricks at the lowest level. Each of the peaks in
the value function curve corresponds to a reward obtained by clearing a brick.
At time point 3, the agent is about to break through to the top level of bricks and
the value increases to ,21 in anticipation of breaking out and clearing a
large set of bricks. At point 4, the value is above 23 and the agent has broken
through. After this point, the ball will bounce at the upper part of the bricks
clearing many of them by itself. b, A visualization of the learned action-value
function on the game Pong. At time point 1, the ball is moving towards the
all actions are around 0.7, reflecting the expected value of this state based on
previous experience. At time point 2, the agent starts moving the paddle
towards the ball and the value of the ‘up’ action stays high while the value of the
‘down’ action falls to 20.9. This reflects the fact that pressing ‘down’ would lead
to the agent losing the ball and incurring a reward of 21. At time point 3,
the agent hits the ball by pressing ‘up’ and the expected reward keeps increasing
until time point 4, when the ball reaches the left edge of the screen and the value
of all actions reflects that the agent is about to receive a reward of 1. Note,
the dashed line shows the past trajectory of the ball purely for illustrative
purposes (that is, not shown during the game). With permission from Atari
Interactive, Inc.
Episode
R1=0  R2=0  R3=−1  (Game Over)
👎 👎 👎
∑ᵢ Rᵢ = −1
88. π( [game screen] ) -> Action Probability
[Bar chart over actions, axis 0 to 1]
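In code, the thumbs-down label for the episode above is just its summed reward (a trivial sketch, rewards taken from the losing episode):

```python
# Return of the losing episode: R1=0, R2=0, R3=-1.
rewards = [0, 0, -1]
episode_return = sum(rewards)
print(episode_return)  # -> -1, so every action in this episode gets a thumbs-down
```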
89. π( [game screen] ) -> Action Probability
[Bar chart over actions, axis 0 to 1]
90. How to Find π(S) -> A ?
1. Change π to Stochastic: π(S) -> P(A)
2. Approximate π with a Neural Net: π(S, θ) -> P(A)
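A minimal sketch of step 2 in PyTorch (the state dimension, action count, and layer sizes are assumptions for illustration, not taken from the talk):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi(S, theta) -> P(A): a stochastic policy as a small neural network."""
    def __init__(self, state_dim=4, n_actions=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
            nn.Softmax(dim=-1),  # turn scores into a probability distribution
        )

    def forward(self, state):
        return self.net(state)

policy = PolicyNet()
state = torch.randn(1, 4)              # dummy state S
probs = policy(state)                  # P(A | S, theta), rows sum to 1
action = torch.multinomial(probs, 1)   # sample an action rather than argmax
```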
98. π(Si, θ) -> P(A)
How to Find θ ? -> Loss Function…
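One common choice, and the one the ±1 episode returns illustrated around these slides suggest, is a REINFORCE-style loss: weight the log-probability of each action taken by the return of its episode. A sketch under that assumption (the deck itself stops at "Loss Function…"):

```python
import torch

def policy_gradient_loss(log_probs, episode_return):
    """REINFORCE-style loss: -sum_i log pi(a_i | s_i, theta) * R.

    log_probs: log-probabilities of the actions actually taken.
    episode_return: summed reward of the episode (+1 win, -1 loss).
    Minimizing this raises the probability of actions from winning
    episodes and lowers it for actions from losing ones.
    """
    return -(log_probs * episode_return).sum()
```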
99. [Extended Data Figure 2 from the DQN Nature paper: learned value functions on Breakout and Pong; caption as above.]
Episode
R1=0  R2=0  R3=+1  (Game Over)
[Three frames, each with an action-probability bar chart, axis 0.0 to 1.0]
∑ᵢⁿ Rᵢ = +1  😁 👍 👍 👍
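Putting the pieces together, one gradient step on this winning episode looks roughly like the following (a sketch reusing the illustrative PolicyNet and policy_gradient_loss above, with dummy states standing in for real game frames):

```python
import torch

# Reuses the illustrative PolicyNet and policy_gradient_loss sketched earlier.
policy = PolicyNet()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(3, 4)                    # dummy states S1, S2, S3
probs = policy(states)                        # P(A | S, theta) at each step
dist = torch.distributions.Categorical(probs)
actions = dist.sample()                       # the actions the agent took
episode_return = 1.0                          # the winning episode: sum R_i = +1

loss = policy_gradient_loss(dist.log_prob(actions), episode_return)
optimizer.zero_grad()
loss.backward()   # pushes theta to make these three actions more likely
optimizer.step()
```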
101. [Extended Data Figure 2 shown again.]
102. [image-only slide]
103. [Extended Data Figure 2 shown again.]
104. [Extended Data Figure 2 shown again.]