This document summarizes and compares recent research on hierarchical reinforcement learning and model-based reinforcement learning with neural networks. It discusses work on reward augmentation, hierarchical actions, model-based video prediction, Value Iteration Networks, and the Predictron. Value Iteration Networks embed planning by value iteration inside a differentiable network: the Bellman backup is expressed as a convolution over the state grid followed by a max over action channels, so the planner can be trained end-to-end by backpropagation. The Predictron learns an abstract Markov reward process end-to-end and rolls it forward internally to estimate values, without requiring its internal states to be interpretable. Future research directions include theoretical bounds on the optimality of policies computed on abstracted MDPs and learning smaller MDPs with hierarchical actions.
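To make the Value Iteration Network idea concrete, below is a minimal PyTorch sketch of a VIN-style planning block on a 2D grid world, assuming the convolution-plus-max formulation described above. The module name, hyperparameters (`n_actions`, `n_iterations`), and the toy usage at the end are illustrative assumptions, not details taken from the summarized papers.

```python
import torch
import torch.nn as nn


class ValueIterationModule(nn.Module):
    """VIN-style planning block: repeated Bellman backups expressed as a
    convolution over a 2D state grid followed by a max over action channels."""

    def __init__(self, n_actions=8, kernel_size=3, n_iterations=20):
        super().__init__()
        # Each output channel's kernel acts like a learned transition model
        # for one action: it mixes reward and value of neighbouring cells.
        self.q_conv = nn.Conv2d(
            in_channels=2, out_channels=n_actions,
            kernel_size=kernel_size, padding=kernel_size // 2, bias=False,
        )
        self.n_iterations = n_iterations

    def forward(self, reward_map):
        # reward_map: (batch, 1, H, W) reward per grid cell
        value = torch.zeros_like(reward_map)
        for _ in range(self.n_iterations):
            # Q(s, a) via convolution over the stacked [R; V] maps,
            # then the Bellman max: V(s) = max_a Q(s, a).
            q = self.q_conv(torch.cat([reward_map, value], dim=1))
            value, _ = q.max(dim=1, keepdim=True)
        return value


# Usage sketch: plan on a random 16x16 reward map.
vi = ValueIterationModule()
values = vi(torch.randn(1, 1, 16, 16))
print(values.shape)  # torch.Size([1, 1, 16, 16])
```

Because every step is a convolution or a max, gradients flow through the whole planning computation, which is what lets the reward and transition kernels be learned from final task loss rather than specified by hand.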