Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang. NIPS 2014.
Yu Kai Huang
Outline
● Main idea
● Monte-Carlo Tree Search
○ Selection
○ Expansion
○ Simulation
○ Backpropagation
● Experiment
○ Three methods
○ Visualization
Main idea
Main Idea
“We achieve this by introducing new methods for combining RL and DL that use
slow, off-line Monte Carlo tree search planning methods to generate training
data for a deep-learned classifier capable of state-of-the-art real-time play.”
Deep Q-Network (DQN)
Image from https://arxiv.org/pdf/1312.5602.pdf
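As a concrete reference, the cited paper's network takes four stacked 84×84 grayscale frames and applies two convolutional layers followed by a fully connected layer. A minimal sketch in PyTorch, with layer sizes taken from the 2013 DQN paper (the class name and code organization here are illustrative, not the authors'):

```python
import torch.nn as nn

class DQN(nn.Module):
    """Q-network from Mnih et al. 2013: 4 stacked 84x84 grayscale frames in,
    one Q-value per action out."""
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x20x20 -> 32x9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # fully connected hidden layer
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # linear output: Q-values
        )

    def forward(self, x):
        return self.net(x)
```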
Sampling training data
● Experience Replay
● ϵ-greedy action selection
○ Exploration & Exploitation
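DQN generates its own training data: every transition is stored in a replay buffer, and actions are chosen ϵ-greedily so the agent keeps exploring. A minimal sketch (buffer size and ϵ are placeholder values, not the paper's settings):

```python
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)  # experience replay memory

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon take a random action (exploration);
    otherwise take the greedy action (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def store(transition):
    """Save a (state, action, reward, next_state, done) tuple."""
    replay_buffer.append(transition)

def sample_batch(batch_size=32):
    """Uniformly sample past transitions, decorrelating consecutive updates."""
    return random.sample(replay_buffer, batch_size)
```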
Sampling training data
● Off-line Monte Carlo tree search planning method
○ UCT-agent
Monte-Carlo Tree Search
MCTS
● The true value of an action can be approximated by running many random
simulations.
● These values can then be used to efficiently adjust the policy (strategy) toward a
best-first strategy.
Image from https://www.zhihu.com/question/39916945
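In symbols, the Monte-Carlo estimate of an action value is just the average return over N random simulations, which approaches the true value as N grows (G_i below denotes the return of the i-th simulation):

```latex
\hat{Q}(s, a) \;=\; \frac{1}{N} \sum_{i=1}^{N} G_i
\;\xrightarrow{\;N \to \infty\;}\; Q(s, a)
```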
MCTS
● Iteratively builds a partial search tree
● Each iteration (sketched in code below):
○ Find the most urgent node
■ Tree policy
■ Balances exploration/exploitation
○ Simulation
■ Add a child node
■ Default policy
○ Update node statistics
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
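The whole iteration fits in a short loop. Below is an illustrative Python sketch (the `Node` class and the environment interface such as `state.legal_actions()` are assumptions, not the paper's code); the four steps are filled in on the next slides:

```python
class Node:
    """One state in the partial search tree."""
    def __init__(self, state, parent=None, action=None):
        self.state = state        # game state at this node
        self.parent = parent      # parent Node (None at the root)
        self.action = action      # action that led here from the parent
        self.children = []        # expanded child nodes
        self.untried = list(state.legal_actions())  # assumed env method
        self.visits = 0           # N(v): visit count
        self.value = 0.0          # Q(v): accumulated simulation reward

def mcts(root, n_iterations):
    for _ in range(n_iterations):
        leaf = select(root)           # 1. Selection: tree policy (UCB)
        child = expand(leaf)          # 2. Expansion: add one child node
        reward = simulate(child)      # 3. Simulation: default policy rollout
        backpropagate(child, reward)  # 4. Backpropagation: update statistics
    # Play the action of the most-visited child of the root.
    return max(root.children, key=lambda c: c.visits).action
```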
MCTS - UCT
● Upper Confidence Bounds applied to Trees
Image from https://www.researchgate.net/publication/220978338_Monte-Carlo_Tree_Search_A_New_Framework_for_Game_AI
MCTS - UCT
Selection
● Start at the root node
● Select a child according to the Tree Policy: UCB
● Apply recursively, descending through the tree
○ Stop when an expandable node is reached
○ Expandable
■ A node that is non-terminal and has unexplored children
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
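The standard tree policy is UCB1: pick the child maximizing an upper confidence bound whose first term favors exploitation (high mean reward) and whose second term favors exploration (rarely visited children):

```latex
a^{*} = \operatorname*{arg\,max}_{j}\;
\underbrace{\bar{X}_{j}}_{\text{exploitation}}
+ \underbrace{c \sqrt{\frac{2 \ln n}{n_{j}}}}_{\text{exploration}}
```

Here \bar{X}_j is the mean reward of child j, n_j its visit count, n the parent's visit count, and c an exploration constant. Continuing the Python sketch:

```python
import math

def ucb1(parent, child, c=1.41):
    """UCB1 score: mean value plus an exploration bonus."""
    return (child.value / child.visits
            + c * math.sqrt(2 * math.log(parent.visits) / child.visits))

def select(node):
    """Descend via UCB1 until an expandable (or terminal) node is reached."""
    while not node.untried and node.children:  # fully expanded, non-terminal
        node = max(node.children, key=lambda ch: ucb1(node, ch))
    return node
```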
MCTS - UCT
Expansion
● Add one or more child nodes to the tree
○ Which nodes can be added depends on the actions available in the current position
○ How this is done depends on the Tree Policy
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
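Continuing the sketch, expansion takes one untried action at the selected node and attaches the resulting state as a new child (`state.step` is an assumed environment method):

```python
import random

def expand(node):
    """Add one new child by trying a previously unexplored action."""
    if not node.untried:                 # terminal node: nothing to expand
        return node
    action = node.untried.pop(random.randrange(len(node.untried)))
    child = Node(node.state.step(action), parent=node, action=action)
    node.children.append(child)
    return child
```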
MCTS - UCT
Simulation
● Run a simulation from the newly added node
● The Default Policy determines how the simulation is run
● The outcome determines the value
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
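In plain UCT the default policy is uniformly random: play random actions from the new node until the game ends (or a depth limit is hit) and return the outcome (`is_terminal` and `outcome` are assumed environment methods):

```python
import random

def simulate(node, max_depth=1000):
    """Roll out with the default (random) policy; return the outcome."""
    state, depth = node.state, 0
    while not state.is_terminal() and depth < max_depth:
        state = state.step(random.choice(state.legal_actions()))
        depth += 1
    return state.outcome()   # e.g., final score, or +1/-1 for win/loss
```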
MCTS - UCT
Backpropagation
● Move backward through the saved path
● Value of a node
○ Represents the benefit of going down that path from its parent
● Values are updated according to how the simulated game ends
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
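Backpropagation simply walks the saved path back to the root, incrementing visit counts and accumulating the simulation outcome:

```python
def backpropagate(node, reward):
    """Update statistics along the path from the new node to the root."""
    while node is not None:
        node.visits += 1       # N(v) += 1
        node.value += reward   # Q(v) accumulates simulation outcomes
        node = node.parent
```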
MCTS - UCT
Image from https://zhuanlan.zhihu.com/p/30458774
Experiment
Three Methods
● UCTtoRegression
○ The UCT training data is used to train the CNN via regression.
● UCTtoClassification
○ The UCT training data is used to train the CNN via classification.
● UCTtoClassification-Interleaved
○ The UCT training data is used to train the CNN via classification.
○ The trained CNN is then used to choose actions while collecting further UCT runs.
○ The CNN is then fine-tuned on this new data.
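Put together, UCTtoClassification amounts to supervised learning on UCT-labeled frames. A schematic sketch reusing the `mcts` function above (names such as `state.frames()` and `cnn.fit` are illustrative; the paper's data collection and preprocessing details differ):

```python
def uct_to_classification(env, cnn, n_episodes, n_uct_iterations):
    """Train a CNN classifier to imitate the slow, offline UCT agent."""
    dataset = []
    for _ in range(n_episodes):
        state = env.reset()
        while not state.is_terminal():
            # Slow, offline step: run full UCT planning from this state.
            best_action = mcts(Node(state), n_uct_iterations)
            dataset.append((state.frames(), best_action))  # (input, label)
            state = state.step(best_action)
    cnn.fit(dataset)   # supervised step: cross-entropy on UCT's choices
    return cnn         # fast enough for real-time play at test time
```

The interleaved variant repeats this loop, using the partially trained CNN to choose the states visited during the next round of UCT data collection before fine-tuning.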
CNN Architecture
Experimental Results
Visualization of the first-layer features
Reference
[1] Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
[2] Monte Carlo Tree Search and AlphaGo, Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar,
http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
[3] tobe: How to Learn Monte Carlo Tree Search (MCTS), https://zhuanlan.zhihu.com/p/30458774
