Dr. Frank Fang Kuo Yu (余方國)
06/04/2023
Deep Reinforcement Learning and Artificial General Intelligence,
as Seen Through Atari/AlphaGo/ChatGPT
2
Deep Reinforcement Learning and Artificial General Intelligence
Artificial General Intelligence (AGI) :
an agent that can match or exceed human performance
in a wide range of environments
(Credit: Shane Legg and Marcus Hutter)
Reinforcement Learning : decision-making framework
Deep Learning : representation computation/optimization mechanism
Deep Reinforcement Learning : formulate problem/solution
(Credit: David Silver and Demis Hassabis)
3
Deep Reinforcement Learning and Artificial General Intelligence
1. Atari Games
2. AlphaGo Series
3. ChatGPT / GPT-4
4
Atari Games
Pong, Breakout, Phoenix
https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
5
Reinforcement Learning Framework
[Diagram: the AGENT sends an Action to the ENVIRONMENT; the ENVIRONMENT returns a State and a Reward]
(s1 → a1 → r1)→ (s2 → a2 → r2)→ (s3 → a3 → r3)→ …
Making Sequential Decisions to Maximize Long-Term Rewards
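One common way to make "long-term rewards" concrete is the discounted return of a reward sequence r1, r2, r3, … The sketch below assumes a discount factor `gamma`, which the slide does not specify; the function name and the sample rewards are illustrative only.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A three-step episode where only the last action is rewarded.
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 0 + 0.5*0 + 0.25*1 = 0.25
```

With `gamma` close to 1 the agent weighs distant rewards almost as much as immediate ones; with `gamma` close to 0 it becomes short-sighted.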
6
Atari Breakout in OpenAI Gym
import gym  # requires the Gym >= 0.26 API (reset returns two values, step returns five)

env = gym.make("ALE/Breakout-v5", render_mode="human")
state, info = env.reset()
for index in range(1000):
    action = env.action_space.sample()  # random action; replace with a policy
    state, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Episode ended (game over or time limit): start a new one.
        state, info = env.reset()
env.close()
https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
7
State/Action/Reward in Atari Breakout
State:
● (210, 160, 3) - RGB image (height, width, channels)
Action:
● 0 - NO OP
● 1 - FIRE
● 2 - RIGHT
● 3 - LEFT
Reward:
● Red - 7 points
● Orange - 7 points
● Yellow - 4 points
● Green - 4 points
● Aqua - 1 point
● Blue - 1 point
https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
8
From One Game to All The Games in Atari
https://www.gymlibrary.dev/ https://gymnasium.farama.org/
9
A Journey to Artificial General Intelligence
https://www.assemblyai.com/blog/reinforcement-learning-with-deep-q-learning-explained/
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
DQN/2015
R2D2/2019
NGU/2019
Agent57/2020
10
OpenAI Gym Taxi-v3 : State/Action/Reward
State:
● Number of Variables : 1
● Range of Variable : [0, 499] (500 discrete states)
● 25 taxi positions x 5 passenger positions x 4 destination locations
Action:
● 0 : move south
● 1 : move north
● 2 : move east
● 3 : move west
● 4 : pick up passenger
● 5 : drop off passenger
Reward:
● +20 : delivering the passenger
● -10 : illegal pickup/drop-off
● -1 : per step, unless another reward is triggered
https://www.gymlibrary.dev/environments/toy_text/taxi/
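The single state variable packs the three factors above (25 x 5 x 4 = 500) into one integer. A minimal sketch of such a packing follows; the ordering (taxi row, taxi column, passenger location, destination) is an assumption modeled on the Gym Taxi source, and the function names are illustrative.

```python
def encode(taxi_row, taxi_col, passenger_location, destination):
    """Pack the four components into one integer in [0, 499]."""
    state = taxi_row                      # 0..4
    state = state * 5 + taxi_col          # 0..4
    state = state * 5 + passenger_location  # 0..4 (four stops, or inside the taxi)
    state = state * 4 + destination       # 0..3
    return state


def decode(state):
    """Inverse of encode: recover the four components from the integer."""
    destination = state % 4
    state //= 4
    passenger_location = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    return taxi_row, taxi_col, passenger_location, destination


print(encode(4, 4, 4, 3))   # largest state: 499
print(decode(0))            # smallest state: (0, 0, 0, 0)
```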
11
OpenAI Gym Taxi-v3 : Q Table
(500 states x 6 actions)
https://www.gocoder.one/blog/rl-tutorial-with-openai-gym
12
Q Learning (with epsilon greedy policy)
1. initialize Q table
2. state
3. exploitation
4. exploration
5. action
6. next state
7. reward
8. update Q table
https://www.cs.toronto.edu/~rgrosse/courses/csc311_f21/
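The eight steps can be sketched as tabular Q-learning on a toy problem. To stay self-contained, the example uses a hypothetical five-cell corridor instead of Taxi-v3; the environment, reward values, and hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration, not taken from the slides.

```python
import random

N_STATES, N_ACTIONS = 5, 2      # hypothetical corridor: action 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment: -1 per move, +20 on reaching the goal cell."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 20.0 if done else -1.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # 1. initialize Q table
    for _ in range(episodes):
        state, done, steps = 0, False, 0                        # 2. state
        while not done and steps < 100:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)               # 4. exploration
            else:                                               # 3. exploitation
                action = max(range(N_ACTIONS), key=lambda a: q_table[state][a])
            next_state, reward, done = step(state, action)      # 5-7. action, next state, reward
            best_next = 0.0 if done else max(q_table[next_state])
            target = reward + gamma * best_next
            q_table[state][action] += alpha * (target - q_table[state][action])  # 8. update
            state = next_state
            steps += 1
    return q_table


q_table = train()
for state, row in enumerate(q_table):
    print(state, [round(value, 2) for value in row])
```

For Taxi-v3 the same loop applies with a 500 x 6 table and `env.step(action)` in place of the toy `step` function.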
13
Limitation of Q Table
representation
scalability
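A rough count shows why a table stops scaling. The sketch below compares the Taxi-v3 table with an upper bound on the number of distinct raw Breakout frames; treating every pixel byte as free is an assumption that overcounts reachable frames, but the order of magnitude makes the point.

```python
import math

taxi_entries = (25 * 5 * 4) * 6     # 500 states x 6 actions
pixels = 210 * 160 * 3              # bytes in one Breakout frame

# Each byte can take 256 values, so there are at most 256**pixels distinct frames.
# Count the decimal digits of that number instead of materializing it.
digits_in_frame_count = math.floor(pixels * math.log10(256)) + 1

print(taxi_entries)                 # 3000
print(pixels)                       # 100800
print(digits_in_frame_count)        # number of digits in 256**100800
```

A table also has no notion of similarity between states: two frames that differ by one pixel get unrelated rows, which is the representation problem a neural network addresses.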
14
Deep Q Network (DQN) Architecture (1/2)
Ref : Human-level control through deep reinforcement learning
15
Deep Q Network (DQN) Architecture (2/2)
Ref : Massively Parallel Methods for Deep Reinforcement Learning
16
Deep Q Learning (with experience replay and dual networks)
1. initialize replay memory
2. initialize main network
3. initialize target network
4. epsilon greedy policy from main network
5. store transition in replay memory
6. get batch from replay memory
7. calculate error between two networks
8. synchronize two networks
Ref : Human-level control through deep reinforcement learning
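The control flow of these eight steps can be sketched without a deep-learning library. In the example below the "network" is deliberately reduced to a single linear layer over one-hot states (a weight lookup), so it shows the replay memory and the main/target split rather than the convolutional model of the paper. The corridor environment and all hyperparameters are assumptions for illustration.

```python
import random
from collections import deque

N_STATES, N_ACTIONS, GOAL = 5, 2, 4     # hypothetical corridor; 0 = left, 1 = right


def step(state, action):
    """Toy environment: -1 per move, +20 on reaching the goal cell."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (20.0 if done else -1.0), done


def q_values(weights, state):
    # With a one-hot input, a linear layer reduces to selecting one weight row.
    return weights[state]


def train(episodes=300, gamma=0.9, lr=0.1, epsilon=0.2, batch_size=16,
          sync_every=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                                  # 1. replay memory
    main = [[0.0] * N_ACTIONS for _ in range(N_STATES)]          # 2. main network
    target = [row[:] for row in main]                            # 3. target network
    updates = 0
    for _ in range(episodes):
        state, done, t = 0, False, 0
        while not done and t < 50:
            if rng.random() < epsilon:                           # 4. epsilon-greedy policy
                action = rng.randrange(N_ACTIONS)
            else:
                qs = q_values(main, state)
                action = max(range(N_ACTIONS), key=lambda a: qs[a])
            next_state, reward, done = step(state, action)
            memory.append((state, action, reward, next_state, done))  # 5. store transition
            state, t = next_state, t + 1
            if len(memory) < batch_size:
                continue
            batch = rng.sample(list(memory), batch_size)         # 6. sample a batch
            for s, a, r, s2, d in batch:
                # 7. error between the target network's estimate and the main network's
                y = r if d else r + gamma * max(q_values(target, s2))
                td_error = y - q_values(main, s)[a]
                main[s][a] += lr * td_error                      # gradient step on squared error
            updates += 1
            if updates % sync_every == 0:                        # 8. synchronize networks
                target = [row[:] for row in main]
    return main


weights = train()
for state, row in enumerate(weights):
    print(state, [round(value, 2) for value in row])
```

Sampling old transitions at random breaks the correlation between consecutive frames, and freezing the target network between synchronizations keeps the regression target from moving on every update.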
17
Deep Q Network (DQN) Benchmark
Ref : Human-level control through deep reinforcement learning
18
Four Tough Games in Atari
Pitfall, Solaris, Skiing, Montezuma’s Revenge
Problems : long-term credit assignment and exploitation/exploration tradeoff
Solutions : intrinsic motivation, meta-controller, short-term/episodic memory, distributed agents, etc.
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
19
Policy Gradient on Atari Pong
https://www.youtube.com/watch?v=tqrcjHuNdmQ
20
Reinforcement Learning Algorithms
Ref: OpenAI Spinning Up
21
Deep Reinforcement Learning and Artificial General Intelligence
1. Atari Games
2. AlphaGo Series
3. ChatGPT / GPT-4
22
A Journey to Artificial General Intelligence
https://www.deepmind.com/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules
https://www.youtube.com/watch?v=lVMgxtm5L-U
23
AlphaGo, AlphaGo Zero, AlphaZero, MuZero
AlphaGo, Nature, 2016
AlphaGo Zero, Nature, 2017
AlphaZero, Science, 2018
MuZero, Nature, 2020
24
AlphaGo Fan/Lee/Master
● European Go Champion Fan Hui, 5:0
● South Korean professional Go player Lee Sedol, 4:1
● Online games with players from China/Korea/Japan, 60:0
● Chinese professional Go player Ke Jie, 3:0
https://www.youtube.com/watch?v=lVMgxtm5L-U
https://www.youtube.com/watch?v=WXuK6gekU1Y
25
AlphaGo Inputs and Policy/Value Networks
https://www.slideshare.net/ckmarkohchang/alphago-in-depth
26
AlphaGo Monte Carlo Tree Search
https://www.slideshare.net/ckmarkohchang/alphago-in-depth
27
AlphaZero Training Process
Self-Play
Train Value Network
Train Policy Network
https://www.youtube.com/watch?v=lVMgxtm5L-U
28
AlphaZero Network
Ref: Acquisition of Chess Knowledge in AlphaZero
AlphaGo
• Two networks: policy network and value network
• Conv/ReLU-based layer structure
AlphaZero
• One network with two heads: policy and value
• ResNet-based layer structure
29
AlphaGo Zero Performance Benchmark
https://thirdeyedata.ai/how-to-build-your-own-alphazero-ai-using-python-and-keras/
30
MuZero Training Process
h: representation
f: prediction
g: dynamics
Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
31
MuZero Performance Benchmark
Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
32
AlphaGo to AlphaStar by David Silver
Deep Reinforcement Learning from AlphaGo to AlphaStar - London Machine Learning Meetup
33
Deep Reinforcement Learning and Artificial General Intelligence
1. Atari Games
2. AlphaGo Series
3. ChatGPT / GPT-4
34
Evolution of Large Language Models
Ref: Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
35
Language Model and Text Generation
• Sampling Strategy ~ Greedy / Top-K / Top-P (Temperature)
• Next Word Prediction ~ Sequential Decision Making
Ref: “Language Modeling” from “NLP Course | For You”
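The three sampling strategies differ only in how they narrow the next-word distribution before picking a word. The sketch below uses a made-up four-word vocabulary and made-up scores; the function names are illustrative, and real systems apply the same steps to vocabularies of tens of thousands of tokens.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def greedy(probs):
    """Greedy: always take the most probable word."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_k_filter(probs, k):
    """Top-K: keep only the k most probable words."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Top-P: keep the smallest set of words whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


vocab = ["the", "cat", "sat", "mat"]      # made-up vocabulary
logits = [2.0, 1.0, 0.5, -1.0]            # made-up next-word scores
probs = softmax(logits, temperature=0.7)
rng = random.Random(0)
print("greedy:", vocab[greedy(probs)])
print("top-k :", vocab[sample(top_k_filter(probs, k=2), rng)])
print("top-p :", vocab[sample(top_p_filter(probs, p=0.9), rng)])
```

Each generated word is appended to the context and the process repeats, which is why next-word prediction can be read as a sequential decision problem.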
36
ChatGPT Training Pipeline
Ref: “Introducing ChatGPT” from OpenAI
• Supervised Learning
• Reward Model
• Reinforcement Learning
• Supervised Fine-Tuning (SFT)
• Reinforcement Learning from Human Feedback (RLHF)
37
GPT Assistant Training Pipeline
Andrej Karpathy - State of GPT / Microsoft Developer / 05.25.2023 @ Youtube
38
Reinforcement Learning from Human Feedback
(General Process)
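Step 1. Rollout : the language model generates a response to a prompt
Step 2. Evaluation : the prompt/response pair is scored by a function, a model, or human feedback to produce a reward
Step 3. Optimization : the model is updated with PPO to increase the reward, with a KL penalty that keeps it close to a reference model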
Ref: Transformer Reinforcement Learning @ https://github.com/lvwerra/trl
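The three steps can be sketched on a toy problem. This is not the TRL API: the "language model" below is a softmax over three canned responses, the reward model is a hand-written score table, and the optimizer is plain REINFORCE with a KL-style penalty rather than PPO. All names and numbers are hypothetical.

```python
import math
import random

RESPONSES = ["great movie", "it was fine", "terrible film"]     # hypothetical candidates
SCORES = {"great movie": 1.0, "it was fine": 0.0, "terrible film": -1.0}  # stand-in reward model


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, lr=0.1, beta=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)       # trainable policy parameters
    ref_probs = softmax(logits)           # frozen reference policy (the starting model)
    for _ in range(steps):
        probs = softmax(logits)
        # Step 1. Rollout: sample a response from the current policy.
        i = rng.choices(range(len(RESPONSES)), weights=probs, k=1)[0]
        # Step 2. Evaluation: score the response, then subtract a penalty
        # for drifting away from the reference policy.
        reward = SCORES[RESPONSES[i]]
        shaped = reward - beta * (math.log(probs[i]) - math.log(ref_probs[i]))
        # Step 3. Optimization: REINFORCE step, using d log pi(i) / d logit_j.
        for j in range(len(logits)):
            grad_log_prob = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * shaped * grad_log_prob
    return softmax(logits)


final_probs = train()
for response, prob in zip(RESPONSES, final_probs):
    print(f"{response!r}: {prob:.3f}")
```

The penalty term `beta` plays the role of the KL constraint: raising it keeps the tuned policy closer to the reference model at the cost of lower reward.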
39
Reinforcement Learning from Human Feedback
(Sentiment)
Ref: Transformer Reinforcement Learning @ https://github.com/lvwerra/trl
Tune GPT-2 to Generate Controlled Sentiment Reviews
[Diagram: prompt → response → reward; components shown: GPT-2, BERT Classifier, control signal, Movie Review Dataset (used for training)]
40
Reinforcement Learning from Human Feedback
(Detoxification)
Ref: Using Transformer Reinforcement Learning to Detoxify Generative Language Models
Detoxifying Large Language Model
[Diagram: prompt → response → reward; components shown: GPT-Neo, RoBERTa Classifier, RealToxicityPrompts Dataset (used for training)]
41
GPT-4 Content Policy and Safety Challenge
Ref: GPT-4 Technical Report / System Card
42
GPT-4 Training Pipeline for Safety
Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
Rule-Based Reward Models (RBRMs)
• a refusal in the desired style
• a refusal in the undesired style
• containing disallowed content
• a safe non-refusal response
Ref: GPT-4 Technical Report / System Card
43
ChatGPT Hallucinations
44
GPT-4 Hallucinations and Improvements
Enhance Reward Models to mitigate
• Open-Domain Hallucinations
• Closed-Domain Hallucinations
Ref: GPT-4 Technical Report / System Card
45
Reinforcement Learning Use Cases
1. Reinforcement Learning for Quality
2. Reinforcement Learning for Safety
3. Reinforcement Learning for Hallucination
4. Reinforcement Learning for Sentiment
5. Reinforcement Learning for Detoxification
46
Summary of Five Large Language Models
Ref: “What Makes a Dialog Agent Useful” from Hugging Face
[Table columns: System | Pre-Trained Base Model | Supervised Fine-Tuning | Reinforcement Learning from Human Feedback | Hand-Written Rules for Safety]
47
Deep Reinforcement Learning and Artificial General Intelligence
1. Atari Games
2. AlphaGo Series
3. ChatGPT / GPT-4
Q&A
Deep Reinforcement Learning and Artificial General Intelligence,
as Seen Through Atari/AlphaGo/ChatGPT

從 Atari/AlphaGo/ChatGPT 談深度強化學習及通用人工智慧

  • 1. Dr. 余方國, 06/04/2023: From Atari/AlphaGo/ChatGPT to Deep Reinforcement Learning and Artificial General Intelligence
  • 2. Deep Reinforcement Learning and Artificial General Intelligence. Artificial General Intelligence (AGI): an agent that can achieve or exceed human performance in a wide range of environments (Credit: Shane Legg and Marcus Hutter). Reinforcement Learning: decision-making framework. Deep Learning: representation computation/optimization mechanism. Deep Reinforcement Learning: formulate problem/solution (Credit: David Silver and Demis Hassabis).
  • 3. Deep Reinforcement Learning and Artificial General Intelligence: (1) Atari Games, (2) AlphaGo Series, (3) ChatGPT / GPT-4
  • 4. Atari Games: Pong, Breakout, Phoenix. https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
  • 5. Reinforcement Learning Framework: ENVIRONMENT and AGENT exchange State, Action, Reward. (s1 → a1 → r1) → (s2 → a2 → r2) → (s3 → a3 → r3) → … Making sequential decisions to maximize long-term rewards.
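"Long-term rewards" is usually made precise as a discounted return, G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …. The sketch below computes it for one episode. The discount factor `gamma` and the sample rewards are assumptions for illustration; the slide does not specify them.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step episode with rewards r1, r2, r3.
episode_returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
print(episode_returns)
```

A smaller `gamma` makes the agent short-sighted; a value close to 1 weights distant rewards almost as much as immediate ones.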
  • 6. Atari Breakout in OpenAI Gym:
        import gym
        env = gym.make("ALE/Breakout-v5", render_mode="human")
        state, info = env.reset()
        for index in range(1000):
            action = env.action_space.sample()  # action by random or policy
            state, reward, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                state, info = env.reset()
        env.close()
    https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
  • 7. State/Action/Reward in Atari Breakout. State: (210, 160, 3) image. Action: 0 - NOOP, 1 - FIRE, 2 - RIGHT, 3 - LEFT. Reward: Red - 7 points, Orange - 7 points, Yellow - 4 points, Green - 4 points, Aqua - 1 point, Blue - 1 point. https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
  • 8. From One Game to All The Games in Atari. https://www.gymlibrary.dev/ & https://gymnasium.farama.org/
  • 9. A Journey to Artificial General Intelligence: DQN (2015), R2D2 (2019), NGU (2019), Agent57 (2020). https://www.assemblyai.com/blog/reinforcement-learning-with-deep-q-learning-explained/ https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
  • 10. OpenAI Gym Taxi-v3: State/Action/Reward. State: a single integer variable in the range [0, 499] (500 states = 25 taxi positions x 5 passenger positions x 4 destination locations). Action: 0 - move south, 1 - move north, 2 - move east, 3 - move west, 4 - pick up passenger, 5 - drop off passenger. Reward: +20 for delivering the passenger, -10 for an illegal pickup/dropoff, -1 per step unless another reward is triggered. https://www.gymlibrary.dev/environments/toy_text/taxi/
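To see how 25 x 5 x 4 combinations collapse into one state variable, here is a standalone sketch of the mixed-radix packing that the Taxi documentation describes. The function names `encode` and `decode` are local to this example (they do not call into Gym), and the result is zero-based, so states run from 0 to 499.

```python
def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Pack (row, col, passenger, destination) into one integer in [0, 499]."""
    state = taxi_row               # 5 rows
    state = state * 5 + taxi_col   # 5 columns -> 25 taxi positions
    state = state * 5 + passenger_loc  # 4 pickup spots + "in taxi" = 5
    state = state * 4 + destination    # 4 destinations
    return state


def decode(state):
    """Invert encode(): recover the four components from the state integer."""
    destination = state % 4
    state //= 4
    passenger_loc = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    return taxi_row, taxi_col, passenger_loc, destination


print(encode(3, 1, 2, 0), decode(encode(3, 1, 2, 0)))
```

A Q table for this environment therefore needs 500 rows (one per encoded state) and 6 columns (one per action).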
  • 11. OpenAI Gym Taxi-v3: Q Table (500 x 6). https://www.gocoder.one/blog/rl-tutorial-with-openai-gym
  • 12. Q Learning (with epsilon greedy policy): 1. initialize Q table, 2. state, 3. exploitation, 4. exploration, 5. action, 6. next state, 7. reward, 8. update Q table. https://www.cs.toronto.edu/~rgrosse/courses/csc311_f21/
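The eight steps can be sketched as a short tabular Q-learning loop. To stay self-contained, this example uses a hypothetical five-cell corridor instead of Taxi-v3; the environment, the hyperparameters (`alpha`, `gamma`, `epsilon`) and the function names are all assumptions for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 2   # hypothetical corridor: action 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment: reward 1.0 on reaching the right end, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return next_state, (1.0 if next_state == GOAL else 0.0), next_state == GOAL


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]    # 1. initialize Q table
    for _ in range(episodes):
        state, done = 0, False                           # 2. state
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)        # 4. exploration
            else:                                        # 3. exploitation
                best = max(q[state])
                ties = [a for a in range(N_ACTIONS) if q[state][a] == best]
                action = rng.choice(ties)                # break ties randomly
            # 5. action -> 6. next state, 7. reward
            next_state, reward, done = step(state, action)
            # 8. update Q table toward the one-step bootstrapped target
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
print([row.index(max(row)) for row in q_table[:GOAL]])  # greedy action per state
```

The update in step 8 is the standard rule Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s', ·) − Q(s, a)).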
  • 13. Limitation of Q Table: representation, scalability
  • 14. Deep Q Network (DQN) Architecture (1/2). Ref: Human-level control through deep reinforcement learning
  • 15. Deep Q Network (DQN) Architecture (2/2). Ref: Massively Parallel Methods for Deep Reinforcement Learning
  • 16. Deep Q Learning (with experience replay and dual networks): 1. initialize replay memory, 2. initialize main network, 3. initialize target network, 4. epsilon greedy policy from main network, 5. store transition in replay memory, 6. get batch from replay memory, 7. calculate error between two networks, 8. synchronize two networks. Ref: Human-level control through deep reinforcement learning
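The same eight steps can be sketched in plain Python. To keep the example dependency-free, a linear model over one-hot features stands in for the deep network, and a hypothetical five-cell corridor stands in for Atari; only the training-loop structure (replay memory, main and target parameters, periodic synchronization) is meant to mirror DQN. All names and hyperparameters are assumptions.

```python
import random
from collections import deque

N_STATES, N_ACTIONS, GOAL = 5, 2, 4   # hypothetical corridor: 0 = left, 1 = right


def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return next_state, (1.0 if next_state == GOAL else 0.0), next_state == GOAL


def features(state):
    """One-hot encoding; a real DQN feeds stacked image frames to a CNN."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def q_values(weights, state):
    x = features(state)
    return [sum(w * xi for w, xi in zip(weights[a], x)) for a in range(N_ACTIONS)]


def train(steps=5000, gamma=0.9, lr=0.1, epsilon=0.2, batch_size=16,
          capacity=1000, sync_every=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=capacity)                          # 1. replay memory
    main = [[0.0] * N_STATES for _ in range(N_ACTIONS)]      # 2. main network
    target = [row[:] for row in main]                        # 3. target network
    state = 0
    for t in range(1, steps + 1):
        if rng.random() < epsilon:                           # 4. epsilon greedy
            action = rng.randrange(N_ACTIONS)
        else:
            q = q_values(main, state)
            action = rng.choice([a for a in range(N_ACTIONS) if q[a] == max(q)])
        next_state, reward, done = step(state, action)
        memory.append((state, action, reward, next_state, done))  # 5. store
        state = 0 if done else next_state
        if len(memory) >= batch_size:
            batch = rng.sample(list(memory), batch_size)     # 6. get batch
            for s, a, r, s2, d in batch:
                # 7. error between target-network estimate and main network
                y = r if d else r + gamma * max(q_values(target, s2))
                error = y - q_values(main, s)[a]
                for i, xi in enumerate(features(s)):
                    main[a][i] += lr * error * xi            # gradient step
        if t % sync_every == 0:
            target = [row[:] for row in main]                # 8. synchronize
    return main, target, memory
```

Sampling past transitions at random breaks the correlation between consecutive frames, and the slowly updated target copy keeps the regression target from shifting on every step.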
  • 17. Deep Q Network (DQN) Benchmark. Ref: Human-level control through deep reinforcement learning
  • 18. Four Tough Games in Atari: Pitfall, Solaris, Skiing, Montezuma’s Revenge. Problems: long-term credit assignment and the exploration/exploitation tradeoff. Solutions: intrinsic motivation, meta-controller, short-term/episodic memory, distributed agents, etc. https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
  • 19. Policy Gradient on Atari Pong. https://www.youtube.com/watch?v=tqrcjHuNdmQ
  • 21. Deep Reinforcement Learning and Artificial General Intelligence: (1) Atari Games, (2) AlphaGo Series, (3) ChatGPT / GPT-4
  • 22. A Journey to Artificial General Intelligence. https://www.deepmind.com/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules https://www.youtube.com/watch?v=lVMgxtm5L-U
  • 23. AlphaGo, AlphaGo Zero, AlphaZero, MuZero: AlphaGo (Nature, 2016), AlphaGo Zero (Nature, 2017), AlphaZero (Science, 2018), MuZero (Nature, 2020)
  • 24. AlphaGo Fan/Lee/Master. European Go Champion Fan Hui: 5:0. South Korean professional Go player Lee Sedol: 4:1. Online games with players from China/Korea/Japan: 60:0. Chinese professional Go player Ke Jie: 3:0. https://www.youtube.com/watch?v=lVMgxtm5L-U https://www.youtube.com/watch?v=WXuK6gekU1Y
  • 25. AlphaGo Inputs and Policy/Value Networks. https://www.slideshare.net/ckmarkohchang/alphago-in-depth
  • 26. AlphaGo Monte Carlo Tree Search. https://www.slideshare.net/ckmarkohchang/alphago-in-depth
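The slide only names the search, so as a rough illustration, here is the selection rule in the form given in the AlphaGo Zero paper: each move is scored as Q + U, where Q is the mean value of past simulations and U = c_puct · P · sqrt(ΣN) / (1 + N) favors moves with a high policy prior P and few visits N. The dictionary layout, the constant `c_puct=1.5`, and the sample statistics are assumptions for this sketch.

```python
import math


def puct_select(children, c_puct=1.5):
    """Pick the move maximizing Q + U among a node's children.

    children maps move -> {"P": prior, "N": visit count, "W": total value}.
    """
    total_visits = sum(child["N"] for child in children.values())

    def score(child):
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        return q + u

    return max(children, key=lambda move: score(children[move]))


# Hypothetical statistics for three candidate moves at one tree node.
node = {
    "a": {"P": 0.6, "N": 10, "W": 5.0},
    "b": {"P": 0.3, "N": 0, "W": 0.0},
    "c": {"P": 0.1, "N": 2, "W": 1.8},
}
print(puct_select(node))
```

Here the unvisited move "b" wins because its exploration bonus outweighs the better average values of "a" and "c"; as visit counts grow, the bonus shrinks and Q dominates.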
  • 28. AlphaZero Network. AlphaGo: two networks (policy network and value network) with a Conv/ReLU-based layer structure. AlphaZero: one network with two heads (policy and value) and a ResNet-based layer structure. Ref: Acquisition of Chess Knowledge in AlphaZero
  • 29. AlphaGo Zero Performance Benchmark. https://thirdeyedata.ai/how-to-build-your-own-alphazero-ai-using-python-and-keras/
  • 30. MuZero Training Process. h: representation, f: prediction, g: dynamics. Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
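To show how the three functions fit together, the sketch below unrolls a learned model along a sequence of actions: h encodes the observation once, then g advances the hidden state and f reads out a policy and value at every step. The function bodies are toy arithmetic placeholders (in MuZero all three are neural networks), and `unroll` is a name chosen for this example.

```python
def h(observation):
    """Representation: observation -> hidden state (toy: mean of the inputs)."""
    return sum(observation) / len(observation)


def g(state, action):
    """Dynamics: (hidden state, action) -> (predicted reward, next hidden state)."""
    next_state = 0.5 * state + action
    reward = 0.1 * next_state
    return reward, next_state


def f(state):
    """Prediction: hidden state -> (policy over two actions, value estimate)."""
    p_right = min(max(0.5 + 0.1 * state, 0.0), 1.0)
    return [1.0 - p_right, p_right], state


def unroll(observation, actions):
    """Plan in hidden-state space without querying the real environment."""
    state = h(observation)
    policy, value = f(state)
    trajectory = [(None, policy, value)]   # no reward before the first action
    for action in actions:
        reward, state = g(state, action)
        policy, value = f(state)
        trajectory.append((reward, policy, value))
    return trajectory


for reward, policy, value in unroll([1.0, 3.0], actions=[1, 0, 1]):
    print(reward, policy, value)
```

During training, the predicted rewards, policies and values at each unrolled step are compared against observed rewards, search policies and value targets.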
  • 31. MuZero Performance Benchmark. Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
  • 32. AlphaGo to AlphaStar by David Silver: Deep Reinforcement Learning from AlphaGo to AlphaStar - London Machine Learning Meetup
  • 33. Deep Reinforcement Learning and Artificial General Intelligence: (1) Atari Games, (2) AlphaGo Series, (3) ChatGPT / GPT-4
  • 34. Evolution of Large Language Models. Ref: Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
  • 35. Language Model and Text Generation. Sampling Strategy ~ Greedy / Top-K / Top-P (Temperature). Next Word Prediction ~ Sequential Decision Making. Ref: “Language Modeling” from “NLP Course | For You”
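The sampling strategies named on this slide can be sketched over a toy vocabulary of four tokens. The logits are made up for illustration, and the helper names are local to this example rather than taken from any library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out tokens not in keep and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def greedy(logits):
    """Always pick the single most likely token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]      # hypothetical next-token scores
probs = softmax(logits, temperature=1.0)
rng = random.Random(0)
print(greedy(logits), sample(top_k_filter(probs, 2), rng), sample(top_p_filter(probs, 0.9), rng))
```

Each call picks one next token, which is why text generation can be read as a sequential decision problem: the chosen token becomes part of the state for the next step.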
  • 36. ChatGPT Training Pipeline. Steps: Supervised Learning, Reward Model, Reinforcement Learning. Stages: Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF). Ref: “Introducing ChatGPT” from OpenAI
  • 37. GPT Assistant Training Pipeline. Andrej Karpathy - State of GPT / Microsoft Developer / 05.25.2023 @ Youtube
  • 38. Reinforcement Learning from Human Feedback (General Process). Step 1: Rollout. Step 2: Evaluation. Step 3: Optimization. Ref: Transformer Reinforcement Learning @ https://github.com/lvwerra/trl
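The three steps can be illustrated with a toy loop. This is not the TRL API: a softmax policy over three canned responses stands in for the language model, a lookup table stands in for the learned reward model, and a plain REINFORCE update with a running baseline is swapped in for PPO. Every name and number below is an assumption made for the sketch.

```python
import math
import random

RESPONSES = ["great movie", "terrible movie", "it was fine"]  # hypothetical


def reward_model(response):
    """Hypothetical stand-in for a learned reward model (e.g., a classifier)."""
    return {"great movie": 1.0, "it was fine": 0.2, "terrible movie": -1.0}[response]


def policy_probs(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(iterations=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)   # policy parameters, uniform at the start
    baseline = 0.0                    # running average reward reduces variance
    for _ in range(iterations):
        probs = policy_probs(logits)
        # Step 1. Rollout: the policy generates a response.
        idx = rng.choices(range(len(RESPONSES)), weights=probs, k=1)[0]
        # Step 2. Evaluation: the reward model scores the response.
        reward = reward_model(RESPONSES[idx])
        # Step 3. Optimization: raise the log-probability of responses that
        # scored above the baseline, lower it for those that scored below.
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == idx else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
    return policy_probs(logits)


final = train()
print(dict(zip(RESPONSES, (round(p, 3) for p in final))))
```

In practice the optimization step also penalizes drift from the original model (for example with a KL term), which this sketch leaves out.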
  • 39. Reinforcement Learning from Human Feedback (Sentiment): tune GPT-2 to generate controlled-sentiment reviews. Diagram labels: prompt, response, reward, control; BERT Classifier (train); Movie Review Dataset (train). Ref: Transformer Reinforcement Learning @ https://github.com/lvwerra/trl
  • 40. Reinforcement Learning from Human Feedback (Detoxification): detoxifying a large language model. Diagram labels: prompt, response, reward; RealToxicityPrompts Dataset; RoBERTa Classifier; GPT-Neo (train). Ref: Using Transformer Reinforcement Learning to Detoxify Generative Language Models
  • 41. GPT-4 Content Policy and Safety Challenge. Ref: GPT-4 Technical Report / System Card
  • 42. GPT-4 Training Pipeline for Safety: Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Rule-Based Reward Models (RBRMs). RBRM output classes: a refusal in the desired style; a refusal in the undesired style; containing disallowed content; a safe non-refusal response. Ref: GPT-4 Technical Report / System Card
  • 44. GPT-4 Hallucinations and Improvements: enhance reward models to mitigate open-domain hallucinations and closed-domain hallucinations. Ref: GPT-4 Technical Report / System Card
  • 45. Reinforcement Learning Use Cases: 1. Reinforcement Learning for Quality, 2. Reinforcement Learning for Safety, 3. Reinforcement Learning for Hallucination, 4. Reinforcement Learning for Sentiment, 5. Reinforcement Learning for Detoxification
  • 46. Summary of Five Large Language Models. Table columns: System, Pre-Trained Base Model, Supervised Fine-Tuning, Reinforcement Learning from Human Feedback, Hand-Written Rules for Safety. Ref: “What Makes a Dialog Agent Useful” from Hugging Face
  • 47. Deep Reinforcement Learning and Artificial General Intelligence: (1) Atari Games, (2) AlphaGo Series, (3) ChatGPT / GPT-4