Reinforcement Learning
Ilfan taufik
23217132
Background and Definition
• Reinforcement learning (RL) is an area of machine learning inspired by
behaviourist psychology, concerned with how software agents ought
to take actions in an environment so as to maximize some notion of
cumulative reward (Wikipedia)
• Reinforcement learning copies a very simple principle from nature,
one that the psychologist Edward Thorndike documented more than
100 years ago. Thorndike placed cats inside boxes from which they
could escape only by pressing a lever. After a considerable amount of
pacing around and meowing, the animals would eventually step on
the lever by chance. After they learned to associate this behavior with
the desired outcome, they eventually escaped with increasing speed.
• In 1951, Marvin Minsky, a student at Harvard who would become
one of the founding fathers of AI as a professor at MIT, built a
machine that used a simple form of reinforcement learning to mimic
a rat learning to navigate a maze. Minsky’s Stochastic Neural Analogy
Reinforcement Computer, or SNARC, consisted of dozens of tubes,
motors, and clutches that simulated the behavior of 40 neurons and
synapses. As a simulated rat made its way out of a virtual maze, the
strength of some synaptic connections would increase, thereby
reinforcing the underlying behavior.
Will Knight, technologyreview.com
• By experimenting, computers are figuring out how to do things that
no programmer could teach them
• The key question is how to get a computer to calculate the value that
should be assigned to, say, each right or wrong turn that a rat might
make on its way out of its maze
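One common way to make this "value" precise is as the expected discounted sum of future rewards. This is a standard textbook formulation and is not spelled out in the slides; the policy π, the discount factor γ, and the reward symbol r are assumptions introduced here for illustration only.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right], \qquad 0 \le \gamma < 1
```

Under this reading, a "right turn" is one with a high value: taking it in state s is expected to lead to more reward later than the alternatives.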
Algorithm
• Q-learning works by learning the value of each state-action pair, so
that the optimal solution is obtained simply by choosing, in each
state, the action with the maximal value.
• Q-learning can find an optimal solution for a Markov Decision
Process
• Markov Decision Process (MDP): a mathematical framework, named
after Andrey Markov, for modelling decision-making systems
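The bullets above can be sketched as a small tabular Q-learning program. This is a minimal illustration under stated assumptions, not the method used in any system named in these slides: the five-cell corridor "maze", the reward of 1 at the exit, and the values of `alpha`, `gamma`, and `epsilon` are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy MDP: a corridor of 5 cells; the exit is the last cell.
N_STATES = 5
ACTIONS = (-1, +1)        # step left, step right
GOAL = N_STATES - 1


def step(state, action):
    """Move inside the corridor; reaching the exit pays 1 and ends the episode."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table q[state][action_index] of state-action values."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move q[state][a] toward
            # reward + gamma * (best value available in the next state).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """For each state, pick the action with the maximal learned value."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table)[:GOAL])  # learned move for each non-exit cell
```

The table is the "value of action and state" from the first bullet, and `greedy_policy` is the "choose the maximal value for each state" step. The exit cell itself is never updated, so only the first four entries of the policy are meaningful.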
Interesting fact
• DeepMind combined Deep Learning & Reinforcement Learning to
create the first artificial agents to achieve human-level performance
across many challenging domains
As a result
• In March 2016, AlphaGo, a program trained using reinforcement
learning, defeated one of the best Go players of all time, South
Korea’s Lee Sedol
Application
• ALPHAGO BY DEEPMIND
• SELF DRIVING CARS
• DOTA 2 BOT BY OPENAI (co-founded by Elon Musk)
Reference
• https://deepmind.com/blog/deep-reinforcement-learning/
• https://www.technologyreview.com/s/603501/10-breakthrough-technologies-2017-reinforcement-learning/
• Sadewa, Calvin “Pengaplikasian approximate dynamic programming
dalam permasalahan perekomendasiaan dinamis”

Editor's Notes

  1. Reinforcement learning is an area of machine learning inspired by the behaviour of living creatures, as documented by Edward Thorndike, who placed a cat in a cage that could be opened by pressing a lever. After several trials with the same cat, the cat escaped from the cage faster and faster.
  2. The idea was applied to AI in 1951 by Marvin Minsky, whose device, Minsky’s Stochastic Neural Analogy Reinforcement Computer, or SNARC, simulated a rat learning to solve a maze.
  3. The Q-learning algorithm was used to learn to play games on the Atari 2600 console.
  4. Deep learning: the basic idea is that software can simulate the neocortex’s large array of neurons in an artificial “neural network”.