Reinforcement Learning
Ilfan taufik
23217132
Background and Definition
• Reinforcement learning (RL) is an area of machine learning inspired by
behaviourist psychology, concerned with how software agents ought
to take actions in an environment so as to maximize some notion of
cumulative reward (Wikipedia)
• Reinforcement learning copies a very simple principle from nature,
one that the psychologist Edward Thorndike documented more than
100 years ago. Thorndike placed cats inside boxes from which they
could escape only by pressing a lever. After a considerable amount of
pacing around and meowing, the animals would eventually step on
the lever by chance. After they learned to associate this behavior with
the desired outcome, they eventually escaped with increasing speed.
• In 1951, Marvin Minsky, a student at Harvard who would become
one of the founding fathers of AI as a professor at MIT, built a
machine that used a simple form of reinforcement learning to mimic
a rat learning to navigate a maze. Minsky’s Stochastic Neural Analog
Reinforcement Calculator, or SNARC, consisted of dozens of tubes,
motors, and clutches that simulated the behavior of 40 neurons and
synapses. As a simulated rat made its way out of a virtual maze, the
strength of some synaptic connections would increase, thereby
reinforcing the underlying behavior.
Will Knight, technologyreview.com
• By experimenting, computers are figuring out how to do things that
no programmer could teach them
• The challenge: how to get a computer to calculate the value that should be assigned
to, say, each right or wrong turn that a rat might make on its way out
of its maze
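One common way to assign such a value to each turn is the discounted sum of the rewards that follow it. The sketch below is only an illustration under assumed numbers: the function name `discounted_returns`, the reward sequence, and the discount factor `gamma` are hypothetical and do not come from the slides or the cited article.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return, for each step, the discounted sum of rewards from that step onward."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step's value reuses the value of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: three turns with no reward, then the maze exit (+1).
episode_rewards = [0.0, 0.0, 0.0, 1.0]
values = discounted_returns(episode_rewards, gamma=0.5)
print(values)  # [0.125, 0.25, 0.5, 1.0]
```

Turns closer to the exit receive a higher value, which is the sense in which a "right" turn is worth more than an earlier or a wrong one.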
Algorithm
• Q-learning works by learning the value of each state-action pair, so that
the optimal action in any state can be chosen simply by picking the action
with the maximal value for that state.
• Q-learning can find an optimal solution (policy) for a finite Markov
Decision Process, under suitable conditions on exploration and learning rate.
• A Markov Decision Process (MDP) is a mathematical framework, named after
Andrey Markov, for modelling decision-making systems.
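The idea of learning state-action values and then picking the maximal one can be sketched with tabular Q-learning on a toy problem. Everything specific here is an assumption made for illustration, not something taken from the slides: the five-cell corridor MDP, the reward of 1 at the exit, the names `step`, `train`, and `greedy_policy`, and the parameters `alpha`, `gamma`, and `epsilon`.

```python
import random

# Hypothetical toy MDP: a corridor of 5 cells; the exit is the rightmost cell.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action] with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Choose the action with the maximal learned value in every state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
print(policy[:GOAL])  # learned action for each non-terminal cell
```

In this sketch the learned policy for every non-terminal cell should be "step right", since that is the shortest way to the exit. The entry for the exit cell itself is never updated and carries no meaning.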
Interesting fact
• DeepMind combined deep learning and reinforcement learning to
create the first artificial agents to achieve human-level performance
across many challenging domains
As a result
• In March 2016, AlphaGo, a program trained using reinforcement
learning, defeated one of the best Go players of all time, South
Korea’s Lee Sedol
Application
• ALPHAGO BY DEEPMIND
• SELF-DRIVING CARS
• DOTA 2 BOT BY OPENAI (CO-FOUNDED BY ELON MUSK)
Reference
• https://deepmind.com/blog/deep-reinforcement-learning/
• https://www.technologyreview.com/s/603501/10-breakthrough-technologies-2017-reinforcement-learning/
• Sadewa, Calvin “Pengaplikasian approximate dynamic programming
dalam permasalahan perekomendasiaan dinamis”

Editor's Notes

  1. Reinforcement learning is an area of machine learning inspired by the behaviour of living creatures, as documented by Edward Thorndike, who placed a cat in a cage that could be opened by pressing a lever. After several trials with the same cat, it escaped from the cage faster and faster.
  2. Its application to AI came in 1951, when Marvin Minsky used his device, the Stochastic Neural Analog Reinforcement Calculator, or SNARC, to simulate a rat learning to solve a maze.
  3. The Q-learning algorithm has been used to learn to play games on the Atari 2600 console.
  4. Deep learning: the basic idea is that software can simulate the neocortex’s large array of neurons in an artificial “neural network”.