CSCI S-89C Deep Reinforcement Learning
Syllabus Spring 2021
Lectures: Online web conference, Wednesdays, 7:40-9:40 pm
Lectures will be live-streamed, and recordings will be available via the course website within 24 hours.
Instructor: Dr. Dmitry Kurochkin, Senior Research Analyst, Harvard University
E-mail: dkurochkin@fas.harvard.edu
Website: https://canvas.harvard.edu/courses/81664
Office Hours: By request
Teaching Fellows: TBA e-mail: TBA
Prerequisites:
Introductory probability and statistics, multivariate calculus equivalent to MATH E-21a, and proficiency in Python programming equivalent to CSCI E-7.
Note on the prerequisites:
We will be formulating value (cost) functions and performing optimization, so students are expected to be
comfortable taking derivatives. Basic knowledge of probability theory (in particular, conditional probability
distributions and conditional expectations) is necessary. Understanding matrix-vector operations
and notation is helpful but not required. All coding exercises are done in Python. Students are
required to take a short pretest at the beginning of the course. The pretest score will not count toward
the final grade, but it will help you gauge whether your background in calculus and probability theory,
as well as your command of coding, positions you for success in this course.
Text:
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd ed.
ISBN: 978-0-262-03924-6
An electronic copy of the book is available on the author’s webpage (under “Full Pdf”):
http://incompleteideas.net/book/the-book-2nd.html
Optional reading:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016
ISBN: 978-0-262-03561-3
An HTML version of the book is available at http://www.deeplearningbook.org
Course Description:
This course introduces Deep Reinforcement Learning (RL), one of the most recent developments in machine
learning. Deep RL has attracted the attention of many researchers and developers in recent years
due to its wide range of applications in fields such as robotics, robotic surgery, pattern
recognition, medical image-based diagnosis, treatment strategies in clinical decision making,
personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language
processing. RL is often seen as the third area of machine learning, alongside supervised and
unsupervised learning, in which an agent learns as a result of its own actions and its interaction
with the environment. Such learning generally does not need to be guided externally, but until
recently it was difficult to put RL ideas into practice. This course focuses primarily on
problems that arise in healthcare and life science applications.
Tentative List of Topics:
I. Reinforcement Learning (RL)
◦ Markov Decision Processes (MDP): Value Functions and Policies
◦ Dynamic Programming (DP): Bellman Equation
◦ Monte Carlo (MC) Methods
◦ Temporal-difference (TD) Prediction and Control: SARSA and Q-learning
◦ n-step TD
◦ Approximation Methods: Stochastic-gradient, Semi-gradient TD Update, Least-squares TD
II. Deep Learning
◦ Neural Networks (NN): Classification & Regression
◦ Training NNs: Backpropagation
◦ Tuning NNs: Regularization
◦ Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)
III. Deep RL
◦ Value-based Deep RL: Q-network
◦ Policy-based Deep RL: REINFORCE
◦ Asynchronous Methods for Deep RL: Advantage Actor-Critic (A2C)
◦ Model-based Deep RL
Homework:
Unless otherwise noted, homework assignments will be due each Sunday. The assignments will
be posted on the Canvas website and will consist of a series of programming exercises (solutions should be
implemented in Python) as well as analytical problems (knowledge of calculus and probability theory
should suffice) that help students deepen their understanding of the underlying theory. Solutions to
the programming exercises should be submitted via Canvas as a single .ipynb (Jupyter Notebook)
file. Solutions to the theoretical problems should be submitted as a single PDF file.
Note on the deadline and penalty:
Solutions to the assignments submitted later than 1, 2, 3, 4, and 5 days after the due date will be
penalized by 10%, 20%, 30%, 40%, and 100%, respectively. If you need an extension, please
coordinate with the instructor before the due date.
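The penalty schedule works as a lookup from days late to a deduction. The sketch below is a hypothetical helper and not a course tool. It rests on three assumptions: a partial day counts as a full day late, a submission 5 or more days late receives no credit, and the percentage is taken from the earned score rather than from the maximum score.

```python
import math

# Deduction fractions from the syllabus schedule (days late -> fraction of credit lost).
# Assumption: a partial day counts as a full day, so 1-24 hours late is "1 day late".
PENALTY_BY_DAYS_LATE = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40}


def late_penalty(hours_late: float) -> float:
    """Fraction of credit deducted for a submission `hours_late` hours past the deadline."""
    if hours_late <= 0:
        return 0.0  # on time: no deduction
    days_late = math.ceil(hours_late / 24)
    # Anything not in the table (5 or more days late) forfeits all credit.
    return PENALTY_BY_DAYS_LATE.get(days_late, 1.0)


def adjusted_score(raw_score: float, hours_late: float) -> float:
    """Hypothetical helper: apply the deduction as a fraction of the earned score."""
    return raw_score * (1.0 - late_penalty(hours_late))
```

Under this reading, a submission 30 hours late falls in its second day and keeps 80% of its earned score. If the deduction were instead taken from the maximum possible score, only `adjusted_score` would need to change.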
Quizzes:
An online quiz will be due before each class, unless announced otherwise. Each quiz will consist of
approximately 5 basic questions testing understanding of the principles studied. Late quizzes will not be accepted.
Midterm Exam:
The midterm exam will be released on March 10 (no lecture on March 10) and will be due March 17 at 7:40
pm (Eastern Time). The exam will be similar to the homework exercises and will cover topics studied up to that
date. Late midterms will not be accepted.
Final:
The final examination will be due at 11:59 pm (Eastern Time) on May 12 (no lecture on May 12). The
exam will be cumulative, covering all topics studied. Late finals will not be accepted.
Attendance:
Regular attendance (whether on campus or online) is expected but will not be taken. Recorded lectures
will be available via the course website within 24 hours after the lecture.
Participation:
Although no credit is allocated for participation, everyone is encouraged to participate constructively
in class by asking relevant questions. It is important that you regularly check the e-mail address registered
with Canvas, monitor course announcements, and participate in discussions on Piazza, the forum
available at https://piazza.com/class/kh5mr9vj75c2ah. All technical and data-science-related
questions will be discussed on Piazza.
Grading:
The semester average is calculated using the formula:
Grade = 0.25 · Homework + 0.20 · Quizzes + 0.25 · Midterm + 0.30 · Final
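To check the weighting, the formula can be evaluated directly. This is a minimal sketch. It assumes every component score is already on the same 0-100 scale. The function name and the dictionary input format are for illustration only.

```python
from typing import Mapping

# Weights taken from the syllabus grading formula.
WEIGHTS = {"homework": 0.25, "quizzes": 0.20, "midterm": 0.25, "final": 0.30}


def semester_average(scores: Mapping[str, float]) -> float:
    """Weighted semester average; assumes all scores share one scale (e.g. 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Sum of weight * score over the four graded components.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

Take scores of 90 (homework), 80 (quizzes), 85 (midterm), and 88 (final). They give 22.5 + 16 + 21.25 + 26.4 = 86.15. The weights sum to 1, so the average stays on the same scale as the inputs.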
Student Learning Objectives:
◦ proficiency in building optimal NNs using Python
◦ understanding of RL including MDP, Bellman equation, and optimal policy
◦ firm understanding of Deep RL and comfort with approximation methods used in
conjunction with RL
◦ hands-on experience in estimating the optimal policy and value functions
Academic Integrity:
You are responsible for understanding Harvard Extension School policies on academic integrity
(www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to
use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time,
submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses.
There are no excuses for failure to uphold academic integrity. To support your learning about academic
citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism
(www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find links to
the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of
academic citation policy. The tutorials are anonymous open-learning tools.
Disability Accommodations:
The Extension School is committed to providing an accessible academic community. The Accessibility
Office offers a variety of accommodations and services to students with documented disabilities.
More information can be found at www.extension.harvard.edu/resources-policies/resources/accessibility-student-services.
Dates of Interest:
◦ Harvard Extension School classes begin, January 25, 2021
◦ Pretest is due, January 29
◦ Last day to change the credit status, January 31
◦ Course drop deadline for full-tuition refund, January 31
◦ Quiz 1 is due, February 3
◦ Assignment 1 is due, February 7
◦ Course drop deadline for half-tuition refund, February 7
◦ Midterm Exam is due, March 17, 7:40 pm (Eastern Time)
◦ Withdrawal deadline, April 23
◦ Final Exam is due, May 12, 11:59 pm (Eastern Time)
3

More Related Content

What's hot

Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
Saxion
 
Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012Kathy Favazza
 
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) SinghBridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Danny Singh, M.B.A., MSEd
 
Melbourne t1 2016-assignment_2_mn504
Melbourne   t1 2016-assignment_2_mn504Melbourne   t1 2016-assignment_2_mn504
Melbourne t1 2016-assignment_2_mn504
Sandeep Ratnam
 
Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...
Grial - University of Salamanca
 
Course-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course AuthoringCourse-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course Authoring
Peter Brusilovsky
 
Open Education: the MOOC Experience
Open Education: the MOOC ExperienceOpen Education: the MOOC Experience
Open Education: the MOOC Experience
Rémi Bachelet
 
Transformation of the Ol Instructor
Transformation of the Ol InstructorTransformation of the Ol Instructor
Transformation of the Ol Instructor
wilkinwm
 
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
BlackboardEMEA
 
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
eMadrid network
 
Teaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCsTeaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCs
Technological Ecosystems for Enhancing Multiculturality
 
MCO 436 syllabus
MCO 436 syllabus MCO 436 syllabus
MCO 436 syllabus
Kyounghee Hazel Kwon
 
Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2
BSD Certification Group
 
Ls 12orientation
Ls 12orientationLs 12orientation
Ls 12orientationJim Walker
 
Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS
University of Groningen (The Netherlands)
 
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Traian Rebedea
 
WP2 Course Modernisation
WP2 Course ModernisationWP2 Course Modernisation
WP2 Course Modernisation
metamath
 
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
eMadrid network
 
Web-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectivesWeb-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectives
VinhNguyen628
 

What's hot (20)

Speaker 11 jim o'dwyer
Speaker 11 jim o'dwyerSpeaker 11 jim o'dwyer
Speaker 11 jim o'dwyer
 
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
 
Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012
 
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) SinghBridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
 
Melbourne t1 2016-assignment_2_mn504
Melbourne   t1 2016-assignment_2_mn504Melbourne   t1 2016-assignment_2_mn504
Melbourne t1 2016-assignment_2_mn504
 
Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...
 
Course-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course AuthoringCourse-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course Authoring
 
Open Education: the MOOC Experience
Open Education: the MOOC ExperienceOpen Education: the MOOC Experience
Open Education: the MOOC Experience
 
Transformation of the Ol Instructor
Transformation of the Ol InstructorTransformation of the Ol Instructor
Transformation of the Ol Instructor
 
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
 
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
 
Teaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCsTeaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCs
 
MCO 436 syllabus
MCO 436 syllabus MCO 436 syllabus
MCO 436 syllabus
 
Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2
 
Ls 12orientation
Ls 12orientationLs 12orientation
Ls 12orientation
 
Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS
 
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
 
WP2 Course Modernisation
WP2 Course ModernisationWP2 Course Modernisation
WP2 Course Modernisation
 
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
 
Web-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectivesWeb-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectives
 

Similar to Deep reinforcement learning

ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6Shaun Kellogg
 
Assignments .30%
Assignments .30%Assignments .30%
Assignments .30%butest
 
OutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docxOutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docx
gerardkortney
 
Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017
Justin Joslin
 
Hybrid Statistics Course Development
Hybrid Statistics Course DevelopmentHybrid Statistics Course Development
Hybrid Statistics Course Development
Ross Flek
 
Syllabus
SyllabusSyllabus
Syllabus
Leon Adams
 
8th sem (1)
8th sem (1)8th sem (1)
8th sem (1)
IdiotJackveer
 
2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series
Shawn Wells
 
Scripting for Design
Scripting for DesignScripting for Design
Scripting for Design
Kopi Maheswaran
 
Cwmd 2601 2020
Cwmd 2601 2020Cwmd 2601 2020
Cwmd 2601 2020
Kopi Maheswaran
 
Discrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docxDiscrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docx
LaizaMaeRodriguezAgn
 
Introduction to EMA highlights
Introduction to EMA highlightsIntroduction to EMA highlights
Introduction to EMA highlights
Nick Bunyan
 
ISSC362Course SummaryCourse ISSC362 Title IT Securit
ISSC362Course SummaryCourse  ISSC362 Title  IT SecuritISSC362Course SummaryCourse  ISSC362 Title  IT Securit
ISSC362Course SummaryCourse ISSC362 Title IT Securit
TatianaMajor22
 
Chm1083dfghj
Chm1083dfghjChm1083dfghj
Chm1083dfghj
Arriahn Foronda
 
Res1 Methods of Research Outline
Res1 Methods of Research OutlineRes1 Methods of Research Outline
Res1 Methods of Research Outline
Holy Angel University
 
MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]Maurice Dawson
 
Computational thinking
Computational thinkingComputational thinking
Computational thinking
Ngonidzashe Zanamwe
 
Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009
Peter Stinson
 
1 Saint Leo University GBA 334 Applied Decision.docx
 1 Saint Leo University  GBA 334  Applied Decision.docx 1 Saint Leo University  GBA 334  Applied Decision.docx
1 Saint Leo University GBA 334 Applied Decision.docx
aryan532920
 
Stem 2 syllabus
Stem 2 syllabusStem 2 syllabus
Stem 2 syllabus
Timothy Welsh
 

Similar to Deep reinforcement learning (20)

ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6
 
Assignments .30%
Assignments .30%Assignments .30%
Assignments .30%
 
OutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docxOutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docx
 
Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017
 
Hybrid Statistics Course Development
Hybrid Statistics Course DevelopmentHybrid Statistics Course Development
Hybrid Statistics Course Development
 
Syllabus
SyllabusSyllabus
Syllabus
 
8th sem (1)
8th sem (1)8th sem (1)
8th sem (1)
 
2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series
 
Scripting for Design
Scripting for DesignScripting for Design
Scripting for Design
 
Cwmd 2601 2020
Cwmd 2601 2020Cwmd 2601 2020
Cwmd 2601 2020
 
Discrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docxDiscrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docx
 
Introduction to EMA highlights
Introduction to EMA highlightsIntroduction to EMA highlights
Introduction to EMA highlights
 
ISSC362Course SummaryCourse ISSC362 Title IT Securit
ISSC362Course SummaryCourse  ISSC362 Title  IT SecuritISSC362Course SummaryCourse  ISSC362 Title  IT Securit
ISSC362Course SummaryCourse ISSC362 Title IT Securit
 
Chm1083dfghj
Chm1083dfghjChm1083dfghj
Chm1083dfghj
 
Res1 Methods of Research Outline
Res1 Methods of Research OutlineRes1 Methods of Research Outline
Res1 Methods of Research Outline
 
MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]
 
Computational thinking
Computational thinkingComputational thinking
Computational thinking
 
Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009
 
1 Saint Leo University GBA 334 Applied Decision.docx
 1 Saint Leo University  GBA 334  Applied Decision.docx 1 Saint Leo University  GBA 334  Applied Decision.docx
1 Saint Leo University GBA 334 Applied Decision.docx
 
Stem 2 syllabus
Stem 2 syllabusStem 2 syllabus
Stem 2 syllabus
 

Recently uploaded

ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 

Recently uploaded (20)

ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 

Deep reinforcement learning

  • 1. CSCI S-89C Deep Reinforcement Learning Syllabus Spring 2021 Lectures: Online web conference, Wednesdays, 7:40-9:40 pm Lectures will be live-streamed with the video being available via the course website within 24 hours. Instructor: Dr. Dmitry Kurochkin, Senior Research Analyst, Harvard University E-mail: dkurochkin@fas.harvard.edu Website: https://canvas.harvard.edu/courses/81664 Office Hours: By request Teaching Fellows: TBA e-mail: TBA Prerequisites: Introductory probability and statistics, multivariate calculus equivalent to MATH E-21a, and profi- ciency in Python programming equivalent to CSCI E-7. Note on the prerequisites: We will be formulating value (cost) functions and performing optimization. Students are expected to be comfortable taking derivatives. Basic knowledge of probability theory (in particular, conditional proba- bility distributions and conditional expectations) is necessary. Understanding matrix vector operations and notation is helpful but not required. All coding exercises are performed in Python. Students are required to take a short pretest at the beginning of the course. The pretest score will not count toward the final grade but will help you understand whether your background in calculus, probability theory, as well as command of coding positions you for success in this course. Text: Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd ed. ISBN: 978-0-262-03924-6 Electronic copy of the book is available at the author’s webpage (under “Full Pdf”) http://incompleteideas.net/book/the-book-2nd.html Optional reading: Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016 ISBN: 978-0-262-03561-3 HTML version of the book is available at http://www.deeplearningbook.org Course Description: This course introduces Deep Reinforcement Learning (RL), one of the most modern techniques of ma- chine learning. 
Deep RL has attracted the attention of many researchers and developers in recent years due to its wide range of applications in a variety of fields, such as robotics, robotic surgery, pattern recognition, diagnosis based on medical images, treatment strategies in clinical decision making, personalized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. Reinforcement learning is often seen as the third paradigm of machine learning, alongside supervised and unsupervised learning: an agent learns as a result of its own actions and its interaction with the environment. Such learning generally does not need to be guided externally, but until recently it was difficult to apply RL ideas in practice. This course primarily focuses on problems that emerge in healthcare and life science applications.
Tentative List of Topics:
I. Reinforcement Learning (RL)
◦ Markov Decision Processes (MDP): Value Functions and Policies
◦ Dynamic Programming (DP): Bellman Equation
◦ Monte Carlo (MC) Methods
◦ Temporal-Difference (TD) Prediction and Control: SARSA and Q-learning
◦ n-step TD
◦ Approximation Methods: Stochastic-gradient, Semi-gradient TD Update, Least-squares TD
II. Deep Learning
◦ Neural Networks (NN): Classification & Regression
◦ Training NNs: Backpropagation
◦ Tuning NNs: Regularization
◦ Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)
III. Deep RL
◦ Value-based Deep RL: Q-network
◦ Policy-based Deep RL: REINFORCE
◦ Asynchronous Methods for Deep RL: Advantage Actor-Critic (A2C)
◦ Model-based Deep RL
Homework:
Unless otherwise noted, homework assignments will be due each Sunday. The assignments will be posted on the Canvas website and will consist of a series of programming exercises (solutions should be implemented in Python) as well as analytical problems (knowledge of calculus and probability theory should suffice) that help students deepen their understanding of the underlying theory. Solutions to the programming exercises should be submitted via Canvas in the form of a single .ipynb (Jupyter Notebook) file. Solutions to the theoretical problems should be submitted in the form of a single PDF file.
Note on the deadline and penalty:
Solutions submitted later than 1, 2, 3, 4, and 5 days after the due date will be penalized by 10%, 20%, 30%, 40%, and 100%, respectively. If you need an extension, please coordinate with the instructor before the due date.
Quizzes:
An online quiz will be due before each class, unless announced otherwise. Each quiz will consist of approximately 5 basic questions testing your understanding of the principles studied. No late quizzes will be allowed.
Midterm Exam:
The midterm exam will be released on March 10 (no lecture on March 10) and is due March 17 at 7:40 pm (Eastern Time). The exam will be similar to the homework exercises but will cover the topics studied up to that date. Late midterms will not be accepted.
Final:
The final examination will be due at 11:59 pm (Eastern Time) on May 12 (no lecture on May 12). The exam will be cumulative, covering all topics studied. Late finals will not be accepted.
Attendance:
Regular attendance (whether on campus or online) is expected but will not be taken. Recorded lectures will be available via the course website within 24 hours after the lecture.
Participation:
Although no credit is allocated for participation, everyone is encouraged to participate constructively in class by asking relevant questions. It is important that you regularly check the e-mail address registered with Canvas, monitor course announcements, and participate in discussions on Piazza, the forum available at https://piazza.com/class/kh5mr9vj75c2ah. All technical and data-science-related questions will be discussed on Piazza.
Grading:
The semester average is calculated using the formula:
Grade = 0.25 · Homework + 0.20 · Quizzes + 0.25 · Midterm + 0.30 · Final
Student Learning Objectives:
◦ proficiency in building optimal NNs using Python
◦ understanding of RL, including MDPs, the Bellman equation, and optimal policies
◦ firm understanding of Deep RL and comfort with the approximation methods used in conjunction with RL
◦ hands-on experience in estimating the optimal policy and value functions
Academic Integrity:
You are responsible for understanding Harvard Extension School policies on academic integrity (www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time, submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses. There are no excuses for failure to uphold academic integrity. To support your learning about academic citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism (www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find links to the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of academic citation policy. The tutorials are anonymous open-learning tools.
Disability Accommodations:
The Extension School is committed to providing an accessible academic community.
The Accessibility Office offers a variety of accommodations and services to students with documented disabilities. More information can be found at www.extension.harvard.edu/resources-policies/resources/accessibility-student-services
Dates of Interest:
◦ Harvard Extension School classes begin, January 25, 2021
◦ Pretest is due, January 29
◦ Last day to change the credit status, January 31
◦ Course drop deadline for full-tuition refund, January 31
◦ Quiz 1 is due, February 3
◦ Assignment 1 is due, February 7
◦ Course drop deadline for half-tuition refund, February 7
◦ Midterm Exam is due, March 17, 7:40 pm (Eastern Time)
◦ Withdrawal deadline, April 23
◦ Final Exam is due, May 12, 11:59 pm (Eastern Time)
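The semester average in the Grading formula is a plain weighted average, and its weights (0.25 + 0.20 + 0.25 + 0.30) add up to 1. The short Python sketch below shows the calculation. It assumes every component is already on a common 0-100 scale; the function name, the dictionary keys, and the example scores are hypothetical and used for illustration only.

```python
# Sketch of the semester-average formula from the Grading section:
#   Grade = 0.25 * Homework + 0.20 * Quizzes + 0.25 * Midterm + 0.30 * Final
# Assumption: each component score is already on the same 0-100 scale.
# The dictionary keys below are hypothetical labels, not official course names.

WEIGHTS = {"homework": 0.25, "quizzes": 0.20, "midterm": 0.25, "final": 0.30}


def semester_average(scores: dict[str, float]) -> float:
    """Return the weighted semester average of the four component scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum; the weights add up to 1, so the result stays on the 0-100 scale.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {"homework": 90, "quizzes": 80, "midterm": 85, "final": 88}
    print(round(semester_average(example), 2))  # prints 86.15
```

For these hypothetical scores, the weighted contributions are 22.5 + 16 + 21.25 + 26.4 = 86.15. Because the Final carries the largest weight, one point on the final exam moves the average more than one point on any other component.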