SlideShare a Scribd company logo
1 of 3
Download to read offline
CSCI S-89C Deep Reinforcement Learning
Syllabus Spring 2021
Lectures: Online web conference, Wednesdays, 7:40-9:40 pm
Lectures will be live-streamed with the video being available via the course website within 24 hours.
Instructor: Dr. Dmitry Kurochkin, Senior Research Analyst, Harvard University
E-mail: dkurochkin@fas.harvard.edu
Website: https://canvas.harvard.edu/courses/81664
Office Hours: By request
Teaching Fellows: TBA e-mail: TBA
Prerequisites:
Introductory probability and statistics, multivariate calculus equivalent to MATH E-21a, and profi-
ciency in Python programming equivalent to CSCI E-7.
Note on the prerequisites:
We will be formulating value (cost) functions and performing optimization. Students are expected to be
comfortable taking derivatives. Basic knowledge of probability theory (in particular, conditional proba-
bility distributions and conditional expectations) is necessary. Understanding matrix vector operations
and notation is helpful but not required. All coding exercises are performed in Python. Students are
required to take a short pretest at the beginning of the course. The pretest score will not count toward
the final grade but will help you understand whether your background in calculus, probability theory,
as well as command of coding positions you for success in this course.
Text:
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd ed.
ISBN: 978-0-262-03924-6
Electronic copy of the book is available at the author’s webpage (under “Full Pdf”)
http://incompleteideas.net/book/the-book-2nd.html
Optional reading:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016
ISBN: 978-0-262-03561-3
HTML version of the book is available at http://www.deeplearningbook.org
Course Description:
This course introduces Deep Reinforcement Learning (RL), one of the most modern techniques of ma-
chine learning. Deep RL has attracted the attention of many researches and developers in recent years
due to its wide range of applications in a variety of fields such as robotics, robotic surgery, pattern
recognition, diagnosis based on medical image, treatment strategies in clinical decision making, person-
alized medical treatment, drug discovery, speech recognition, computer vision, and natural language
processing. Deep RL is often seen as the third area of machine learning, in addition to supervised and
unsupervised algorithms, in which learning of an agent occurs as a result of its own actions and inter-
action with the environment. Generally, such learning processes do not need to be guided externally,
but it has been difficult until recently to use RL ideas practically. This course primarily focuses on
problems that emerge in healthcare and life science applications.
Tentative List of Topics:
I. Reinforcement Learning (RL)
◦ Markov Decision Processes (MDP): Value Functions and Policies
1
◦ Dynamic Programming (DP): Bellman Equation
◦ Monte Carlo (MC) Methods
◦ Temporal-difference (TD) Prediction and Control: SARSA and Q-learning
◦ n-step TD
◦ Approximation Methods: Stochastic-gradient, Semi-gradient TD Update, Least-squares TD
II. Deep Learning
◦ Neural Networks (NN): Classification & Regression
◦ Training NNs: Backpropagation
◦ Tuning NNs: Regularization
◦ Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)
III. Deep RL
◦ Value-based Deep RL: Q-network
◦ Policy-based Deep RL: REINFORCE
◦ Asynchronous Methods for Deep RL: Advantage Actor-Critic (A2C)
◦ Model-based Deep RL
Homework:
Except when especially noted, homework assignments will be due each Sunday. The assignments will
be posted on Canvas website and will consist of series of programming exercises (solutions should be
implemented in Python) as well as analytical problems (knowledge of calculus and probability theory
should suffice) that help students enhance their understanding of the underlying theory. Solutions to
the programming exercises should be submitted via Canvas in a form of a single .ipynb (Jupyter Note-
book) file. The solutions to the theoretical problems should be submitted in a form of a single PDF
file.
Note on the deadline and penalty:
Solutions to the assignments submitted later than 1, 2, 3, 4, and 5 days after the due date will be
penalized by 10%, 20%, 30%, 40%, and 100%, respectively. In case you need an extension, please
coordinate with the instructor prior to the due day.
Quizzes:
An online quiz will be due before each class, unless announced otherwise. The quiz will consist of ap-
proximately 5 basic questions on understanding of studied principals. No late quizzes will be allowed.
Midterm Exam:
The midterm exam will be released on March 10 (no lecture on March 10) and due March 17 at 7:40
pm (Eastern Time). The test will be similar to Homework exercises but cover topics studied up to this
date. Late midterm will not be accepted.
Final:
The final examination will be due at 11:59 pm (Eastern Time) on May 12 (no lecture on May 12). The
exam will be cumulative covering all topics studied. Late final will not be accepted.
Attendance:
Regular attendance (whether on campus or online) is expected but will not be taken. Recorded lectures
will be available via the course website within 24 hours after the lecture.
2
Participation:
Although no credit is allocated for participation, everyone is encouraged to constructively participate
in class by asking relevant questions. It is important that you check the e-mail registered with Canvas
regularly and monitor course announcements and also participate in discussions on Piazza, the fo-
rum available at https://piazza.com/class/kh5mr9vj75c2ah. All technical and data science related
questions will be discussed on Piazza.
Grading:
The semester average is calculated using the formula:
Grade = 0.25 · Homework + 0.20 · Quizzes + 0.25 · Midterm + 0.30 · Final
Student Learning Objectives:
◦ proficiency in building optimal NNs using Python
◦ understanding of RL including MDP, Bellman equation, and optimal policy
◦ firm understanding of Deep RL and getting comfortable with approximation methods used in
conjunction with RL
◦ hands-on experience on estimating the optimal policy and value functions
Academic Integrity:
You are responsible for understanding Harvard Extension School policies on academic integrity (www.
extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to
use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time, sub-
mitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses.
There are no excuses for failure to uphold academic integrity. To support your learning about academic
citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism (www.extension.
harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find links to
the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of
academic citation policy. The tutorials are anonymous open-learning tools.
Disability Accommodations:
The Extension School is committed to providing an accessible academic community. The Accessibil-
ity Office offers a variety of accommodations and services to students with documented disabilities.
More information can be found at www.extension.harvard.edu/resources-policies/resources/
accessibility-student-services
Dates of Interest:
◦ Harvard Extension School classes begin, January 25, 2021
◦ Pretest is due, January 29
◦ Last day to change the credit status, January 31
◦ Course drop deadline for full-tuition refund, January 31
◦ Quiz 1 is due, February 3
◦ Assignment 1 is due, February 7
◦ Course drop deadline for half-tuition refund, February 7
◦ Midterm Exam is due, March 17, 7:40 pm (Eastern Time)
◦ Withdrawal deadline, April 23
◦ Final Exam is due, May 12, 11:59 pm (Eastern Time)
3

More Related Content

What's hot

Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012
Kathy Favazza
 
Ls 12orientation
Ls 12orientationLs 12orientation
Ls 12orientation
Jim Walker
 
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Traian Rebedea
 

What's hot (20)

Speaker 11 jim o'dwyer
Speaker 11 jim o'dwyerSpeaker 11 jim o'dwyer
Speaker 11 jim o'dwyer
 
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...Strijker, A. (2005 12 06). Piloting Sakai In A Master Course   Does It Really...
Strijker, A. (2005 12 06). Piloting Sakai In A Master Course Does It Really...
 
Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012Edu614 Session 1 Summer 2012
Edu614 Session 1 Summer 2012
 
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) SinghBridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
Bridge TEFL IDELT Official Transcript of Deepak (Danny) Singh
 
Melbourne t1 2016-assignment_2_mn504
Melbourne   t1 2016-assignment_2_mn504Melbourne   t1 2016-assignment_2_mn504
Melbourne t1 2016-assignment_2_mn504
 
Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...Involving students in Semester of Code: experiences and issues from the first...
Involving students in Semester of Code: experiences and issues from the first...
 
Course-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course AuthoringCourse-Adaptive Content Recommender for Course Authoring
Course-Adaptive Content Recommender for Course Authoring
 
Open Education: the MOOC Experience
Open Education: the MOOC ExperienceOpen Education: the MOOC Experience
Open Education: the MOOC Experience
 
Transformation of the Ol Instructor
Transformation of the Ol InstructorTransformation of the Ol Instructor
Transformation of the Ol Instructor
 
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
TLC2016 - Peer Review, Peer Assessment, and Peer Feedback methods based on Bl...
 
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
VII Jornadas eMadrid "Education in exponential times". Mesa redonda eMadrid L...
 
Teaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCsTeaching FEM software in formal and non-formal environment with MOOCs
Teaching FEM software in formal and non-formal environment with MOOCs
 
MCO 436 syllabus
MCO 436 syllabus MCO 436 syllabus
MCO 436 syllabus
 
Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2Csci e46-syllabus-spring19-v1-2
Csci e46-syllabus-spring19-v1-2
 
Ls 12orientation
Ls 12orientationLs 12orientation
Ls 12orientation
 
Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS
 
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
Overview and Preliminary Results of Using PolyCAFe for Collaboration Analysis...
 
WP2 Course Modernisation
WP2 Course ModernisationWP2 Course Modernisation
WP2 Course Modernisation
 
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...«Innovations in pedagogy using MOOCs» /  Ting-Chuen Pong, professor of Comput...
«Innovations in pedagogy using MOOCs» / Ting-Chuen Pong, professor of Comput...
 
Web-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectivesWeb-based Virtual Reality development in classroom: From learner's perspectives
Web-based Virtual Reality development in classroom: From learner's perspectives
 

Similar to Deep reinforcement learning

ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6
Shaun Kellogg
 
Assignments .30%
Assignments .30%Assignments .30%
Assignments .30%
butest
 
OutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docxOutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docx
gerardkortney
 
ISSC362Course SummaryCourse ISSC362 Title IT Securit
ISSC362Course SummaryCourse  ISSC362 Title  IT SecuritISSC362Course SummaryCourse  ISSC362 Title  IT Securit
ISSC362Course SummaryCourse ISSC362 Title IT Securit
TatianaMajor22
 
MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]
Maurice Dawson
 
1 Saint Leo University GBA 334 Applied Decision.docx
 1 Saint Leo University  GBA 334  Applied Decision.docx 1 Saint Leo University  GBA 334  Applied Decision.docx
1 Saint Leo University GBA 334 Applied Decision.docx
aryan532920
 

Similar to Deep reinforcement learning (20)

ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6ECI519_Syllabus_Spring_2016-6
ECI519_Syllabus_Spring_2016-6
 
Assignments .30%
Assignments .30%Assignments .30%
Assignments .30%
 
OutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docxOutlineWhat will your programinitiativecourse do What are .docx
OutlineWhat will your programinitiativecourse do What are .docx
 
Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017Robotics Syllabus 2016 2017
Robotics Syllabus 2016 2017
 
Hybrid Statistics Course Development
Hybrid Statistics Course DevelopmentHybrid Statistics Course Development
Hybrid Statistics Course Development
 
Syllabus
SyllabusSyllabus
Syllabus
 
8th sem (1)
8th sem (1)8th sem (1)
8th sem (1)
 
2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series2009-06-15 Marist Summer Series
2009-06-15 Marist Summer Series
 
Scripting for Design
Scripting for DesignScripting for Design
Scripting for Design
 
Cwmd 2601 2020
Cwmd 2601 2020Cwmd 2601 2020
Cwmd 2601 2020
 
Discrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docxDiscrete-Mathematics syllabus sample.docx
Discrete-Mathematics syllabus sample.docx
 
Introduction to EMA highlights
Introduction to EMA highlightsIntroduction to EMA highlights
Introduction to EMA highlights
 
ISSC362Course SummaryCourse ISSC362 Title IT Securit
ISSC362Course SummaryCourse  ISSC362 Title  IT SecuritISSC362Course SummaryCourse  ISSC362 Title  IT Securit
ISSC362Course SummaryCourse ISSC362 Title IT Securit
 
Chm1083dfghj
Chm1083dfghjChm1083dfghj
Chm1083dfghj
 
Res1 Methods of Research Outline
Res1 Methods of Research OutlineRes1 Methods of Research Outline
Res1 Methods of Research Outline
 
MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]MIS213 Syllabus [Draft]
MIS213 Syllabus [Draft]
 
Computational thinking
Computational thinkingComputational thinking
Computational thinking
 
Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009Ngs Hsm 700bl Module 1 01272009
Ngs Hsm 700bl Module 1 01272009
 
1 Saint Leo University GBA 334 Applied Decision.docx
 1 Saint Leo University  GBA 334  Applied Decision.docx 1 Saint Leo University  GBA 334  Applied Decision.docx
1 Saint Leo University GBA 334 Applied Decision.docx
 
Stem 2 syllabus
Stem 2 syllabusStem 2 syllabus
Stem 2 syllabus
 

Recently uploaded

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 

Recently uploaded (20)

Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 

Deep reinforcement learning

  • 1. CSCI S-89C Deep Reinforcement Learning Syllabus Spring 2021 Lectures: Online web conference, Wednesdays, 7:40-9:40 pm Lectures will be live-streamed with the video being available via the course website within 24 hours. Instructor: Dr. Dmitry Kurochkin, Senior Research Analyst, Harvard University E-mail: dkurochkin@fas.harvard.edu Website: https://canvas.harvard.edu/courses/81664 Office Hours: By request Teaching Fellows: TBA e-mail: TBA Prerequisites: Introductory probability and statistics, multivariate calculus equivalent to MATH E-21a, and profi- ciency in Python programming equivalent to CSCI E-7. Note on the prerequisites: We will be formulating value (cost) functions and performing optimization. Students are expected to be comfortable taking derivatives. Basic knowledge of probability theory (in particular, conditional proba- bility distributions and conditional expectations) is necessary. Understanding matrix vector operations and notation is helpful but not required. All coding exercises are performed in Python. Students are required to take a short pretest at the beginning of the course. The pretest score will not count toward the final grade but will help you understand whether your background in calculus, probability theory, as well as command of coding positions you for success in this course. Text: Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction, 2nd ed. ISBN: 978-0-262-03924-6 Electronic copy of the book is available at the author’s webpage (under “Full Pdf”) http://incompleteideas.net/book/the-book-2nd.html Optional reading: Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016 ISBN: 978-0-262-03561-3 HTML version of the book is available at http://www.deeplearningbook.org Course Description: This course introduces Deep Reinforcement Learning (RL), one of the most modern techniques of ma- chine learning. Deep RL has attracted the attention of many researches and developers in recent years due to its wide range of applications in a variety of fields such as robotics, robotic surgery, pattern recognition, diagnosis based on medical image, treatment strategies in clinical decision making, person- alized medical treatment, drug discovery, speech recognition, computer vision, and natural language processing. Deep RL is often seen as the third area of machine learning, in addition to supervised and unsupervised algorithms, in which learning of an agent occurs as a result of its own actions and inter- action with the environment. Generally, such learning processes do not need to be guided externally, but it has been difficult until recently to use RL ideas practically. This course primarily focuses on problems that emerge in healthcare and life science applications. Tentative List of Topics: I. Reinforcement Learning (RL) ◦ Markov Decision Processes (MDP): Value Functions and Policies 1
  • 2. ◦ Dynamic Programming (DP): Bellman Equation ◦ Monte Carlo (MC) Methods ◦ Temporal-difference (TD) Prediction and Control: SARSA and Q-learning ◦ n-step TD ◦ Approximation Methods: Stochastic-gradient, Semi-gradient TD Update, Least-squares TD II. Deep Learning ◦ Neural Networks (NN): Classification & Regression ◦ Training NNs: Backpropagation ◦ Tuning NNs: Regularization ◦ Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) III. Deep RL ◦ Value-based Deep RL: Q-network ◦ Policy-based Deep RL: REINFORCE ◦ Asynchronous Methods for Deep RL: Advantage Actor-Critic (A2C) ◦ Model-based Deep RL Homework: Except when especially noted, homework assignments will be due each Sunday. The assignments will be posted on Canvas website and will consist of series of programming exercises (solutions should be implemented in Python) as well as analytical problems (knowledge of calculus and probability theory should suffice) that help students enhance their understanding of the underlying theory. Solutions to the programming exercises should be submitted via Canvas in a form of a single .ipynb (Jupyter Note- book) file. The solutions to the theoretical problems should be submitted in a form of a single PDF file. Note on the deadline and penalty: Solutions to the assignments submitted later than 1, 2, 3, 4, and 5 days after the due date will be penalized by 10%, 20%, 30%, 40%, and 100%, respectively. In case you need an extension, please coordinate with the instructor prior to the due day. Quizzes: An online quiz will be due before each class, unless announced otherwise. The quiz will consist of ap- proximately 5 basic questions on understanding of studied principals. No late quizzes will be allowed. Midterm Exam: The midterm exam will be released on March 10 (no lecture on March 10) and due March 17 at 7:40 pm (Eastern Time). The test will be similar to Homework exercises but cover topics studied up to this date. Late midterm will not be accepted. Final: The final examination will be due at 11:59 pm (Eastern Time) on May 12 (no lecture on May 12). The exam will be cumulative covering all topics studied. Late final will not be accepted. Attendance: Regular attendance (whether on campus or online) is expected but will not be taken. Recorded lectures will be available via the course website within 24 hours after the lecture. 2
  • 3. Participation: Although no credit is allocated for participation, everyone is encouraged to constructively participate in class by asking relevant questions. It is important that you check the e-mail registered with Canvas regularly and monitor course announcements and also participate in discussions on Piazza, the fo- rum available at https://piazza.com/class/kh5mr9vj75c2ah. All technical and data science related questions will be discussed on Piazza. Grading: The semester average is calculated using the formula: Grade = 0.25 · Homework + 0.20 · Quizzes + 0.25 · Midterm + 0.30 · Final Student Learning Objectives: ◦ proficiency in building optimal NNs using Python ◦ understanding of RL including MDP, Bellman equation, and optimal policy ◦ firm understanding of Deep RL and getting comfortable with approximation methods used in conjunction with RL ◦ hands-on experience on estimating the optimal policy and value functions Academic Integrity: You are responsible for understanding Harvard Extension School policies on academic integrity (www. extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time, sub- mitting the wrong draft, or being overwhelmed with multiple demands are not acceptable excuses. There are no excuses for failure to uphold academic integrity. To support your learning about academic citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism (www.extension. harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find links to the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your knowledge of academic citation policy. The tutorials are anonymous open-learning tools. Disability Accommodations: The Extension School is committed to providing an accessible academic community. The Accessibil- ity Office offers a variety of accommodations and services to students with documented disabilities. More information can be found at www.extension.harvard.edu/resources-policies/resources/ accessibility-student-services Dates of Interest: ◦ Harvard Extension School classes begin, January 25, 2021 ◦ Pretest is due, January 29 ◦ Last day to change the credit status, January 31 ◦ Course drop deadline for full-tuition refund, January 31 ◦ Quiz 1 is due, February 3 ◦ Assignment 1 is due, February 7 ◦ Course drop deadline for half-tuition refund, February 7 ◦ Midterm Exam is due, March 17, 7:40 pm (Eastern Time) ◦ Withdrawal deadline, April 23 ◦ Final Exam is due, May 12, 11:59 pm (Eastern Time) 3