MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
Watch video: https://youtu.be/zR11FLZ-O9M
The first lecture of MIT course 6.S091: Deep Reinforcement Learning, introducing the field of Deep RL. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow the TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
CONNECT:
- If you enjoyed this video, please subscribe to this channel.
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
SLIDES:
(For the full list of references, visit https://hcai.mit.edu/references.)
2. Deep Reinforcement Learning (Deep RL)
• What is it? A framework for learning to solve sequential decision-making problems.
• How? Trial and error in a world that provides occasional rewards.
• Deep? Deep RL = RL + neural networks
[306, 307]
(Diagram: Deep Learning vs. Deep RL.)
3. Types of Learning
• Supervised Learning
• Semi-Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
It’s all “supervised” by a loss function!
*Someone has to say what’s good and what’s bad (see Socrates, Epictetus, Kant, Nietzsche, etc.)
(Diagram: Input → Neural Network → Output, judged good or bad by the supervision*.)
4. Supervised learning is “teach by example”:
Here are some examples; now learn the patterns in these examples.
Reinforcement learning is “teach by experience”:
Here’s a world; now learn patterns by exploring it.
5. Machine Learning: Supervised vs. Reinforcement
Supervised learning is “teach by example”:
Here are some examples; now learn the patterns in these examples.
Reinforcement learning is “teach by experience”:
Here’s a world; now learn patterns by exploring it.
[330]
(Images: examples of failure and success.)
6. Reinforcement Learning in Humans
• Humans appear to learn to walk through “very few examples” of trial and error. How remains an open question…
• Possible answers:
  • Hardware: 230 million years of bipedal movement data.
  • Imitation learning: observation of other humans walking.
  • Algorithms: better than backpropagation and stochastic gradient descent.
[286, 291]
12. (Diagram: components of an AI system — Environment, Sensors, Sensor Data, Feature Extraction, Machine Learning, Representation, Reasoning, Knowledge, Planning, Action, Effector.)
Final breakthrough, 358 years after its conjecture:
“It was so indescribably beautiful; it was so simple and so elegant. I couldn’t understand how I’d missed it and I just stared at it in disbelief for twenty minutes. Then during the day I walked around the department, and I’d keep coming back to my desk looking to see if it was still there. It was still there. I couldn’t contain myself, I was so excited. It was the most important moment of my working life. Nothing I ever do again will mean as much.” — Andrew Wiles, on proving Fermat’s Last Theorem
15. Reinforcement Learning Framework
At each step, the agent:
• Executes an action
• Observes the new state
• Receives a reward
[80]
Open questions:
• What cannot be modeled in this way?
• What are the challenges of learning in this framework?
16. Environment and Actions
• Fully observable (Chess) vs. partially observable (Poker)
• Single-agent (Atari) vs. multi-agent (DeepTraffic)
• Deterministic (Cart-Pole) vs. stochastic (DeepTraffic)
• Static (Chess) vs. dynamic (DeepTraffic)
• Discrete (Chess) vs. continuous (Cart-Pole)
Note: Real-world environments might not technically be stochastic or partially observable, but they might as well be treated as such due to their complexity.
17. The Challenge for RL in Real-World Applications
Reminder:
• Supervised learning: teach by example.
• Reinforcement learning: teach by experience.
Open challenges, two options:
1. Real-world observation + one-shot trial & error
2. Realistic simulation + transfer learning
Corresponding research directions: improve transfer learning; improve simulation.
18. Major Components of an RL Agent
An RL agent may be directly or indirectly trying to learn a:
• Policy: the agent’s behavior function
• Value function: how good each state and/or action is
• Model: the agent’s representation of the environment
An episode is a sequence s_0, a_0, r_1, s_1, a_1, r_2, …, s_{n−1}, a_{n−1}, r_n, s_n of states s, actions a, and rewards r, ending in the terminal state s_n.
19. Meaning of Life for an RL Agent: Maximize Reward
[84]
• Future reward: R_t = r_t + r_{t+1} + r_{t+2} + … + r_n
• Discounted future reward: R_t = r_t + γ r_{t+1} + γ² r_{t+2} + … + γ^(n−t) r_n
• A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward.
• Why “discounted”?
  • A math trick that helps analyze convergence.
  • Uncertainty due to environment stochasticity, partial observability, or the fact that life can end at any moment:
“If today were the last day of my life, would I want to do what I’m about to do today?” – Steve Jobs
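The discounted return can be computed for every step of an episode in one backward pass, using the recursion R_t = r_t + γ·R_{t+1}. A minimal Python sketch (function name and example values are illustrative):

def discounted_returns(rewards, gamma=0.99):
    # Backward recursion: R_t = r_t + gamma * R_{t+1}, with R after the final step = 0.
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: a reward of 1 arrives only at the final step.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))  # ≈ [0.81, 0.9, 1.0]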
20. Robot in a Room
Actions: UP, DOWN, LEFT, RIGHT
(Stochastic) model of the world — action UP: 80% move UP, 10% move LEFT, 10% move RIGHT.
(Grid: terminal reward +1 at [4,3], −1 at [4,2]; START at the bottom-left corner.)
• Reward +1 at [4,3], −1 at [4,2]
• Reward −0.04 for each step
• What’s the strategy to achieve maximum reward?
• We can learn the model and plan (see the value-iteration sketch below)
• We can learn the value of (action, state) pairs and act greedily/non-greedily
• We can learn the policy directly while sampling from it
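To make “learn the model and plan” concrete, here is a minimal value-iteration sketch in Python for this gridworld. It assumes the classic 4×3 layout with a wall at [2,2] and the transition model above; names and the iteration count are illustrative:

ACTIONS = {'UP': (0, 1), 'DOWN': (0, -1), 'LEFT': (-1, 0), 'RIGHT': (1, 0)}
SIDEWAYS = {'UP': ('LEFT', 'RIGHT'), 'DOWN': ('LEFT', 'RIGHT'),
            'LEFT': ('UP', 'DOWN'), 'RIGHT': ('UP', 'DOWN')}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}   # terminal rewards from the slide
WALL, STEP_REWARD, GAMMA = (2, 2), -0.04, 1.0
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]

def move(state, action):
    # Bumping into the wall or the grid edge leaves the agent in place.
    c, r = state
    dc, dr = ACTIONS[action]
    nxt = (c + dc, r + dr)
    if nxt == WALL or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
        return state
    return nxt

def q_value(V, state, action):
    # 80% intended move, 10% slip to each side.
    outcomes = [(0.8, action)] + [(0.1, a) for a in SIDEWAYS[action]]
    return sum(p * (STEP_REWARD + GAMMA * V[move(state, a)]) for p, a in outcomes)

V = {s: 0.0 for s in STATES}
for _ in range(100):  # iterate the Bellman optimality backup to convergence
    V = {s: (TERMINALS[s] if s in TERMINALS
             else max(q_value(V, s, a) for a in ACTIONS))
         for s in STATES}

policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
          for s in STATES if s not in TERMINALS}
print(policy[(1, 1)])  # e.g. 'UP': head up and around the wall toward +1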
21. Optimal Policy for a Deterministic World
Reward: −0.04 for each step. Actions: UP, DOWN, LEFT, RIGHT.
When actions are deterministic — UP: 100% move UP, 0% move LEFT, 0% move RIGHT.
Policy: shortest path.
22. Optimal Policy for a Stochastic World
Reward: −0.04 for each step. Actions: UP, DOWN, LEFT, RIGHT.
When actions are stochastic — UP: 80% move UP, 10% move LEFT, 10% move RIGHT.
Policy: shortest path, but avoid UP next to the −1 square.
23. Optimal Policy for a Stochastic World
Reward: −2 for each step. Actions: UP, DOWN, LEFT, RIGHT.
When actions are stochastic — UP: 80% move UP, 10% move LEFT, 10% move RIGHT.
Policy: shortest path.
24. Optimal Policy for a Stochastic World
Reward −0.1 for each step (more urgent) vs. reward −0.04 for each step (less urgent).
25. Optimal Policy for a Stochastic World
Reward: +0.01 for each step. Actions: UP, DOWN, LEFT, RIGHT.
When actions are stochastic — UP: 80% move UP, 10% move LEFT, 10% move RIGHT.
Policy: longest path (with a positive reward per step, the agent avoids ever reaching a terminal square).
26. Lessons from Robot in a Room
• The environment model has a big impact on the optimal policy.
• The reward structure has a big impact on the optimal policy.
27. Reward Structure May Have Unintended Consequences
The player gets reward based on:
1. Finishing time
2. Finishing position
3. Picking up “turbos”
[285]
(Videos: a human player vs. the AI — a Deep RL agent.)
29. Examples of Reinforcement Learning: Cart-Pole Balancing
• Goal — Balance the pole on top of a moving cart.
• State — Pole angle and angular speed; cart position and horizontal velocity.
• Actions — Horizontal force applied to the cart.
• Reward — 1 at each time step if the pole is upright.
[166]
30. Examples of Reinforcement Learning: Doom*
• Goal: Eliminate all opponents.
• State: Raw pixels of the game.
• Actions: Up, Down, Left, Right, Shoot, etc.
• Reward: Positive when eliminating an opponent, negative when the agent is eliminated.
[166]
* Added for important, thought-provoking considerations of AI safety in the context of autonomous weapons systems (see the AGI lectures on the topic).
31. Examples of Reinforcement Learning: Grasping Objects with a Robotic Arm
• Goal — Pick up objects of different shapes.
• State — Raw pixels from a camera.
• Actions — Move the arm; grasp.
• Reward — Positive when a pickup is successful.
[332]
32. Examples of Reinforcement Learning: Human Life
• Goal — Survival? Happiness?
• State — Sight. Hearing. Taste. Smell. Touch.
• Actions — Think. Move.
• Reward — Homeostasis?
33. Key Takeaways for Real-World Impact
• Deep learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: good questions and huge amounts of representative data.
• Deep reinforcement learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: defining a useful state space, action space, and reward.
  • Hardest part: getting meaningful data for the above formalization.
34. Three Types of Reinforcement Learning
• Model-based: learn the model of the world, then plan using the model; update the model often and re-plan often.
• Value-based: learn the state or state-action value; act by choosing the best action in each state; exploration is a necessary add-on.
• Policy-based: learn the stochastic policy function that maps state to action; act by sampling the policy; exploration is baked in.
[331, 333]
37. Q-Learning
• State-action value function Q(s,a): the expected return when starting in s, performing a, and following the policy thereafter.
• Q-Learning: use any policy to estimate a Q that maximizes future reward, via the update
  Q(s,a) ← Q(s,a) + α [r + γ max_{a′} Q(s′,a′) − Q(s,a)]
  where s is the old state, s′ the new state, r the reward, α the learning rate, and γ the discount factor.
• Q directly approximates Q* (the Bellman optimality equation).
• Independent of the policy being followed.
• Only requirement: keep updating each (s,a) pair.
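In tabular form, that update is essentially a one-liner. A minimal sketch (the dictionary-based table and names are illustrative):

from collections import defaultdict

Q = defaultdict(float)       # Q[(state, action)] -> estimated return, 0 by default
alpha, gamma = 0.1, 0.99     # learning rate, discount factor

def q_update(s, a, r, s_next, actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])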
38. Exploration vs. Exploitation
• A deterministic/greedy policy won’t explore all actions:
  • We don’t know anything about the environment at the beginning.
  • We need to try all actions to find the optimal one.
• ε-greedy policy:
  • With probability 1−ε perform the optimal/greedy action; otherwise take a random action.
  • Slowly move it toward a greedy policy: ε → 0.
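An ε-greedy policy over the Q-table above is equally short (a sketch; the decay schedule is illustrative):

import random

def epsilon_greedy(Q, state, actions, epsilon):
    # With probability epsilon explore (random action); otherwise exploit (greedy action).
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

# Typical schedule: start exploratory and slowly anneal toward greedy, e.g. after each episode:
# epsilon = max(0.01, epsilon * 0.995)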
40. Q-Learning: Representation Matters
• In practice, value iteration over a table is impractical:
  • Very limited states/actions.
  • Cannot generalize to unobserved states.
• Think about the Breakout game:
  • State: screen pixels
  • Image size: 84 × 84 (resized)
  • 4 consecutive images
  • Grayscale with 256 gray levels
256^(84×84×4) ≈ 10^67,970 rows in the Q-table — far more than the ~10^82 atoms in the observable universe!
References: [83, 84]
42. DQN: Deep Q-Learning
Use a neural network to approximate the Q-function: Q(s, a; θ) ≈ Q*(s, a).
[83]
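The slide shows this as a diagram; as an illustrative reconstruction, the Atari network from Mnih et al. [83] can be sketched in a few lines of PyTorch, mapping a stack of 4 grayscale 84×84 frames to one Q-value per action:

import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # one output per action: Q(s, a)
        )

    def forward(self, x):
        return self.net(x)

q_net = DQN(n_actions=4)
q_values = q_net(torch.zeros(1, 4, 84, 84))  # shape: (1, 4)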
43. Deep Q-Network (DQN): Atari
Mnih et al., “Playing Atari with Deep Reinforcement Learning,” 2013. [83]
44. DQN and Double DQN
• Loss function (squared error between target and prediction):
  L = [r + γ max_{a′} Q(s′,a′; θ′) − Q(s,a; θ)]²
  where the first term is the target and the second is the prediction.
• DQN: the same network computes both Q terms.
• Double DQN: a separate network for each Q, which helps reduce the bias introduced by the inaccuracies of the Q network at the beginning of training.
[83]
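With a separate target network, that loss looks roughly like this in PyTorch (a sketch; the batch layout is an assumption):

import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    # batch is assumed to hold tensors: states, actions (int64), rewards,
    # next_states, dones (1.0 if the episode ended, else 0.0).
    states, actions, rewards, next_states, dones = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s,a)
    with torch.no_grad():  # the target network is held fixed
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1 - dones) * q_next
    return F.mse_loss(q_pred, target)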
45. DQN Tricks
• Experience replay
  • Store experiences (actions, state transitions, and rewards) and create mini-batches from them for the training process.
• Fixed target network
  • The error calculation includes a target that depends on the network parameters and thus changes quickly; updating the target network only every ~1,000 steps increases the stability of the training process.
[83, 167]
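Experience replay is typically just a ring buffer plus uniform sampling. A minimal sketch:

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)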
46. Atari Breakout
(Videos: the agent after 10, 120, and 240 minutes of training.)
[85]
48. Dueling DQN (DDQN)
• Decompose Q(s,a) into:
  • V(s): the value of being in that state.
  • A(s,a): the advantage of taking action a in state s versus all other possible actions in that state.
• Use two streams:
  • one that estimates the state value V(s)
  • one that estimates the advantage of each action A(s,a)
• Useful for states where the action choice does not affect Q(s,a).
[336]
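The two streams recombine as Q(s,a) = V(s) + A(s,a) − mean_a A(s,a); subtracting the mean advantage keeps the decomposition identifiable. A sketch of the head in PyTorch (the preceding feature extractor is omitted):

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, in_features, n_actions):
        super().__init__()
        self.value = nn.Linear(in_features, 1)               # V(s)
        self.advantage = nn.Linear(in_features, n_actions)   # A(s, a)

    def forward(self, features):
        v = self.value(features)
        a = self.advantage(features)
        return v + a - a.mean(dim=1, keepdim=True)           # Q(s, a)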
49. Prioritized Experience Replay
• Sample experiences based on their impact (e.g., the magnitude of the TD error), not their frequency of occurrence.
[336]
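Proportional prioritization can be sketched as a weighted draw over priorities such as |TD error| + ε; the exponent α controls prioritization strength (the importance-sampling correction from the paper is omitted here for brevity, and names are illustrative):

import numpy as np

def sample_prioritized(buffer, priorities, batch_size, alpha=0.6):
    # Transitions with larger priority are replayed more often.
    probs = np.asarray(priorities, dtype=np.float64) ** alpha
    probs /= probs.sum()
    idx = np.random.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx], idx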
51. Policy Gradient (PG)
• DQN (off-policy): approximate Q and infer the optimal policy.
• PG (on-policy): directly optimize in policy space with a policy network.
A good illustrative explanation: “Deep Reinforcement Learning: Pong from Pixels,” http://karpathy.github.io/2016/05/31/rl/
[63]
52. Policy Gradient – Training
• REINFORCE: a policy gradient method that increases the probability of good actions and decreases the probability of bad ones, following the gradient ∇_θ J(θ) = E[∇_θ log π_θ(a|s) · R].
[63, 204]
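As a loss to minimize, REINFORCE weights each action’s log-probability by the return that followed it. A sketch in PyTorch (the return normalization is a common variance-reduction trick, not part of the slide):

import torch

def reinforce_loss(log_probs, returns):
    # log_probs: log pi(a_t|s_t) per step; returns: discounted return R_t per step.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize (variance reduction)
    # Minimizing -log_prob * R raises the probability of actions on high-return trajectories.
    return -(log_probs * returns).sum()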
53. Policy Gradient
• Pros vs. DQN:
  • Messy world: if the Q-function is too complex to learn, DQN may fail miserably, while PG can still learn a good policy.
  • Speed: faster convergence.
  • Stochastic policies: capable of learning stochastic policies — DQN can’t.
  • Continuous actions: much easier to model continuous action spaces.
• Cons vs. DQN:
  • Data: sample-inefficient (needs more data).
  • Stability: less stable during training.
  • Credit assignment: poor credit assignment to (state, action) pairs for delayed rewards.
Problem with REINFORCE: calculating the reward only at the end means all the actions on a high-reward trajectory get credited as good, because the total reward was high.
[63, 335]
55. Advantage Actor-Critic (A2C)
• Combines DQN (value-based) and REINFORCE (policy-based).
• Two neural networks (actor and critic):
  • The actor is policy-based: it samples the action from a policy.
  • The critic is value-based: it measures how good the chosen action is.
• Updates at each time step — temporal-difference (TD) learning.
[335]
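A one-step sketch of the actor and critic losses, where the TD error plays the role of the advantage (names are illustrative):

import torch

def a2c_losses(log_prob, value, next_value, reward, gamma=0.99):
    # One-step TD target from the critic; the TD error serves as the advantage.
    td_target = reward + gamma * next_value.detach()
    advantage = td_target - value
    actor_loss = -log_prob * advantage.detach()  # policy step: don't backprop into the critic
    critic_loss = advantage.pow(2)               # value step: squared TD error
    return actor_loss, critic_loss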
56. Asynchronous Advantage Actor-Critic (A3C)
• Both A3C and A2C use parallelism in training.
• A2C syncs up for a global parameter update and then starts each iteration with the same policy; A3C updates asynchronously.
[337]
58. Deep Deterministic Policy Gradient (DDPG)
• An actor-critic framework for learning a deterministic policy.
• Can be thought of as DQN for continuous action spaces.
• As with all DQN variants, the following tricks are required:
  • Experience replay
  • Target network
• Exploration: add noise to actions, reducing the scale of the noise as training progresses.
[341]
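That exploration trick can be sketched as Gaussian noise with a decaying scale added to the deterministic action (the schedule and bounds are illustrative; the DDPG paper itself used Ornstein-Uhlenbeck noise):

import numpy as np

def noisy_action(action, step, sigma0=0.2, decay=1e-5, low=-1.0, high=1.0):
    sigma = sigma0 * np.exp(-decay * step)  # noise scale shrinks as training progresses
    noise = np.random.normal(0.0, sigma, size=np.shape(action))
    return np.clip(action + noise, low, high)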
60. Policy Optimization
• Progress beyond the vanilla policy gradient:
  • Natural Policy Gradient
  • TRPO
  • PPO
• Basic idea in on-policy optimization: avoid taking bad actions that collapse the training performance.
• Line search: first pick the direction, then the step size.
• Trust region: first pick the step size, then the direction.
[339]
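Of the three, PPO is the most widely used; its clipped surrogate objective caps how far one update can move the policy. A sketch in PyTorch, for reference (the slide names PPO without details):

import torch

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    ratio = torch.exp(new_log_prob - old_log_prob)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the minimum makes the objective pessimistic: no reward for moving too far.
    return -torch.min(unclipped, clipped).mean()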
65. AlphaGo Zero Approach
• Same as the best before: Monte Carlo Tree Search (MCTS).
  • Balance exploitation/exploration (going deep on promising positions or exploring new, underplayed positions).
• Use a neural network as “intuition” for which positions to expand as part of MCTS (same as AlphaGo).
[170]
66. AlphaGo Zero Approach (continued)
• “Tricks”:
  • Use MCTS’s intelligent look-ahead (instead of human games) to improve value estimates of play options.
  • Multi-task learning: a “two-headed” network that outputs (1) the move probability and (2) the probability of winning.
  • Updated architecture: use residual networks.
[170]
70. To date, for most successful robots operating in the real world: Deep RL is not involved. [169]
73. But… that’s slowly changing: “Learning to Drive: Beyond Pure Imitation” (Waymo). [343]
74. The Challenge for RL in Real-World Applications (recap)
Reminder:
• Supervised learning: teach by example.
• Reinforcement learning: teach by experience.
Open challenges, two options:
1. Real-world observation + one-shot trial & error
2. Realistic simulation + transfer learning
Corresponding research directions: improve transfer learning; improve simulation.
75. Thinking Outside the Box: Multiverse Theory and the Simulation Hypothesis
• Create an (infinite) set of simulation environments to learn in, so that our reality becomes just another sample from the set.
[342]
76. Next Steps in Deep RL
• Lectures: https://deeplearning.mit.edu
• Tutorials: https://github.com/lexfridman/mit-deep-learning
• Advice for research (from “Spinning Up as a Deep RL Researcher” by Joshua Achiam):
  • Background:
    • Fundamentals of probability, statistics, and multivariate calculus.
    • Deep learning basics.
    • Deep RL basics.
    • TensorFlow (or PyTorch).
  • Learn by doing:
    • Implement the core deep RL algorithms (discussed today).
    • Look for the tricks and details in papers that were key to getting them to work.
    • Iterate fast in simple environments (see the Einstein quote on simplicity).
  • Research:
    • Improve on an existing approach.
    • Focus on an unsolved task/benchmark.
    • Create a new task/problem that hasn’t been addressed with RL.
[339]