Deep Learning
Shanghai
#4
Goal
• Help people get introduced to DL
• Help investors find potential projects
• Utilize wonderful techniques to solve problems
Review of #1 meetup
• Introduction to DL
• Pai Peng’s talk:
• DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets. CIKM 2015
• Problems left:
• 9 fraud detection,
• 7 social topic extraction,
• 7 image.
Review of #2 meetup
• Tom’s talk about Deep Learning in HealthCare
• Anson & John’s talk about Introduction to
RapidMiner & presentation of DL4J Deep
Learning extension
Review of #3 meetup
• PART 1: Deep Learning Program
• PART 2: informative sharing
• AlphaGo related technology by Davy
• CNN for text classification by Yale from Alibaba
Schedule
• PART 1: Deep Reinforcement Learning
• Reinforcement learning
• Deep Q-Network
• Atari games
• PART 2: evaluation platform
• RLLAB
• OpenAI gym
Part 1
• Deep reinforcement learning
• Reinforcement learning basics
• Deep Q-Network
• Atari games
Reinforcement Learning
• Reinforcement learning is an area of machine learning
inspired by behaviorist psychology, concerned with how
software agents ought to take actions in an environment
so as to maximize some notion of cumulative reward. [the
definition from Wikipedia]
Reinforcement learning:
an intersection of various domains
• game theory,
• control theory,
• operations research,
• information theory,
• simulation-based
optimization,
• multi-agent systems,
• swarm intelligence,
• statistics,
• genetic algorithms.
economics and game theory
• In economics and game theory, reinforcement
learning may be used to explain how equilibrium
may arise under bounded rationality.
Machine learning
• In machine learning, the environment is typically
formulated as a Markov decision process (MDP),
because many reinforcement learning algorithms for this
context utilize dynamic programming techniques.
• The main difference between the classical
techniques and reinforcement learning algorithms
is that the latter do not need knowledge about the
MDP and they target large MDPs where exact
methods become infeasible.
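To make the dynamic-programming connection concrete, below is a minimal value-iteration sketch on a toy MDP. The two states, two actions, transition probabilities, and rewards are invented purely for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only.
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9           # discount factor
V = np.zeros(len(P))  # state values, initialized to zero

# Value iteration: repeatedly apply the Bellman optimality backup
# until the values stop changing.
for _ in range(10_000):
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s])
        for s in sorted(P)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print(V)  # optimal value of each state in the toy MDP
```

With the MDP fully known, this kind of exact computation is feasible; reinforcement learning targets the case where the transition model is unknown or the state space is too large to enumerate.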
Characteristics of RL
• Reinforcement learning differs from standard supervised
learning in that correct input/output pairs are never
presented, nor sub-optimal actions explicitly corrected.
• Further, there is a focus on on-line performance, which
involves finding a balance between exploration (of
uncharted territory) and exploitation (of current
knowledge).
• The exploration vs. exploitation trade-off in reinforcement
learning has been most thoroughly studied through the
multi-armed bandit problem and in finite MDPs.
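The trade-off is easiest to see in a multi-armed bandit. Below is a minimal ε-greedy sketch; the three arm payout probabilities are invented for illustration.

```python
import random

true_means = [0.3, 0.5, 0.7]  # hypothetical arm payout rates (unknown to the agent)
counts = [0, 0, 0]            # pulls per arm
estimates = [0.0, 0.0, 0.0]   # running value estimate per arm
epsilon = 0.1                 # fraction of the time we explore

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)              # explore: random arm
    else:
        arm = estimates.index(max(estimates))  # exploit: best arm so far
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    # Incremental mean update of this arm's estimate.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates)  # should approach [0.3, 0.5, 0.7]
```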
Reinforcement Learning
From Richard Sutton’s book:
RL problems have three characteristics:
1. being closed-loop in an essential way
2. not having direct instructions as to
what actions to take
3. where the consequences of actions,
including reward signals, play out over
extended time periods
Elements of RL
a policy, a reward signal, a value function, and, optionally,
a model of the environment.
policy
A policy defines the learning agent’s way of behaving at a
given time. Roughly speaking, a policy is a mapping from
perceived states of the environment to actions to be taken
when in those states.
reward signal
A reward signal defines the goal in a reinforcement learning
problem.
• On each time step, the environment sends to the
reinforcement learning agent a single number, a reward.
• The agent’s sole objective is to maximize the total
reward it receives over the long run.
• The reward signal thus defines what are the good and
bad events for the agent.
Value function
Whereas the reward signal indicates what is good in an
immediate sense, a value function specifies what is good
in the long run.
• Roughly speaking, the value of a state is the total amount
of reward an agent can expect to accumulate over the
future, starting from that state.
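In the standard notation of Sutton and Barto, with discount factor γ the value of a state s under a policy π is the expected discounted sum of future rewards:

```latex
v_\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s \right]
```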
Comments on reinforcement
learning
• Rewards are basically given directly by the environment,
but values must be estimated and re-estimated from the
sequences of observations an agent makes over its entire
lifetime.
• In fact, the most important component of almost all
reinforcement learning algorithms we consider is a method
for efficiently estimating values.
• The central role of value estimation is arguably the most
important thing we have learned about reinforcement
learning over the last few decades.
Model
The fourth and final element of some reinforcement learning
systems is a model of the environment. This is something
that mimics the behavior of the environment, or more
generally, that allows inferences to be made about how the
environment will behave.
Model
• For example, given a state and action, the model might
predict the resultant next state and next reward. Models
are used for planning, by which we mean any way of
deciding on a course of action by considering possible
future situations before they are actually experienced.
• Methods for solving reinforcement learning problems that
use models and planning are called model-based
methods, as opposed to simpler model-free methods
that are explicitly trial-and-error learners—viewed as
almost the opposite of planning.
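As a minimal illustration, here is a sketch of the simplest kind of learned model, a deterministic one-step lookup of the sort used in Dyna-style planning; the interface and class name are hypothetical.

```python
class TabularModel:
    """Deterministic one-step model of the environment, learned by
    remembering the last observed outcome of each (state, action)."""

    def __init__(self):
        self.table = {}

    def update(self, state, action, next_state, reward):
        # Record what the real environment did.
        self.table[(state, action)] = (next_state, reward)

    def predict(self, state, action):
        # Planning queries the model instead of the environment.
        return self.table[(state, action)]
```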
Definition of RL
• Reinforcement learning is a computational approach to
understanding and automating goal-directed learning and
decision-making.
• It is distinguished from other computational approaches by
its emphasis on learning by an agent from direct interaction
with its environment, without relying on exemplary
supervision or complete models of the environment.
• In our opinion, reinforcement learning is the first field to
seriously address the computational issues that arise when
learning from interaction with an environment in order to
achieve long-term goals.
History of RL
The term “optimal control” came into use in the late 1950s to describe the problem of
designing a controller to minimize a measure of a dynamical system’s behavior over time.
One of the approaches to this problem was developed in the mid-1950s by Richard
Bellman and others through extending a nineteenth century theory of Hamilton and
Jacobi.
This approach uses the concepts of a dynamical system’s state and of a value function,
or “optimal return function,” to define a functional equation, now often called the Bellman
equation.
The class of methods for solving optimal control problems by solving this equation came
to be known as dynamic programming (Bellman, 1957a).
Bellman (1957b) also introduced the discrete stochastic version of the optimal control
problem known as Markovian decision processes (MDPs), and Ronald Howard (1960)
devised the policy iteration method for MDPs. All of these are essential elements
underlying the theory and algorithms of modern reinforcement learning.
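In modern notation, the Bellman optimality equation for a discounted MDP can be written as:

```latex
v_*(s) = \max_{a} \sum_{s',\,r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_*(s') \,\bigr]
```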
dynamic programming for
reinforcement learning
Dynamic programming is widely considered the only feasible way of solving
general stochastic optimal control problems. It suffers from what Bellman called
“the curse of dimensionality,” meaning that its computational requirements grow
exponentially with the number of state variables, but it is still far more efficient
and more widely applicable than any other general method.
Dynamic programming has been extensively developed since the late 1950s,
including extensions to partially observable MDPs (surveyed by Lovejoy, 1991),
many applications (surveyed by White, 1985, 1988, 1993), approximation
methods (surveyed by Rust, 1996), and asynchronous methods (Bertsekas,
1982, 1983).
Many excellent modern treatments of dynamic programming are available (e.g.,
Bertsekas, 2005, 2012; Puterman, 1994; Ross, 1983; and Whittle, 1982, 1983).
Bryson (1996) provides an authoritative history of optimal control.
Atari Games
• https://deepmind.com/dqn.html
Atari Games
• breakout
• https://www.youtube.com/watch?v=UXurvvDY93o
• https://github.com/corywalker/deep_q_rl/tree/pull_request
• van Hasselt, H., Guez, A., & Silver, D. (2015). Deep
Reinforcement Learning with Double Q-learning. arXiv preprint
arXiv:1509.06461.
Reinforcement Learning
• Another paradigm of machine learning
• learning from interaction
Ingredients of RL
• Markov Decision Process
• Discounted Future Reward
• Q-learning
Questions in
Reinforcement learning
• What are the main challenges in reinforcement
learning? We will cover the credit assignment problem
and the exploration-exploitation dilemma here.
• How to formalize reinforcement learning in
mathematical terms? We will define Markov Decision
Process and use it for reasoning about reinforcement
learning.
• How do we form long-term strategies? We define
“discounted future reward”, which forms the main basis
for the algorithms in the next sections.
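As a quick sketch, the discounted future reward from time t is a geometrically weighted sum of the rewards that follow; γ = 0.99 here is an arbitrary illustrative choice.

```python
def discounted_return(rewards, gamma=0.99):
    """R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    total, discount = 0.0, 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    return total

print(discounted_return([1.0, 0.0, 0.0, 5.0]))  # 1 + 0.99**3 * 5 ≈ 5.85
```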
Questions in
Reinforcement learning (cont.)
• How can we estimate or approximate the future reward?
A simple table-based Q-learning algorithm is defined and
explained here.
• What if our state space is too big? Here we see how the
Q-table can be replaced with a (deep) neural network.
• What do we need to make it actually work? The experience
replay technique, which stabilizes learning with neural
networks, will be discussed here.
• Are we done yet? Finally we will consider some simple
solutions to the exploration-exploitation problem.
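The following is a minimal table-based Q-learning sketch. The environment interface (reset(), step(), an actions list) is a hypothetical placeholder, and the hyperparameters are illustrative defaults.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning. `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions."""
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```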
Going Deeper: Deep Reinforcement
Learning
• Deep Q-Network [Volodymyr Mnih]
Deep Q-Network
• Deep Q Network
• Experience Replay
• Exploration-Exploitation
Structure of DQN
Parameter settings
• The state of the environment in the Breakout
game can be defined by the location of the
paddle, the location and direction of the ball, and
the presence or absence of each individual brick.
This intuitive representation, however, is game-
specific.
• If we apply the same preprocessing to game
screens as in the DeepMind paper – take the
last four screen images, resize them to 84×84
and convert to grayscale with 256 gray levels –
we would have 256^(84×84×4) ≈ 10^67970 possible
game states. That means 10^67970 rows in our
imaginary Q-table.
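A sketch of that preprocessing step, using PIL and NumPy (the exact cropping in the paper differs slightly; this is an approximation):

```python
import numpy as np
from PIL import Image

def preprocess(frame):
    """Convert one RGB game screen (H x W x 3 uint8 array) to an
    84x84 grayscale image with 256 gray levels."""
    img = Image.fromarray(frame).convert("L").resize((84, 84))
    return np.asarray(img, dtype=np.uint8)

def make_state(last_four_frames):
    """Stack the four most recent preprocessed screens into one
    84x84x4 state, as in the DeepMind setup described above."""
    return np.stack([preprocess(f) for f in last_four_frames], axis=-1)
```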
Deep Q-learning algorithm
Experience replay
Deep Q-learning algorithm
with experience replay
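A minimal sketch of the experience-replay machinery: transitions are stored in a fixed-size buffer, and training draws random minibatches from it, which breaks the correlation between consecutive samples. The Q-network and the loop around it are omitted; `next_q_values` is assumed to come from the (target) network.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'); just r at terminal states."""
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - dones)
```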
Pipeline of DQN
Extended Data Figure 1 | Two-dimensional t-SNE embedding of the
representations in the last hidden layer assigned by DQN to game states
experienced during a combination of human and agent play in Space
Invaders. The plot was generated by running the t-SNE algorithm on the last
hidden layer representation assigned by DQN to game states experienced
during a combination of human (30 min) and agent (2 h) play. The fact that
there is similar structure in the two-dimensional embeddings corresponding
to the DQN representation of states experienced during human play (orange
points) and DQN play (blue points) suggests that the representations
learned by DQN do indeed generalize to data generated from policies other
than its own. The presence in the t-SNE embedding of overlapping clusters
of points corresponding to the network representation of states experienced
during human and agent play shows that the DQN agent also follows
sequences of states similar to those found in human play. Screenshots
corresponding to selected states are shown (human: orange border; DQN:
blue border).
• Double Q-learning: http://arxiv.org/abs/1509.06461
• Prioritized Experience Replay: http://arxiv.org/abs/1511.05952
• Dueling Network Architecture: http://arxiv.org/abs/1511.06581
• Extension to continuous action space: http://arxiv.org/abs/1509.02971
• But beware: deep Q-learning has been patented by Google!
OpenAI
• From (partially) closed to open
• released a benchmark platform for reinforcement
learning
Plan for the DL program
• explanations about the design, the
implementation, and the tricks inside DL
• instructions for solving problems (we prefer
using DL)
• inspirations for novel thoughts and applications
DL startups
• Clarifai
• AlchemyAPI
• MetaMind
• …
online courses
• https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/ (Lectures 15-16)
• http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html (RL)
• http://rll.berkeley.edu/deeprlcourse/
Benchmarks
• RLLAB
• OpenAI gym
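A minimal interaction loop against OpenAI Gym, using the API as of its initial release (CartPole-v0 with a random policy, purely for illustration):

```python
import gym

env = gym.make("CartPole-v0")
for episode in range(5):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # random policy
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("episode", episode, "total reward", total_reward)
```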
Recall the Plan
• explanations about the design, the
implementation, and the tricks inside DL
• instructions for solving problems (we prefer
using DL)
• inspirations for novel thoughts and applications
Standpoint
• independent researchers and practitioners
• open to novel ideas
• focus on technology for addressing humanity's issues
• open to sponsors
Tracking
• Trello board
• Meetup page
• Periscope
References
• http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
• http://www.nature.com/news/game-playing-software-holds-lessons-for-neuroscience-1.16979
• http://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf