The document summarizes discussions from several Deep Learning meetups in Shanghai. It provides an overview of reinforcement learning concepts like Markov Decision Processes and Q-learning. It then describes how Deep Q-Networks were used to solve Atari games by representing the Q-table with a deep neural network and using experience replay. Finally, it discusses benchmarks and platforms for reinforcement learning like OpenAI Gym and potential directions for the Deep Learning program.
2. Goal
• Help people get introduced to DL
• Help investors find promising projects
• Utilize these wonderful techniques to solve problems
3. Review of #1 meetup
• Introduction to DL
• Pai Peng’s talk: DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets (CIKM 2015)
• Problems left: fraud detection (9), social topic extraction (7), image (7)
4. Review of #2 meetup
• Tom’s talk about Deep Learning in HealthCare
• Anson & John’s talk about Introduction to RapidMiner & presentation of the DL4J Deep Learning extension
5. Review of #3 meetup
• PART 1: Deep Learning Program
• PART 2: informative sharing
• AlphaGo-related technology by Davy
• CNN for text classification by Yale from Alibaba
6. Schedule
• PART 1: Deep Reinforcement Learning
• Reinforcement learning
• Deep Q-Network
• Atari games
• PART 2: evaluation platforms
• RLLAB
• OpenAI Gym
7. Part 1
• Deep reinforcement learning
• Reinforcement learning basics
• Deep Q-Network
• Atari games
8. Reinforcement Learning
• Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. (definition from Wikipedia)
9. Reinforcement learning: intersection of various domains
• game theory
• control theory
• operations research
• information theory
• simulation-based optimization
• multi-agent systems
• swarm intelligence
• statistics
• genetic algorithms
10. Economics and game theory
• In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.
11. Machine learning
• In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques.
• The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP, and they target large MDPs where exact methods become infeasible.
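For reference, the standard MDP formalism in textbook notation (not from the slides):

```latex
% An MDP is a tuple (S, A, P, R, gamma):
%   S: states, A: actions, gamma in [0,1): discount factor,
%   P: transition probabilities, R: expected immediate rewards.
% Classical dynamic programming assumes P and R are known;
% reinforcement learning estimates good behavior from samples alone.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr\left(s_{t+1} = s' \mid s_t = s,\, a_t = a\right)
\]
```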
12. Characteristics of RL
• Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
• Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
• The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
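To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch for the multi-armed bandit problem mentioned above (illustrative Python, not from the slides; the function names and arm probabilities are assumptions):

```python
import random

def epsilon_greedy_bandit(pull, n_arms, steps=1000, eps=0.1):
    """Balance exploration and exploitation on a multi-armed bandit.

    pull(arm) -> float reward. With probability eps we explore a random
    arm; otherwise we exploit the arm with the best estimated value.
    """
    counts = [0] * n_arms    # how often each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(steps):
        if random.random() < eps:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return values

# Hypothetical usage: three Bernoulli arms with success rates 0.2, 0.5, 0.8.
probs = [0.2, 0.5, 0.8]
print(epsilon_greedy_bandit(lambda a: float(random.random() < probs[a]), 3))
```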
13. Reinforcement Learning
From Richard Sutton’s book, RL problems have three characteristics:
1. being closed-loop in an essential way
2. not having direct instructions as to what actions to take
3. where the consequences of actions, including reward signals, play out over extended time periods
14. Elements of RL
A policy, a reward signal, a value function, and, optionally, a model of the environment.
15. Policy
A policy defines the learning agent’s way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
16. Reward signal
A reward signal defines the goal in a reinforcement learning problem.
• On each time step, the environment sends the reinforcement learning agent a single number, a reward.
• The agent’s sole objective is to maximize the total reward it receives over the long run.
• The reward signal thus defines what the good and bad events are for the agent.
17. Value function
Whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run.
• Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
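In standard notation (not from the slides), the value of state s under a policy π is the expected discounted return:

```latex
\[
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\,\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s \right]
\]
```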
18. Comments on reinforcement learning
• Rewards are basically given directly by the environment, but values must be estimated and re-estimated from the sequences of observations an agent makes over its entire lifetime.
• In fact, the most important component of almost all reinforcement learning algorithms we consider is a method for efficiently estimating values.
• The central role of value estimation is arguably the most important thing we have learned about reinforcement learning over the last few decades.
19. Model
The fourth and final element of some reinforcement learning systems is a model of the environment. This is something that mimics the behavior of the environment, or more generally, that allows inferences to be made about how the environment will behave.
20. Model
• For example, given a state and action, the model might predict the resultant next state and next reward. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.
• Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners, viewed as almost the opposite of planning.
21. Definition of RL
• Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making.
• It is distinguished from other computational approaches by its emphasis on learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.
• In our opinion, reinforcement learning is the first field to seriously address the computational issues that arise when learning from interaction with an environment in order to achieve long-term goals.
22. History of RL
The term “optimal control” came into use in the late 1950s to describe the problem of designing a controller to minimize a measure of a dynamical system’s behavior over time.
One of the approaches to this problem was developed in the mid-1950s by Richard Bellman and others through extending a nineteenth-century theory of Hamilton and Jacobi.
This approach uses the concepts of a dynamical system’s state and of a value function, or “optimal return function,” to define a functional equation, now often called the Bellman equation.
The class of methods for solving optimal control problems by solving this equation came to be known as dynamic programming (Bellman, 1957a).
Bellman (1957b) also introduced the discrete stochastic version of the optimal control problem known as Markovian decision processes (MDPs), and Ronald Howard (1960) devised the policy iteration method for MDPs. All of these are essential elements underlying the theory and algorithms of modern reinforcement learning.
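For reference, the Bellman optimality equation referred to above, in its standard modern form (not from the slides):

```latex
\[
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[ R(s, a, s') + \gamma\, V^{*}(s') \right]
\]
```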
23. Dynamic programming for reinforcement learning
Dynamic programming is widely considered the only feasible way of solving general stochastic optimal control problems. It suffers from what Bellman called “the curse of dimensionality,” meaning that its computational requirements grow exponentially with the number of state variables, but it is still far more efficient and more widely applicable than any other general method.
Dynamic programming has been extensively developed since the late 1950s, including extensions to partially observable MDPs (surveyed by Lovejoy, 1991), many applications (surveyed by White, 1985, 1988, 1993), approximation methods (surveyed by Rust, 1996), and asynchronous methods (Bertsekas, 1982, 1983).
Many excellent modern treatments of dynamic programming are available (e.g., Bertsekas, 2005, 2012; Puterman, 1994; Ross, 1983; and Whittle, 1982, 1983). Bryson (1996) provides an authoritative history of optimal control.
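As a concrete illustration of the dynamic-programming approach, and of why it needs the full model, here is a minimal value-iteration sketch (illustrative Python, not from the slides; the dictionary layout of P and R is an assumption):

```python
def value_iteration(states, actions, P, R, gamma=0.99, tol=1e-6):
    """Solve a *known* finite MDP by iterating the Bellman optimality backup.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    expected immediate reward -- i.e., the full model is assumed available,
    which is exactly what model-free reinforcement learning avoids.
    Each sweep touches every state, so the cost grows with the size of the
    state space (Bellman's "curse of dimensionality").
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```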
25. Atari Games
• Breakout
• https://www.youtube.com/watch?v=UXurvvDY93o
• https://github.com/corywalker/deep_q_rl/tree/pull_request
• van Hasselt, H., Guez, A., & Silver, D. (2015). Deep Reinforcement Learning with Double Q-learning. arXiv preprint arXiv:1509.06461.
27. Ingredients of RL
• Markov Decision Process
• Discounted Future Reward
• Q-learning
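In standard notation (not from the slides), the discounted future reward and the tabular Q-learning update tie these three ingredients together:

```latex
% Discounted future reward from time step t:
\[
R_t = r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}
\]
% Tabular Q-learning update after observing transition (s, a, r, s'):
\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\]
```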
28. Questions in reinforcement learning
• What are the main challenges in reinforcement learning? We will cover the credit assignment problem and the exploration-exploitation dilemma here.
• How do we formalize reinforcement learning in mathematical terms? We will define the Markov Decision Process and use it for reasoning about reinforcement learning.
• How do we form long-term strategies? We define the “discounted future reward,” which forms the main basis for the algorithms in the next sections.
29. Questions in reinforcement learning (cont.)
• How can we estimate or approximate the future reward? A simple table-based Q-learning algorithm is defined and explained here.
• What if our state space is too big? Here we see how the Q-table can be replaced with a (deep) neural network.
• What do we need to make it actually work? The experience replay technique, which stabilizes learning with neural networks, will be discussed here (a minimal sketch follows this list).
• Are we done yet? Finally, we will consider some simple solutions to the exploration-exploitation problem.
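A rough sketch of the experience replay idea (illustrative Python, not from the slides; update_network is a hypothetical stand-in for the actual DQN gradient step):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store (state, action, reward, next_state, done) transitions and serve
    uniformly random minibatches. Sampling past transitions at random breaks
    the correlation between consecutive frames that destabilizes Q-learning
    with neural networks."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

# Outline of how it slots into training (update_network is hypothetical):
#   buffer.push(s, a, r, s2, done)         # after every environment step
#   if len(buffer) >= 32:
#       update_network(buffer.sample(32))  # minimize TD error on the batch
```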
40. • The state of the environment in the Breakout game can be defined by the location of the paddle, the location and direction of the ball, and the presence or absence of each individual brick. This intuitive representation, however, is game-specific.
41. • If we apply the same preprocessing to game screens as in the DeepMind paper – take the last four screen images, resize them to 84×84, and convert to grayscale with 256 gray levels – we would have 256^(84×84×4) ≈ 10^67970 possible game states. This means 10^67970 rows in our imaginary Q-table.
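To spell out the arithmetic behind that count: each of the 84 × 84 × 4 = 28224 pixels takes one of 256 gray levels, so

```latex
\[
256^{\,84 \times 84 \times 4}
= 10^{\,28224 \cdot \log_{10} 256}
\approx 10^{\,28224 \times 2.408}
\approx 10^{\,67970}
\]
```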
49. Extended Data Figure 1 | Two-dimensional t-SNE embedding of the representations in the last hidden layer assigned by DQN to game states experienced during a combination of human and agent play in Space Invaders. The plot was generated by running the t-SNE algorithm on the last hidden layer representation assigned by DQN to game states experienced during a combination of human (30 min) and agent (2 h) play. The fact that there is similar structure in the two-dimensional embeddings corresponding to the DQN representation of states experienced during human play (orange points) and DQN play (blue points) suggests that the representations learned by DQN do indeed generalize to data generated from policies other than its own. The presence in the t-SNE embedding of overlapping clusters of points corresponding to the network representation of states experienced during human and agent play shows that the DQN agent also follows sequences of states similar to those found in human play. Screenshots corresponding to selected states are shown (human: orange border; DQN: blue border).
53. Plan for the DL program
• explanations about the design, the implementation, and the tricks inside DL
• instructions for solving problems (we prefer using DL)
• inspirations for novel thoughts and applications
57. Recall the Plan
• explanations about the design, the implementation, and the tricks inside DL
• instructions for solving problems (we prefer using DL)
• inspirations for novel thoughts and applications
58. Standpoint
• independent researchers and practitioners
• open to novel ideas
• focus on technology for addressing issues facing humanity
• open to sponsors