An Introduction to Reinforcement Learning - The Doors to AGI (Anirban Santara)
Reinforcement Learning (RL) is a branch of Machine Learning in which an agent learns to choose optimal actions in different states in order to reach a specified goal, solely by interacting with its environment through trial and error. Unlike supervised learning, the agent does not get examples of "correct" actions in given states as ground truth. Instead, it has to use feedback from the environment (which can be sparse and delayed) to improve its policy over time. The formulation of the RL problem closely resembles the way human beings learn to act in different situations; hence it is often considered the gateway to Artificial General Intelligence.
This talk introduces the audience to key theoretical concepts: the formulation of the RL problem as a Markov Decision Process (MDP), and the solution of MDPs using dynamic programming and policy-gradient algorithms. State-of-the-art deep reinforcement learning algorithms will also be covered, along with a case study of reinforcement learning applied to robotics.
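To make the dynamic-programming part concrete, here is a minimal value-iteration sketch for a finite MDP. The tabular model P (with P[s][a] = [(prob, next_state, reward), ...]) and the sizes are hypothetical toy inputs, not material from the talk.

```python
# A minimal value-iteration sketch for a finite MDP, assuming a tabular model
# P[s][a] = [(prob, next_state, reward), ...]. Model and sizes are hypothetical.
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-6):
    V = np.zeros(n_states)
    while True:
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Bellman optimality backup: expected reward plus discounted value
                Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
        V = V_new
```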
Exploration Strategies in Reinforcement Learning (Dongmin Lee)
I presented "Exploration Strategies in Reinforcement Learning" at AI Robotics KR.
- Exploration strategies in RL (a minimal sketch of the first strategy follows the list):
1. Epsilon-greedy
2. Optimism in the face of uncertainty
3. Thompson (posterior) sampling
4. Information theoretic exploration (e.g., Entropy Regularization in RL)
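To make the first strategy concrete, here is a minimal epsilon-greedy sketch over tabular Q-values; the (n_states, n_actions) Q array and integer state encoding are assumptions for illustration, not the presenter's implementation.

```python
# A minimal epsilon-greedy action-selection sketch over tabular Q-values.
import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # explore: uniformly random action
    return int(np.argmax(Q[state]))           # exploit: greedy action
```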
Thank you.
Adversarially Guided Actor-Critic, Y. Flet-Berliac et al., 2021 (Chris Ohk)
At the RL paper review study group, I presented a summary of the Adversarially Guided Actor-Critic paper. AGAC combines actor-critic with GAN-inspired methods and shows excellent performance in environments where rewards are sparse and exploration is hard. I hope many people find it helpful.
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al., 2021 (Chris Ohk)
At the RL paper review study group, I presented a summary of the Evolving Reinforcement Learning Algorithms paper. The paper designs a language for expressing the loss functions of value-based, model-free RL agents and proposes loss functions that outperform the original DQN. I hope many people find it helpful.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (Dongmin Lee)
I reviewed the PEARL paper.
PEARL (Probabilistic Embeddings for Actor-critic RL) is an off-policy meta-RL algorithm designed to achieve both meta-training and adaptation efficiency. It uses a probabilistic encoder over latent task variables, which enables posterior sampling for structured and efficient exploration.
Outline
- Abstract
- Introduction
- Probabilistic Latent Context
- Off-Policy Meta-Reinforcement Learning
- Experiments
Link: https://arxiv.org/abs/1903.08254
Thank you!
An overview of gradient descent optimization algorithms (Hakky St)
These slides summarize a paper on gradient descent.
They were prepared for a study meeting on gradient descent.
I used the following paper, which is a very good resource on gradient descent:
https://arxiv.org/abs/1609.04747
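As a taste of the update rules the paper surveys, here is a minimal sketch of SGD with momentum; `grad` is an assumed function returning the gradient of the loss at `w`, not something from the paper's code.

```python
# A minimal sketch of SGD with momentum, one of the surveyed update rules.
import numpy as np

def sgd_momentum(w, grad, lr=0.01, beta=0.9, steps=100):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + lr * grad(w)  # exponentially decaying velocity
        w = w - v                    # step against the accumulated gradient
    return w
```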
Dexterous In-Hand Manipulation by OpenAI (Anand Joshi)
OpenAI used Reinforcement Learning to train a humanoid robotic hand to rotate a cube into any desired orientation. This is discussed in arXiv:1808.00177, 2019, and in the blog post <openai.com/blog/learning-dexterity/>. These slides present results from the paper along with a few important reinforcement learning concepts I learnt from many other sources.
We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly.
Lecture slides from DSAI 2018, National Cheng Kung University, on the famous deep reinforcement learning algorithm Actor-Critic. In these slides, we introduce the advantage function and A3C/A2C.
[1808.00177] Learning Dexterous In-Hand Manipulation (Seung Jae Lee)
Presentation slides for 'Learning Dexterous In-Hand Manipulation' by OpenAI.
You can find more presentation slides on my website:
https://www.endtoend.ai
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning (Dongmin Lee)
I reviewed the PRM-RL paper.
PRM-RL (Probabilistic Roadmap-Reinforcement Learning) is a hierarchical method that combines sampling-based path planning with RL. It uses feature-based and deep neural-network policies (DDPG) in continuous state and action spaces. In experiments, the authors evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks: end-to-end differential-drive indoor navigation in office environments, and aerial cargo delivery in urban environments.
Outline
- Abstract
- Introduction
- Reinforcement Learning
- Methods
- Results
Thank you.
Character Controllers using Motion VAEs (Dongmin Lee)
Title: Character Controllers using Motion VAEs
Proceeding: ACM Transactions on Graphics (TOG) (Proc. SIGGRAPH 2020)
Paper: https://dl.acm.org/doi/abs/10.1145/3386569.3392422
Video: https://www.youtube.com/watch?v=Zm3G9oqmQ4Y
Given example motions, how can we generalize these to produce new purposeful motions?
We take a two-step approach to this problem:
• Kinematic generative model based on an autoregressive conditional variational autoencoder or motion VAE (MVAE)
• Learning controller to generate desired motions using Deep Reinforcement Learning (Deep RL)
Lessons learnt at building recommendation services at industry scale (Domonkos Tikk)
Industry-day keynote presentation held at ECIR 2016, Padova. The talk presents the algorithmic, technical, and business challenges Gravity R&D encountered while growing from a top Netflix Prize contender into a recommender-system vendor.
Horizon: Deep Reinforcement Learning at Scale (Databricks)
To build a decision-making system, we must provide answers to two sets of questions: (1) "What will happen if I make decision X?" and (2) "How should I pick which decision to make?"
Typically, the first set of questions is answered with supervised learning: we build models to forecast whether someone will click on an ad or visit a post. The second set of questions is more open-ended. In this talk, we will dive into how we can answer "how" questions, starting with heuristics and search. This will lead us to bandits, reinforcement learning, and Horizon: an open-source platform for training and deploying reinforcement learning models at massive scale. At Facebook, we are using Horizon, built using PyTorch 1.0 and Apache Spark, in a variety of AI-related and control tasks, spanning recommender systems, marketing & promotion distribution, and bandwidth optimization.
The talk will cover the key components of Horizon and the lessons we learned along the way that influenced the development of the platform.
Author: Jason Gauci
1118_Seminar_Continuous_Deep Q-Learning with Model-based Acceleration (Hye-min Ahn)
The material I used to present the paper
"Continuous Deep Q-Learning with Model-based Acceleration", S. Gu, T. Lillicrap, I. Sutskever, S. Levine, ICML 2016.
Navigation in 3D Environment with Reinforcement Learning, by Predrag Njegovan... (SmartCat)
With artificial intelligence on the rise, reinforcement learning is a fertile field for new research. One of the problems tackled in the last year or two is environment control, or navigation. This talk presents one approach to navigation and generalization in a three-dimensional environment with restricted rewards, by training an autonomous agent with deep learning techniques.
Deep Learning in Robotics
- There are two major branches of applying deep learning techniques in robotics.
- One is to combine DL with Q-learning algorithms; the impressive work on playing Atari games by DeepMind is a representative study. While this approach can effectively handle several problems that can hardly be solved by traditional methods, it is not appropriate for real manipulators, as it often requires an enormous amount of training data.
- The other branch uses guided policy search. It combines trajectory optimization methods with supervised learning algorithms such as CNNs to come up with a robust 'policy' function that can actually be used on real robots, e.g., Baxter or PR2.
Making smart decisions in real-time with Reinforcement Learning (Ruth Yakubu)
The process of reinforcement learning (RL) involves trial and error, rewarding actions, and remembering past experiences over time. This technique is used when building sequential decision-making solutions like self-driving cars, video games, or personalized content recommendations. However, two of the challenges in building reinforcement learning models are the long time the system takes to learn and achieving high accuracy. In this session, we'll explore different reinforcement learning solutions: how to implement relevant user experiences that improve over time based on behavior, using a pre-built API; and how to build a custom model from scratch in Python while increasing learning speed and final performance using Azure Machine Learning and Ray/RLlib.
Lecture slides from DASI, spring 2018, National Cheng Kung University, Taiwan. The content covers deep reinforcement learning: policy gradient, including variance reduction and importance sampling.
Movie Recommendation Engine using Artificial Intelligence (Harivamshi D)
My academic major project: movie recommendation using Artificial Intelligence. We also developed a website named Movie Engine for recommending movies.
Mobile Recommendation Engine
It uses collaborative filtering and a content-based approach in a hybrid manner, then a Genetic Algorithm to enhance the recommendation engine. Through this, marketers also learn the unique characteristics of the product that should be created and recommended to the user.
This presentation introduces Google DeepMind's DeepDPG (DDPG) algorithm to my colleagues.
I tried my best to make it easy to understand...
Comments are always welcome :)
hiddenmaze91.blogspot.com
Deep Reinforcement Learning Framework for Autonomous Driving (GopikaGopinath5)
Motivated by Google DeepMind's successful demonstrations of learning to play Atari games and Go, it is possible to propose a framework for autonomous driving using deep reinforcement learning.
It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios.
3. Motivation
Credits: YouTube
Credits: Prof Jeff Schneider – RI Seminar Talk
Goal – To make self-driving …
• Scalable to new domains.
• Robust to rare long-tail events.
• Verifiable in performance through simulation.
4. Motivation
• A good policy exists!
Credits: Chen et al., "Learning by Cheating"
(https://arxiv.org/pdf/1912.12294.pdf)
5. Motivation
• A good policy exists!
• RL should in theory outperform imitation learning.
Credits: OpenAI Five
(Berner et al., “Dota 2 with Large Scale Deep Reinforcement Learning”)
6. Motivation
• Given a good policy, it can be optimized further every time a safety driver intervenes.
• RL could, in theory, exceed human performance.
Credits: Wayve
8. Types of RL algorithms
• On-policy algorithms use actions from the current policy to obtain training data and update the values.
• Off-policy algorithms use actions from a separate "behavior" policy to obtain training data and update the values.
9. Brief Recap of RL
• Reward - R(s,a)
• State Value Function - V(s)
• State-Action Value Function - Q(s,a)
• Discount Factor - γ
• Tabular Q Learning
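To make the recap concrete, here is a minimal tabular Q-learning sketch in the slide's notation (reward R(s,a), discount γ, values Q(s,a)); the env with reset()/step() returning (next_state, reward, done) is a hypothetical interface, not the deck's code.

```python
# A minimal tabular Q-learning sketch; env interface is hypothetical.
import numpy as np

def q_learning(env, n_states, n_actions, alpha=0.1, gamma=0.99,
               epsilon=0.1, episodes=500, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = int(rng.integers(n_actions)) if rng.random() < epsilon \
                else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # TD target r + γ max_a' Q(s', a'); no bootstrap at terminal states
            target = r + (0.0 if done else gamma * np.max(Q[s2]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```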
10. Deep Q-Networks (DQN)
• First to use deep neural networks for learning Q-functions [1]
• Main contributions:
• Uses target networks
• Uses a replay buffer
• Pros: Off-policy – sample efficient
• Cons: Maximization bias
[1] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
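A minimal sketch of the two ingredients named above: a replay buffer of transitions and a frozen target network for TD targets. Here `q_target` is an assumed torch module (a periodically synced copy of the online Q-network), and the tensor shapes are illustrative, not the paper's exact setup.

```python
from collections import deque
import torch

replay = deque(maxlen=100_000)  # stores (s, a, r, s2, done) transition tuples

def td_targets(batch, q_target, gamma=0.99):
    s, a, r, s2, done = batch        # batched tensors sampled from `replay`
    with torch.no_grad():            # no gradients flow through the target net
        max_q = q_target(s2).max(dim=1).values
    return r + gamma * (1.0 - done) * max_q  # bootstrap cut off at terminals
```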
11. Policy Gradients
• Why policy gradients?
• A direct method to compute the optimal policy
• Parametrize policies and optimize using loss functions [1]
• Advantageous in large/continuous action domains
[1] Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
The slide's figure annotates the policy gradient theorem with three labels: the state distribution, the state-action value function, and the gradient of the policy function. In its standard form:
∇_θ J(θ) = Σ_s d^π(s) Σ_a Q^π(s, a) ∇_θ π_θ(a|s)
where d^π is the state distribution, Q^π the state-action value function, and ∇_θ π_θ the gradient of the policy function.
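As a concrete instance of this gradient, here is a minimal REINFORCE-style loss using the log-derivative trick; `policy` is an assumed torch module mapping states to action logits, and sampled returns stand in for Q^π.

```python
import torch

def reinforce_loss(policy, states, actions, returns):
    logp = torch.log_softmax(policy(states), dim=-1)          # log π_θ(·|s)
    logp_a = logp.gather(1, actions.unsqueeze(1)).squeeze(1)  # log π_θ(a_t|s_t)
    # Minimizing -E[G_t · log π_θ(a_t|s_t)] ascends the policy gradient.
    return -(returns * logp_a).mean()
```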
12. Trust Region Policy Optimization
• Pros
• Introduced the idea that a large shift in policy is bad!
• Thus, reduces sample complexity.
• Cons
• It is an on-policy algorithm.
Schulman, John, et al. "Trust region policy optimization." International Conference on Machine Learning, 2015.
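For reference, the TRPO surrogate objective and trust-region constraint, stated here in the standard form from the paper:
maximize_θ E_t[ (π_θ(a_t|s_t) / π_θold(a_t|s_t)) Â_t ]
subject to E_t[ KL(π_θold(·|s_t) || π_θ(·|s_t)) ] ≤ δ
A large policy shift is thus explicitly bounded by δ.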
13. Proximal Policy Optimization
• PPO was an improvement on TRPO.
• We can rearrange the hard KL constraint into the softer loss described here.
• Â_t is functionally the same as Q within the expectation.
• But their main contribution is…
Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
14. Proximal Policy Optimization
• The clip loss function!
• They clip the loss value instead of using a KL constraint.
• Good actions will not be too beneficial, but any bad actions will have a minimum penalty.
Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
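For reference, the clipped surrogate loss in the standard form from the paper:
L^CLIP(θ) = E_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t ) ], with r_t(θ) = π_θ(a_t|s_t) / π_θold(a_t|s_t)
Clipping r_t to [1−ε, 1+ε] means good actions cannot be exploited too aggressively, while the min keeps the penalty for bad updates unclipped.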
16. Actor-Critic Algorithms
• What if the gradient estimator in policy gradients has too much variance?
• What does that mean?
• It takes too many interactions with the environment to learn the optimal policy parameters.
17. Actor-Critic Algorithms
• It turns out that we can control this variance using value functions.
• If we have some information about the current state, gradient estimation can be better.
• Actor: the policy network
• Critic: the value function
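A minimal sketch of this actor-critic split: the critic's V(s) serves as a baseline that reduces policy-gradient variance. Here `policy` and `value` are assumed torch modules; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def a2c_losses(policy, value, states, actions, returns):
    v = value(states).squeeze(-1)              # critic: V(s)
    advantage = returns - v.detach()           # baseline-subtracted signal
    logp = torch.log_softmax(policy(states), dim=-1)
    logp_a = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    actor_loss = -(advantage * logp_a).mean()  # lower-variance policy gradient
    critic_loss = F.mse_loss(v, returns)       # regress V(s) toward returns
    return actor_loss, critic_loss
```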
18. Soft Actor-Critic
• Uses the maximum-entropy RL framework [1]
• Uses the clipped double-Q trick to avoid maximization bias
[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
Credits: BAIR
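For reference, the maximum-entropy objective that SAC optimizes, in the standard form from the paper:
J(π) = Σ_t E_{(s_t, a_t)~ρ_π}[ r(s_t, a_t) + α H(π(·|s_t)) ]
The temperature α trades off reward against the entropy bonus H, which is what encourages exploration.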
19. Soft Actor-Critic
• Advantages:
• Off-policy algorithm
• Exploration is inherently handled
[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
Credits: BAIR
21. [1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).
22. Experimental Setting (Past Work)
• State Space
• Semantically segmented bird's-eye-view images
• An autoencoder is then trained on them.
• Waypoints!
• Action Space
• Speed – continuous, controlled using a simulated PID
• Steering angle – continuous
Credits: Learning to Drive using Waypoints
(Tanmay Agarwal, Hitesh Arora, Tanvir Parhar et al.)
23. Experimental Setting (Past Work)
• Inputs include waypoint features as the route to follow
• Uses the CARLA simulator
Credits: Learning to Drive using Waypoints
(Tanmay Agarwal, Hitesh Arora, Tanvir Parhar et al.)
24. Experimental Setting (Past Work)
• Rewards
• Speed reward: assuming that we are following waypoints, this is the distance to the goal
• Deviation penalty: penalize deviating from the trajectory/waypoints
• Collision penalty: avoid collisions; even if we are going to collide, collide at low speed
Credits: Learning to Drive using Waypoints
(Tanmay Agarwal, Hitesh Arora, Tanvir Parhar et al.)
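As an illustration of how these three terms could combine into a scalar reward, here is a hypothetical sketch; the coefficients and signal names are assumptions for illustration, not the authors' exact formulation.

```python
def reward(progress_to_goal, deviation_m, collided, impact_speed,
           w_speed=1.0, w_dev=0.5, w_col=10.0):
    r = w_speed * progress_to_goal         # speed reward: progress along waypoints
    r -= w_dev * deviation_m               # deviation penalty: drift off trajectory
    if collided:
        r -= w_col * (1.0 + impact_speed)  # collision penalty: worse at high speed
    return r
```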
25. Complete Pipeline (Past Work)
• AE network for state representation
• Shallow policy network
Credits: Learning to Drive using Waypoints
(Tanmay Agarwal, Hitesh Arora, Tanvir Parhar et al.)
26. Past Work
• Good navigation in empty lanes
• Crashes with stationary cars
Credits: Learning to Drive using Waypoints
(Tanmay Agarwal, Hitesh Arora, Tanvir Parhar et al.)
• Uses PPO at the moment
• DQN is being tried
• We want to use SAC for this task
27. Future Work
• Next steps -
• To focus on settings with dynamic actors.
• Improve exploration on current settings using SAC.
• Training in dense environments, possibly also through self play RL.
28. Thank you
Hit us with questions!
We’d appreciate any useful suggestions.
Editor's Notes
NOTES :
Self driving cars today process sensor inputs through tens of learned systems.
NEXT
Existing approach works well and will result in fully autonomous cars eventually, at least in specific domains. But we need a less engineer intensive version of the current approach.
Less engineering heavy – Tail scenarios. Also, transferability across cities.
Perception, Prediction, Mapping and Localization, Planning – All include several sub systems that do various tasks.
Assuming Perception is solved we want to use RL to remove other components.
Learning by Cheating achieved 100% performance on CARLA's benchmark recently.
This shows that a good policy exists.
CARLA
NEXT - How it works
Needs expert driving trajectories
RL has repeatedly shown itself to be capable of outperforming humans on highly complex tasks with large branching factors
Branching factor is 10^4. Chess has 35 and Go has 250.
However, transferring this performance to the real world in a noisy environment is a big challenge for RL.
This is the first car to learn self driving using reinforcement learning.
Explain Points on Screen.
Model-based RL methods can have error propagation. It is difficult to fit a model to the real world, unlike in Chess or Go (as in AlphaGo).
So, we want Model free RL.
Two major classes, either optimize policy or value function, or have a method that uses both.
TO EXPLAIN
Off Policy vs On Policy methods
Off policy - Advantages of replay buffers in highly correlated data
Off policy methods are more sample efficient as they can reuse past experiences for training later on
Important experiences can be saved and reused later for training
In Green – Off Policy, In Purple – On Policy
As we learned in 10-601
As we learned in 10-601..
1) Tabular Q-learning works well when the state space is finite. But as the state space grows larger, we need to turn to function approximation, which takes a state and action as input and returns a Q-value.
2) We update the parameters until we get all the Q-values of state-action pairs correct.
3) But the update equation assumes i.i.d. data, whereas in RL tasks the states are correlated. One of the major contributions is the replay buffer, used to break the correlations between samples.
4) Also, the target changes (it is non-stationary), which causes instability in learning. Target networks hold the parameters fixed and avoid the changing-targets problem.
5) But still one problem remains: the target is estimated. What if the target is wrong? This leads to maximization bias.
1) Q-learning is good for discrete actions. But what if the action space is large/continuous?
2) Policy gradients are an alternative to Q-learning. In Q-learning, we first fit a Q-function and then derive a policy. Policy gradients directly learn policies by parametrizing them and updating the parameters according to a loss function.
3) We don't want to go too deep into the math, so the final gradient of the loss w.r.t. the policy parameters looks like …
4) q_pi is estimated from experience.
5) What happens is that we start with some parameters and a state, take an action, collect the reward and next state, and update … (interaction with the environment)
TO EXPLAIN
Takes more optimal steps compared to plain policy gradients.
Maximizes the expected value under the NEW policy but the old value function. The denominator is due to importance sampling.
q(a|s) is the old policy.
BUT we do it with a constraint.
Forms a "TRUST REGION" with the help of the KL divergence.
We can rearrange the constraint to form a loss.
A_t is like Q(s,a) – V(s).
But their main objective is L^CLIP.
r is the ratio in the above function.
On the left, it doesn't get too confident in its update and thus clips the loss from above.
On the right, if the policy becomes worse, it reverts the changes, proportionately more so if the loss is even worse.
We want to learn the optimal policy using the minimum number of interactions with the environment.
Furthermore, as the policy changes, a new gradient is estimated independently of past estimates.
--> The basic idea is that if we know something about the state, the variance can be lower.
1) The actor uses the critic to update itself.
2) The critic improves itself to catch up with the changing policy.
These keep going and complement each other until they converge.
1) Maximum-entropy framework – a balance between exploring and collecting rewards; the value function definition needs to change.
2) Improves the critic by using the clipped double-Q trick.
CAR EXAMPLE
1) SAC belongs to the class of actor-critic algorithms.
2) Before SAC, the major effort to reduce sample complexity was DDPG, but it is brittle and uses deterministic policies.
3) Maximum-entropy framework – a balance between exploring and collecting rewards; the value function definition needs to change.
4) Improves the critic by using the clipped double-Q trick.
NOTES: Experiment Setting - Problem Statement
State Space (AE on Semantically Segmented images generated by CARLA)
Action Space (Speed and Steer)
Rewards (Ask Hitesh)
Training and Testing Towns
4 Test Scenarios - Each has several test cases
Photos for everything
To Explain
The policy input could include the current speed and steering.
The encoder-decoder could use a stack of frames.