This document summarizes a presentation on offline reinforcement learning. It discusses how offline RL can learn from fixed datasets without further interaction with the environment, which allows for fully off-policy learning. However, offline RL faces challenges from distribution shift between the behavior policy that generated the data and the learned target policy. The document reviews several offline policy evaluation, policy gradient, and deep deterministic policy gradient methods, and also discusses using uncertainty and constraints to address distribution shift in offline deep reinforcement learning.
3. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
• Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
• UC Berkeley / Google Brain
• https://arxiv.org/abs/2005.01643 (submitted 4 May 2020)
• See also: Sergey Levine, “Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review”
• https://arxiv.org/abs/1805.00909
17. Evaluating a policy from off-policy data (off-policy policy evaluation, OPE)
• High-confidence off-policy evaluation [Thomas+15]
• Importance sampling: reweight trajectories collected under the behavior policy πβ by the likelihood ratio πθ(τ)/πβ(τ)
• The importance-sampled estimate of J(πθ) is unbiased
• Self-normalizing the weights (dividing by ∑_{i} w_H^i instead of n) is biased but has lower variance

To evaluate the expected return J(πθ) of a target policy πθ:

J(πθ) = 𝔼_{τ∼πβ(τ)}[ (πθ(τ) / πβ(τ)) ∑_{t=0}^{H} γ^t r(s_t, a_t) ]
      = 𝔼_{τ∼πβ(τ)}[ (∏_{t=0}^{H} πθ(a_t | s_t) / πβ(a_t | s_t)) ∑_{t=0}^{H} γ^t r(s_t, a_t) ]
      ≈ (1/n) ∑_{i=1}^{n} w_H^i ∑_{t=0}^{H} γ^t r_t^i,   where   w_t^i = ∏_{t′=0}^{t} πθ(a_{t′}^i | s_{t′}^i) / πβ(a_{t′}^i | s_{t′}^i)

and {(s_t^i, a_t^i, r_t^i, s_{t+1}^i, …)}_{i=1}^{n} are n trajectories sampled from πβ. Replacing 1/n with 1/∑_{i=1}^{n} w_H^i gives the self-normalized estimator.
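To make the estimator concrete, here is a minimal NumPy sketch of trajectory-wise importance sampling with optional self-normalization. It is not from the slides: the names (`is_ope`, `pi_theta`, `pi_beta`) and the `(s, a, r)` trajectory format are illustrative assumptions.

```python
# A minimal NumPy sketch of trajectory-wise importance-sampling OPE.
# pi_theta / pi_beta are callables (s, a) -> action probability, and each
# trajectory is a list of (s, a, r) tuples; these are assumed conventions.
import numpy as np

def is_ope(trajs, pi_theta, pi_beta, gamma=0.99, self_normalize=False):
    weights, returns = [], []
    for traj in trajs:
        # w_H = prod_t pi_theta(a_t|s_t) / pi_beta(a_t|s_t)
        w = np.prod([pi_theta(s, a) / pi_beta(s, a) for s, a, _ in traj])
        # discounted return sum_t gamma^t r_t
        G = sum(gamma**t * r for t, (_, _, r) in enumerate(traj))
        weights.append(w)
        returns.append(G)
    weights, returns = np.array(weights), np.array(returns)
    if self_normalize:
        # biased but lower variance: divide by the weight sum instead of n
        return float(np.sum(weights * returns) / np.sum(weights))
    return float(np.mean(weights * returns))  # unbiased
```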
18. Off-policy policy evaluation (OPE), continued: per-decision weights and doubly robust estimation
• The reward r_t does not depend on s_{t′}, a_{t′} for t′ > t given (s_t, a_t), so the importance weights after step t can be dropped (per-decision importance sampling)
• This shortens the products of ratios and reduces the variance of the weights
• Self-normalization can be applied to the per-decision weights as well
• Doubly robust estimator
• Combines the importance weights with a learned Q-function estimate Q̂
• The estimate is unbiased if either the importance weights or Q̂ is accurate

Per-decision importance sampling:

J(πθ) = 𝔼_{τ∼πβ(τ)}[ ∑_{t=0}^{H} (∏_{t′=0}^{t} πθ(a_{t′} | s_{t′}) / πβ(a_{t′} | s_{t′})) γ^t r(s_t, a_t) ]
      ≈ (1/n) ∑_{i=1}^{n} ∑_{t=0}^{H} w_t^i γ^t r_t^i

Doubly robust estimator (with a learned Q̂^{πθ} and the convention w_{−1}^i = 1):

J(πθ) ≈ (1/n) ∑_{i=1}^{n} ∑_{t=0}^{H} γ^t ( w_t^i (r_t^i − Q̂^{πθ}(s_t^i, a_t^i)) + w_{t−1}^i 𝔼_{a∼πθ(a | s_t^i)}[ Q̂^{πθ}(s_t^i, a) ] )

The Q̂ terms form a control variate with zero expectation under πβ, so errors in Q̂(s_t, a_t) are compensated by the importance weights, and vice versa.
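The following sketch implements the two estimators above under the same assumed trajectory format; `Q_hat` and `V_hat` are stand-ins (not from the slides) for any learned Q-function and its expectation under πθ.

```python
# Sketches of per-decision importance sampling and the doubly robust
# estimator. Q_hat(s, a) is any learned Q-function for pi_theta; V_hat(s)
# should return E_{a ~ pi_theta(.|s)}[Q_hat(s, a)]. All names and the
# (s, a, r) trajectory format are illustrative assumptions.
import numpy as np

def per_decision_is(trajs, pi_theta, pi_beta, gamma=0.99):
    total = 0.0
    for traj in trajs:
        w = 1.0
        for t, (s, a, r) in enumerate(traj):
            w *= pi_theta(s, a) / pi_beta(s, a)   # w_t: ratios up to time t only
            total += (gamma**t) * w * r
    return total / len(trajs)

def doubly_robust(trajs, pi_theta, pi_beta, Q_hat, V_hat, gamma=0.99):
    total = 0.0
    for traj in trajs:
        w_prev, w = 1.0, 1.0                      # convention: w_{-1} = 1
        for t, (s, a, r) in enumerate(traj):
            w *= pi_theta(s, a) / pi_beta(s, a)
            total += (gamma**t) * (w * (r - Q_hat(s, a)) + w_prev * V_hat(s))
            w_prev = w
    return total / len(trajs)
```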
19. Estimating the policy gradient from off-policy data (off-policy policy gradient)
• The gradient ∇θJ(πθ) can be estimated with the same importance-sampling idea
• Reweight the on-policy gradient of trajectories collected under πβ by the likelihood ratio πθ(τ)/πβ(τ)
• Self-normalization again trades unbiasedness for lower variance

∇θJ(πθ) = 𝔼_{τ∼πβ(τ)}[ (πθ(τ) / πβ(τ)) ∑_{t=0}^{H} γ^t ∇θ log πθ(a_t | s_t) Â(s_t, a_t) ]
        = 𝔼_{τ∼πβ(τ)}[ (∏_{t=0}^{H} πθ(a_t | s_t) / πβ(a_t | s_t)) ∑_{t=0}^{H} γ^t ∇θ log πθ(a_t | s_t) Â(s_t, a_t) ]
        ≈ (1/n) ∑_{i=1}^{n} w_H^i ∑_{t=0}^{H} γ^t ∇θ log πθ(a_t^i | s_t^i) Â(s_t^i, a_t^i)

where Â(s, a) is an advantage estimate and {(s_t^i, a_t^i, r_t^i, s_{t+1}^i, …)}_{i=1}^{n} are n trajectories sampled from πβ.
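A corresponding sketch of the trajectory-weighted gradient estimator follows; `grad_log_pi` and `A_hat` are hypothetical callables (assumptions, not from the slides) for the score function ∇θ log πθ(a|s) and an advantage estimate.

```python
# A sketch of the trajectory-weighted off-policy policy gradient above.
# grad_log_pi(s, a) returns the vector d/dtheta log pi_theta(a|s);
# A_hat(s, a) is an advantage estimate. Both are placeholder callables.
import numpy as np

def off_policy_pg(trajs, pi_theta, pi_beta, grad_log_pi, A_hat, gamma=0.99):
    grad = None
    for traj in trajs:
        # full-trajectory weight w_H = prod_t pi_theta(a_t|s_t) / pi_beta(a_t|s_t)
        w_H = np.prod([pi_theta(s, a) / pi_beta(s, a) for s, a, _ in traj])
        for t, (s, a, _) in enumerate(traj):
            g = w_H * (gamma**t) * A_hat(s, a) * grad_log_pi(s, a)
            grad = g if grad is None else grad + g
    return grad / len(trajs)  # the 1/n Monte Carlo average
```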
20. Off-policy policy gradient, continued: variance reduction and regularization
• As in OPE, per-decision weights and a state baseline b(s_t) reduce variance
• A doubly robust estimator can be constructed for the gradient as well
• Self-normalizing the off-policy weights acts like a softmax over trajectories and introduces bias
• The normalizing term can instead be added to the objective as a regularizer with coefficient λ

Per-decision weights with a baseline (see the sketch after the equations):

∇θJ(πθ) ≈ ∑_{i=1}^{n} ∑_{t=0}^{H} w_t^i γ^t ( ∑_{t′=t}^{H} γ^{t′−t} (w_{t′}^i / w_t^i) r_{t′}^i − b(s_t^i) ) ∇θ log πθ(a_t^i | s_t^i)

Regularized objective:

∇θ J̄(πθ) ≈ ( ∑_{i=1}^{n} w_H^i ∑_{t=0}^{H} γ^t ∇θ log πθ(a_t^i | s_t^i) Â(s_t^i, a_t^i) ) + λ log ( ∑_{i=1}^{n} w_H^i )
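Below is a sketch of the per-decision gradient with baseline (the first equation above), again with illustrative names; a real implementation would obtain ∇θ log πθ from an autodiff framework rather than a hand-supplied callable.

```python
# A sketch of the per-decision importance-weighted policy gradient with a
# state baseline b(s), matching the first equation above. All names are
# illustrative; grad_log_pi(s, a) returns d/dtheta log pi_theta(a|s).
import numpy as np

def per_decision_pg(trajs, pi_theta, pi_beta, grad_log_pi, baseline, gamma=0.99):
    grad = None
    for traj in trajs:
        H = len(traj)
        ratios = np.array([pi_theta(s, a) / pi_beta(s, a) for s, a, _ in traj])
        w = np.cumprod(ratios)                    # w_t for t = 0..H-1
        rewards = np.array([r for _, _, r in traj])
        for t, (s, a, _) in enumerate(traj):
            # future return, reweighted by w_{t'}/w_t, minus the baseline b(s_t)
            ret = sum(gamma**(tp - t) * (w[tp] / w[t]) * rewards[tp]
                      for tp in range(t, H))
            g = (gamma**t) * w[t] * (ret - baseline(s)) * grad_log_pi(s, a)
            grad = g if grad is None else grad + g
    return grad / len(trajs)  # Monte Carlo average over the n trajectories
```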
43. Related topics in offline RL
• Benchmarks: D4RL [Kumar+20], RL Unplugged [Gulcehre+20*]
• IRIS [Mandlekar+19*]
• Off-policy RL
• Control as inference for RL
• Behavioral cloning (BC) and AWR [Peng+19]
• GAIL [Ho+16*]
• T-REX [Brown+19a*], D-REX [Brown+19b*]
44. [Agarwal+19] Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi. “An Optimistic Perspective on Offline Reinforcement Learning”. https://arxiv.org/abs/1907.04543
[Arjovsky+19] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz. “Invariant Risk Minimization”. https://arxiv.org/abs/1907.02893
[Battaglia+16] Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Rezende, Koray Kavukcuoglu. “Interaction Networks for Learning about Objects, Relations and Physics”. https://arxiv.org/abs/1612.00222
[Brown+19a*] Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum. “Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations”. https://arxiv.org/abs/1904.06387
[Brown+19b*] Daniel S. Brown, Wonjoon Goo, Scott Niekum. “Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations”. https://arxiv.org/abs/1907.03976
[Deisenroth+11] Marc Deisenroth, Carl E. Rasmussen. “PILCO: A Model-Based and Data-Efficient Approach to Policy Search”. Proceedings of the 28th International Conference on Machine Learning. 2011.
[Dosovitskiy+17] Alexey Dosovitskiy, Vladlen Koltun. “Learning to Act by Predicting the Future”. https://arxiv.org/abs/1611.01779
[Ebert+18] Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex Lee, Sergey Levine. “Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control”. https://arxiv.org/abs/1812.00568
[Finn+17] Chelsea Finn, Sergey Levine. “Deep Visual Foresight for Planning Robot Motion”. https://arxiv.org/abs/1610.00696
[Fu+19] Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine. “Diagnosing Bottlenecks in Deep Q-learning Algorithms”. https://arxiv.org/abs/1902.10250
[Fujimoto+18a] Scott Fujimoto, David Meger, Doina Precup. “Off-Policy Deep Reinforcement Learning without Exploration”. https://arxiv.org/abs/1812.02900
[Gal+16] Yarin Gal, Zoubin Ghahramani. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”. https://arxiv.org/abs/1506.02142
[Gelada+19] Carles Gelada, Marc G. Bellemare. “Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift”. https://arxiv.org/abs/1901.09455
[Gu+17] Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine. “Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning”. https://arxiv.org/abs/1706.00387
[Gulcehre+20*] Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gómez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas. “RL Unplugged: Benchmarks for Offline Reinforcement Learning”. https://arxiv.org/abs/2006.13888
[Ho+16*] Jonathan Ho, Stefano Ermon. “Generative Adversarial Imitation Learning”. https://arxiv.org/abs/1606.03476
[Imani+18] Ehsan Imani, Eric Graves, Martha White. “An Off-policy Policy Gradient Theorem Using Emphatic Weightings”. https://arxiv.org/abs/1811.09013
45. [Jaques+19] Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard. “Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog”. https://arxiv.org/abs/1907.00456
[Kahn+20] Gregory Kahn, Pieter Abbeel, Sergey Levine. “BADGR: An Autonomous Self-Supervised Learning-Based Navigation System”. https://arxiv.org/abs/2002.05700
[Kalashnikov+18] Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, Sergey Levine. “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”. https://arxiv.org/abs/1806.10293
[Kallus+19] Nathan Kallus, Masatoshi Uehara. “Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning”. https://arxiv.org/abs/1909.05850
[Kidambi+20*] Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims. “MOReL: Model-Based Offline Reinforcement Learning”. https://arxiv.org/abs/2005.05951
[Kingma+14] Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling. “Semi-Supervised Learning with Deep Generative Models”. https://arxiv.org/abs/1406.5298
[Kumar+19] Aviral Kumar, Justin Fu, George Tucker, Sergey Levine. “Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction”. https://arxiv.org/abs/1906.00949
[Kumar+20] “Does On-Policy Data Collection Fix Errors in Reinforcement Learning?”. BAIR Blog. https://bair.berkeley.edu/blog/2020/03/16/discor/
[Lange+12] Sascha Lange, Thomas Gabel, Martin Riedmiller. “Batch Reinforcement Learning”. Reinforcement Learning. Springer, 2012. 45-73.
[Liu+18] Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou. “Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation”. https://arxiv.org/abs/1810.12429
[Mandlekar+19*] Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox. “IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data”. https://arxiv.org/abs/1911.05321
[Matsushima+20*] Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu. “Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization”. https://arxiv.org/abs/2006.03647
[Mousavi+20] Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou. “Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning”. https://arxiv.org/abs/2003.11126
[Nachum+19a] Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li. “DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections”. https://arxiv.org/abs/1906.04733
[Nachum+19b] Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans. “AlgaeDICE: Policy Gradient from Arbitrary Experience”. https://arxiv.org/abs/1912.02074
[Nachum+20] Ofir Nachum, Bo Dai. “Reinforcement Learning via Fenchel-Rockafellar Duality”. https://arxiv.org/abs/2001.01866
46. [Oh+17] Junhyuk Oh, Satinder Singh, Honglak Lee. “Value Prediction Network”. https://arxiv.org/abs/1707.03497
[Parr+08] Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman. “An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning”. Proceedings of the 25th International Conference on Machine Learning. 2008.
[Parmas+19] Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya. “PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos”. https://arxiv.org/abs/1902.01240
[Peng+19] Xue Bin Peng, Aviral Kumar, Grace Zhang, Sergey Levine. “Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning”. https://arxiv.org/abs/1910.00177
[Schölkopf19] Bernhard Schölkopf. “Causality for Machine Learning”. https://arxiv.org/abs/1911.10500
[Siegel+20] Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller. “Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning”. https://arxiv.org/abs/2002.08396
[Sinha+17] Aman Sinha, Hongseok Namkoong, Riccardo Volpi, John Duchi. “Certifying Some Distributional Robustness with Principled Adversarial Training”. https://arxiv.org/abs/1710.10571
[Silver+14] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller. “Deterministic Policy Gradient Algorithms”. Proceedings of the 31st International Conference on Machine Learning. 2014.
[Sutton+91] Richard S. Sutton. “Dyna, an Integrated Architecture for Learning, Planning, and Reacting”. ACM SIGART Bulletin, 2(4), 160-163. 1991.
[Tassa+12] Yuval Tassa, Tom Erez, Emanuel Todorov. “Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization”. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2012.
[Tang+19] Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu. “Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation”. https://arxiv.org/abs/1910.07186
[Thomas+15] Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh. “High-Confidence Off-Policy Evaluation”. Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[Uehara+19] Masatoshi Uehara, Jiawei Huang, Nan Jiang. “Minimax Weight and Q-Function Learning for Off-Policy Evaluation”. https://arxiv.org/abs/1910.12809
[Vanseijen+15] Harm Vanseijen, Rich Sutton. “A Deeper Look at Planning as Learning from Replay”. International Conference on Machine Learning. 2015.
[Wu+19] Yifan Wu, George Tucker, Ofir Nachum. “Behavior Regularized Offline Reinforcement Learning”. https://arxiv.org/abs/1911.11361
[Yu+20*] Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma. “MOPO: Model-Based Offline Policy Optimization”. https://arxiv.org/abs/2005.13239
[Zhang+20] Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans. “GenDICE: Generalized Offline Estimation of Stationary Values”. https://arxiv.org/abs/2002.09072