This presentation discusses using reinforcement learning to teach modular robots locomotion. The key challenges are the high-dimensional state and action spaces, as well as the lack of domain knowledge. The presentation proposes using policy gradient reinforcement learning with finite differences to learn locomotion policies from raw sensor data. It suggests that incorporating domain knowledge through task manifolds and curriculum learning could help address the "curse of dimensionality" and speed up the learning process. The goals are to apply these techniques to learn locomotion, map tasks to policies, and develop a "robot school" curriculum.
3. Project Goals
• Combine deliberative and reactive algorithms
• Show stability and completeness
• Demonstrate multi-robot coverage on iCreate robots.
4. Coverage Problem
• Cover entire area
• Deliberative algorithm plans next point to visit.
• Reactive algorithm pushes robot to that point.
• Reactive algorithm adds 2 constraints:
  • Maintain communication distance
  • Collision avoidance
6. Demo for single vehicle
• Implemented on iCreate.
• 5 points to visit.
• Deliberative algorithm selects point.
• Reactive algorithm uses potential field to reach point.
• Point reached when within some minimum distance.
[Video]
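The single-vehicle loop above (a deliberative layer picks the next point, a reactive layer follows a potential field until the robot is within a minimum distance) can be sketched as follows. This is a minimal illustration under assumptions the slides do not state: the hypothetical planner simply picks the nearest unvisited point, the potential is a quadratic attractive well, and the gain, speed limit and reach radius are arbitrary values.

```python
import math


def attractive_step(pos, goal, gain=0.5, max_speed=0.3):
    """Move one step down the quadratic potential U = 0.5 * gain * |pos - goal|^2.

    The velocity is -grad U = gain * (goal - pos), clipped to max_speed.
    """
    vx, vy = gain * (goal[0] - pos[0]), gain * (goal[1] - pos[1])
    speed = math.hypot(vx, vy)
    if speed > max_speed:
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    return (pos[0] + vx, pos[1] + vy)


def select_next(pos, remaining):
    """Hypothetical deliberative layer: choose the nearest unvisited point."""
    return min(remaining, key=lambda p: math.dist(pos, p))


def cover(points, start, reach_radius=0.05, max_steps=1000):
    """Visit every point; a point counts as reached once within reach_radius."""
    pos, remaining, visited = start, list(points), []
    while remaining:
        goal = select_next(pos, remaining)
        for _ in range(max_steps):
            if math.dist(pos, goal) < reach_radius:
                break
            pos = attractive_step(pos, goal)
        else:
            raise RuntimeError(f"did not reach {goal} in {max_steps} steps")
        remaining.remove(goal)
        visited.append(goal)
    return visited, pos
```

For example, `cover([(1, 0), (1, 1), (0, 1), (2, 2), (0.5, 0.5)], start=(0, 0))` visits all five points. Keeping the planner and the reactive controller as separate functions mirrors the deliberative/reactive split described on the slides; the communication-distance and collision constraints from slide 4 are not modelled here.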
7. Multi-robot Case
• 2-robot coverage
• Blue is free to move.
• Green must stay in communication range.
• Matlab simulation.
[Video]
9. Positioning System
• Problems with Stargazer:
  • Periods of no measurement
  • Occasional bad measurements
• State estimation (SPF):
  • Combine Stargazer with odometry
  • Reject bad measurements
10. SPF Explanation
• Sigma Point Filter uses Stargazer and odometry measurements to predict robot position.
• Non-Gaussian noise
• Implemented and tested on robot platform.
• Performs very well when measurements are missing or bad.
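The slides do not give the filter equations. As a hedged illustration of the "sigma point" idea only, the sketch below implements the basic unscented transform: it places 2n + 1 deterministic sigma points around a mean, pushes them through a (possibly nonlinear) function such as an odometry motion model, and recovers the transformed mean and covariance. The symmetric point set and the parameter `kappa` are assumptions; the odometry model, the Stargazer measurement update and the rejection of bad measurements are not shown.

```python
import math


def cholesky(P):
    """Lower-triangular L with L @ L^T == P (P symmetric positive definite)."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - s)
            else:
                L[i][j] = (P[i][j] - s) / L[j][j]
    return L


def sigma_points(mean, P, kappa=1.0):
    """2n + 1 symmetric sigma points and their weights (weights sum to 1)."""
    n = len(mean)
    # The columns of L are square-root directions of (n + kappa) * P.
    L = cholesky([[(n + kappa) * p for p in row] for row in P])
    points = [list(mean)]
    for col in range(n):
        offset = [L[row][col] for row in range(n)]
        points.append([m + o for m, o in zip(mean, offset)])
        points.append([m - o for m, o in zip(mean, offset)])
    weights = [kappa / (n + kappa)] + [1.0 / (2 * (n + kappa))] * (2 * n)
    return points, weights


def unscented_transform(f, mean, P, kappa=1.0):
    """Propagate N(mean, P) through f; return the approximate mean and covariance."""
    points, weights = sigma_points(mean, P, kappa)
    ys = [f(p) for p in points]
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    y_cov = [
        [sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(weights, ys))
         for j in range(m)]
        for i in range(m)
    ]
    return y_mean, y_cov
```

For a linear function the transform is exact, which makes a convenient sanity check; for nonlinear motion models it captures the mean and covariance without linearizing, which is the usual motivation for sigma-point filters over an extended Kalman filter.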
12. Roomba Pac-Man
• Implemented a 5-robot demo with Jack Elston.
• Re-creation of the Pac-Man game.
• Demonstrates the NetUAS system.
• Showcases most of the concepts from class.
25. Introduction
Robot State Machine
Gradients for “Grasping” the Object
Gradient for Moving the Object
Convergence Simulation Results
Continuing Work
26. Place a single beacon on an object and another at the object’s destination. Multiple robots cooperate to move the object.
Goals:
◦ Minimal/no robot communication
◦ Object has an unknown geometry
◦ Use gradients for reactive navigation
27.
28. Each Robot Knows:
◦ Distance/Direction to Object
◦ Distance/Direction to Destination
◦ Distance/Direction to All Other Robots
◦ Bumper Sensor to Detect Collision
Robots Do Not Know:
◦ Object Geometry
◦ Actions other Robots are taking
29.
30. Related “Grasping” Work:
◦ Grasping with a hand: maximize torque [Liu et al.]
◦ Caging objects for pushing [Fink et al.]
◦ Tugboats manipulating a barge [Esposito]
◦ ALL require known geometry
My Hybrid Approach:
◦ Even distribution around object
◦ Alternate between convergence and repulsion gradients
◦ Similar to the cow-herding example from class.
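31. Pull towards object:
$$\gamma = \lVert r_i - r_{obj} \rVert$$
Avoid nearby robots:
$$\beta = \prod_{j=1}^{N} \left[ 1 - \frac{\operatorname{sign}\left(d_c - \lVert r_i - r_j \rVert\right) + 1}{2} \cdot \frac{1 + d_c^4}{d_c^4} \cdot \frac{\left( \lVert r_i - r_j \rVert^2 - d_c^2 \right)^2}{\left( \lVert r_i - r_j \rVert^2 - d_c^2 \right)^2 + 1} \right]$$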
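The two terms on this slide can be evaluated directly. The sketch below is a minimal transcription, assuming 2-D positions as tuples and assuming the product runs over the other robots only (robot i's own term would make β identically zero). It shows that each factor of β is 0 when two robots touch and rises to 1 at the comfort distance d_c; how γ and β are combined into a single navigation cost is not shown on the slide, so it is left out here.

```python
import math


def pull_to_object(r_i, r_obj):
    """gamma = ||r_i - r_obj||: distance from robot i to the object."""
    return math.dist(r_i, r_obj)


def avoid_factor(d, d_c):
    """One factor of beta for a neighbour at distance d from robot i."""
    if d >= d_c:
        # Outside d_c the indicator (sign(d_c - d) + 1) / 2 is 0, and at d == d_c
        # the bump term (d^2 - d_c^2)^2 is 0, so the factor is exactly 1.
        return 1.0
    g = (d ** 2 - d_c ** 2) ** 2
    # The (1 + d_c^4) / d_c^4 scaling makes the factor exactly 0 at d = 0.
    return 1.0 - (1.0 + d_c ** 4) / d_c ** 4 * g / (g + 1.0)


def avoid_nearby(r_i, others, d_c):
    """beta = product of avoid_factor over the other robots' positions."""
    beta = 1.0
    for r_j in others:
        beta *= avoid_factor(math.dist(r_i, r_j), d_c)
    return beta
```

Because every factor lies in [0, 1], β stays at 1 while all neighbours are farther than d_c and drops to 0 on contact, which is what lets it act as a smooth "keep-out" term.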
33. Repel from all robots:
$$\beta = \prod_{j=1}^{N} \left( \lVert r_i - r_j \rVert^2 - d_r^2 \right)$$
$$\text{Cost} = \frac{1}{(1 + \beta)^{1/\kappa_r}}$$
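The repulsion cost on this slide can be evaluated as follows, assuming 2-D positions and a product over the other robots. The fractional power is only real-valued when 1 + β is positive, which the sketch checks explicitly; d_r and κ_r are free parameters whose values the slide does not give.

```python
import math


def repel_beta(r_i, others, d_r):
    """beta = prod_j (||r_i - r_j||^2 - d_r^2) over the other robots."""
    beta = 1.0
    for r_j in others:
        beta *= math.dist(r_i, r_j) ** 2 - d_r ** 2
    return beta


def repel_cost(r_i, others, d_r, kappa_r):
    """Cost = 1 / (1 + beta)^(1 / kappa_r); lower when robots are farther apart."""
    beta = repel_beta(r_i, others, d_r)
    if 1.0 + beta <= 0.0:
        raise ValueError("1 + beta must be positive for a real-valued cost")
    return 1.0 / (1.0 + beta) ** (1.0 / kappa_r)
```

With one neighbour at distance 2, d_r = 1 and κ_r = 2, β = 3 and the cost is 1/√4 = 0.5; moving the neighbour farther away lowers the cost.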
34.
35. Related Work
◦ Formations [Tanner and Kumar]
◦ Flocking [Lindhé et al.]
◦ Pushing objects [Fink et al., Esposito]
◦ No catastrophic failure if out of position.
My Approach:
◦ Head towards destination in steps
◦ Keep close to object
◦ Communicate “through” object
◦ Maintain orientation
Assuming forklift on robot can rotate 360°
36. Next Step Vector:
$$r_{\gamma_i} = r_{ideal_i} + d_m \frac{r_{ObjCenter} - r_{ObjDest}}{\lVert r_{ObjCenter} - r_{ObjDest} \rVert}$$
Pull to destination:
$$\gamma_1 = \lVert r_i - r_{\gamma_i} \rVert$$
37. Valley Perpendicular to Travel Vector:
$$m = -\frac{r_{ObjCenter,x} - r_{ObjDest,x}}{r_{ObjCenter,y} - r_{ObjDest,y} + 0.0001}$$
$$\gamma_2 = \frac{\left( m\, r_{i,x} - r_{i,y} - m\, r_{\gamma_i,x} + r_{\gamma_i,y} \right)^2}{m^2 + 1}$$
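The next-step target and the perpendicular valley can be computed as below. This is a direct transcription assuming 2-D tuples for positions; the function names are hypothetical, and the sign of the direction term follows the slide as written. `valley` returns the squared distance from robot i to the line through r_γi with slope m, so it is zero everywhere along that line; the small 0.0001 offset from the slide keeps m finite when the travel vector is horizontal.

```python
import math


def next_step_point(r_ideal, obj_center, obj_dest, d_m):
    """r_gamma_i = r_ideal_i + d_m * (c - dest) / ||c - dest|| (sign as on the slide)."""
    dx, dy = obj_center[0] - obj_dest[0], obj_center[1] - obj_dest[1]
    norm = math.hypot(dx, dy)
    return (r_ideal[0] + d_m * dx / norm, r_ideal[1] + d_m * dy / norm)


def pull_to_destination(r_i, r_gamma):
    """gamma_1 = ||r_i - r_gamma_i||."""
    return math.dist(r_i, r_gamma)


def valley(r_i, r_gamma, obj_center, obj_dest):
    """gamma_2: squared distance from r_i to the line through r_gamma with slope m."""
    m = -(obj_center[0] - obj_dest[0]) / (obj_center[1] - obj_dest[1] + 0.0001)
    num = m * r_i[0] - r_i[1] - m * r_gamma[0] + r_gamma[1]
    return num ** 2 / (m ** 2 + 1)
```

For travel along the x axis the valley line is (almost) vertical: a robot displaced 1 unit along x from r_γi gets γ2 ≈ 1, while one displaced along y stays near the bottom of the valley.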
44. Modular Robots
Learning
Contributions
Conclusion
A Young Modular Robot’s Guide to Locomotion
Ben Pearre
Computer Science
University of Colorado at Boulder, USA
December 6, 2009
Ben Pearre A Young Modular Robot’s Guide to Locomotion
45. Outline
Modular Robots
Learning
    The Problem
    The Policy Gradient
    Domain Knowledge
Contributions
    Going forward
    Steering
    Curriculum Development
Conclusion
46. Modular Robots
How to get these to move?
47. The Learning Problem
Given unknown sensations and actions, learn a task:
◮ Sensations $s \in \mathbb{R}^n$
◮ State $x \in \mathbb{R}^d$
◮ Action $u \in \mathbb{R}^p$
◮ Reward $r \in \mathbb{R}$
◮ Policy $\pi(x, \theta) = \Pr(u \mid x, \theta) : \mathbb{R}^{|\theta|} \times \mathbb{R}^{|u|}$
Example policy:
$$u(x, \theta) = \theta_0 + \sum_i \theta_i (x - b_i)^T D_i (x - b_i) + \mathcal{N}(0, \sigma)$$
What does that mean for locomotion?
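The example policy above can be written out as a small function. This sketch assumes a scalar action (one component of u), plain lists for the vectors and the matrices D_i, and reads N(0, σ) as zero-mean noise with standard deviation σ; the centres b_i and matrices D_i would be chosen by the designer and are not specified in the slides.

```python
import random


def quad_form(v, D):
    """v^T D v for a vector v and a square matrix D (lists of lists)."""
    n = len(v)
    return sum(v[r] * D[r][c] * v[c] for r in range(n) for c in range(n))


def policy_mean(x, theta0, thetas, centers, metrics):
    """Deterministic part: theta_0 + sum_i theta_i (x - b_i)^T D_i (x - b_i)."""
    total = theta0
    for theta_i, b_i, D_i in zip(thetas, centers, metrics):
        diff = [xk - bk for xk, bk in zip(x, b_i)]
        total += theta_i * quad_form(diff, D_i)
    return total


def sample_action(x, theta0, thetas, centers, metrics, sigma, rng=random):
    """Mean plus exploration noise N(0, sigma), sigma taken as a standard deviation."""
    return policy_mean(x, theta0, thetas, centers, metrics) + rng.gauss(0.0, sigma)
```

With x = [1, 2], θ0 = 0.5, a single θ1 = 2, b1 = [0, 0] and D1 = I, the mean action is 0.5 + 2 · (1 + 4) = 10.5. The learnable parameters are only the θ values; the quadratic features stay fixed, which is what makes the policy linear in θ.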
48. Policy Gradient Reinforcement Learning: Finite Difference
Vary $\theta$:
◮ Measure performance $J_0$ of $\pi(\theta)$
◮ Measure performance $J_{1 \ldots n}$ of $\pi(\theta + \Delta_{1 \ldots n}\theta)$
◮ Solve regression, move $\theta$ along gradient.
$$\text{gradient} = \left( \Delta\Theta^T \Delta\Theta \right)^{-1} \Delta\Theta^T \hat{J}$$
where
$$\Delta\Theta = \begin{bmatrix} \Delta\theta_1 \\ \vdots \\ \Delta\theta_n \end{bmatrix}, \qquad \hat{J} = \begin{bmatrix} J_1 - J_0 \\ \vdots \\ J_n - J_0 \end{bmatrix}$$
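The finite-difference procedure above can be sketched in plain Python. This is a minimal illustration, not the author's implementation: `J` stands for a hypothetical function that runs the policy with parameters θ and returns its measured performance, perturbations are drawn uniformly at random, and the step size and perturbation scale are arbitrary. The normal equations are solved with a small Gaussian elimination so the block needs only the standard library.

```python
import random


def solve_linear(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fd_gradient(J, theta, perturbations):
    """Least-squares gradient (dTheta^T dTheta)^-1 dTheta^T J_hat.

    Row k of dTheta is perturbations[k]; J_hat[k] = J(theta + dtheta_k) - J(theta).
    """
    J0 = J(theta)
    J_hat = [J([t + d for t, d in zip(theta, dth)]) - J0 for dth in perturbations]
    p = len(theta)
    AtA = [[sum(dth[a] * dth[b] for dth in perturbations) for b in range(p)]
           for a in range(p)]
    AtJ = [sum(dth[a] * dj for dth, dj in zip(perturbations, J_hat)) for a in range(p)]
    return solve_linear(AtA, AtJ)


def fd_ascent(J, theta, n_perturb=20, eps=0.05, step=0.1, iters=100, seed=0):
    """Repeatedly estimate the gradient from random perturbations and step uphill."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        perturbations = [[rng.uniform(-eps, eps) for _ in theta]
                         for _ in range(n_perturb)]
        grad = fd_gradient(J, theta, perturbations)
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta
```

At least as many perturbations as parameters are needed for ΔΘᵀΔΘ to be invertible, so every gradient estimate costs at least |θ| + 1 policy evaluations; this is one concrete way the method's cost grows with the number of policy parameters.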
49. Policy Gradient Reinforcement Learning: Likelihood Ratio
Vary $u$:
◮ Measure performance $J(\pi(\theta))$ of $\pi(\theta)$ with noise…
◮ Compute log-probability of generated trajectory $\Pr(\tau \mid \theta)$
$$\text{Gradient} = \sum_{k=0}^{H} \nabla_\theta \log \pi_\theta(u_k \mid x_k) \sum_{l=0}^{H} r_l$$
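A minimal sketch of the likelihood-ratio estimator, assuming a Gaussian policy whose mean is the single parameter θ and a hypothetical toy task in which every step is rewarded by −(u_k − target)². Each trajectory contributes the sum of its score terms ∇θ log πθ(u_k | x_k) times the sum of its rewards, and the estimates are averaged over many sampled trajectories; no baseline or other variance reduction is used.

```python
import random


def gaussian_score(u, mean, sigma):
    """d/d(mean) of log N(u; mean, sigma^2) = (u - mean) / sigma^2."""
    return (u - mean) / sigma ** 2


def likelihood_ratio_gradient(trajectories):
    """Average over trajectories of (sum_k score_k) * (sum_l r_l)."""
    total = 0.0
    for scores, rewards in trajectories:
        total += sum(scores) * sum(rewards)
    return total / len(trajectories)


def sample_trajectories(theta, sigma, target, horizon, n, seed=0):
    """Hypothetical toy task: u_k ~ N(theta, sigma^2) with reward -(u_k - target)^2."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n):
        scores, rewards = [], []
        for _ in range(horizon + 1):  # steps k = 0 .. H
            u = rng.gauss(theta, sigma)
            scores.append(gaussian_score(u, theta, sigma))
            rewards.append(-(u - target) ** 2)
        trajectories.append((scores, rewards))
    return trajectories
```

With `horizon=0`, `theta=0`, `sigma=0.5` and `target=1`, the expected reward is −((θ − 1)² + σ²), whose derivative at θ = 0 is 2, and the averaged estimate approaches that value as `n` grows. Unlike the finite-difference method, only the actions are perturbed; θ itself is never varied.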
50. Why is RL slow?
“Curse of Dimensionality”
◮ Exploration
◮ Learning rate
◮ Domain representation
◮ Policy representation
◮ Over- and under-actuation
◮ Domain knowledge
51. Domain Knowledge
Infinite space of policies to explore.
◮ RL is model-free. So what?
◮ Representation is bias.
◮ Bias search towards “good” solutions
◮ Learn all of physics… and apply it?
◮ Previous experience in this domain?
◮ Policy implemented by <programmer, agent> “autonomous”?
How would knowledge of this domain help?
52. Dimensionality Reduction
Task learning as domain-knowledge acquisition:
◮ Experience with a domain
◮ Skill at completing some task
◮ Skill at completing some set of tasks?
◮ Task-space manifold
53. Goals
1. Apply PGRL to a new domain.
2. Learn mapping from task manifold to policy manifold.
3. Robot school?
54. 1: Learning to locomote
◮ Sensors: Force feedback on servos? Or not.
◮ Policy: $u \in \mathbb{R}^8$ controls servos
$$u_i = \mathcal{N}(\theta_i, \sigma)$$
◮ Reward: forward speed
◮ Domain knowledge: none
Demo?
55. 1: Learning to locomote
[Figure: “Learning to move”. Top: policy parameters θ (steer bow, steer stern, bow, port fwd, stbd fwd, port aft, stbd aft, stern) vs. s, 0 to 2500. Bottom: effort and 10-step forward speed v vs. s, 0 to 2500.]
56. 2: Learning to get to a target
◮ Sensors: Bearing to goal.
◮ Policy: $u \in \mathbb{R}^8$ controls servos
◮ Policy parameters: $\theta \in \mathbb{R}^{16}$
$$\mu_i(x, \theta) = \theta_i \cdot s \tag{1}$$
$$= \begin{bmatrix} \theta_{i,0} & \theta_{i,1} \end{bmatrix} \begin{bmatrix} 1 \\ \phi \end{bmatrix} \tag{2}$$
$$u_i = \mathcal{N}(\mu_i, \sigma) \tag{3}$$
$$\nabla_{\theta_i} \log \pi(x, \theta) = \frac{1}{\sigma^2} \left( u_i - \theta_i \cdot s \right) \cdot s \tag{4}$$
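Equations (1) to (4) can be checked numerically. In this sketch, under the assumptions that φ is a scalar bearing and σ is a standard deviation, `score` implements equation (4) and can be compared against a finite difference of the Gaussian log-density, while `act` samples one action per servo from equation (3). The function names are illustrative, not from the original work.

```python
import math
import random


def features(phi):
    """s = [1, phi]: a bias term plus the bearing phi to the goal."""
    return [1.0, phi]


def mean_action(theta_i, s):
    """Equation (1): mu_i = theta_i . s."""
    return sum(t * f for t, f in zip(theta_i, s))


def log_density(u, mu, sigma):
    """log of the Gaussian density N(u; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (u - mu) ** 2 / (2 * sigma ** 2)


def score(u_i, theta_i, s, sigma):
    """Equation (4): grad_{theta_i} log pi = (u_i - theta_i . s) * s / sigma^2."""
    err = u_i - mean_action(theta_i, s)
    return [err * f / sigma ** 2 for f in s]


def act(thetas, phi, sigma, rng=random):
    """Equation (3): one action per servo, u_i ~ N(mu_i, sigma)."""
    s = features(phi)
    return [rng.gauss(mean_action(theta_i, s), sigma) for theta_i in thetas]
```

Eight servos with two parameters each give the 16-dimensional θ listed on the slide; because each μ_i is linear in θ_i, the score is just the action error scaled by the feature vector.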
57. 2: Task space → policy space
◮ 16-DOF learning FAIL!
◮ Try simpler task:
    ◮ Learn to locomote with $\theta \in \mathbb{R}^{16}$
◮ Try bootstrapping:
    1. Learn to locomote with 8 DOF
    2. Add new sensing and control DOF
◮ CHEATING! Why?
[Figure: time to complete task (seconds, 50 to 300) vs. task (0 to 120).]
58. Curriculum development for manifold discovery?
◮ Étude in Locomotion
    ◮ Task-space manifold for locomotion:
    $$\theta \in \xi \cdot \begin{bmatrix} 0 & 0 & 1 & -1 & 1 & -1 & 1 & 1 \end{bmatrix}^T$$
    ◮ Stop exploring in task nullspace
    ◮ FAST!
◮ Étude in Steering
    ◮ Can task be completed on locomotion manifold?
    ◮ One possible approximate solution uses the bases
    $$\begin{bmatrix} 0 & 0 & 1 & -1 & 1 & -1 & 1 & 1 \\ 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}^T$$
    ◮ Can second basis be learned?
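The payoff of restricting exploration to a task-space manifold can be illustrated with a small sketch, assuming the two basis vectors from this slide and a hypothetical reward `J`. Exploration happens only in the coordinates ξ (one per basis vector), so each ascent step needs len(ξ) + 1 evaluations instead of one per servo parameter. The forward-difference update is a simplification, not the author's code: it is what the regression on slide 48 reduces to when each perturbation is ε times a unit vector.

```python
# Basis vectors from the slide (8 servo parameters each).
LOCOMOTION = [0, 0, 1, -1, 1, -1, 1, 1]
STEERING = [1, -1, 0, 0, 0, 0, 0, 0]


def to_policy(xi, bases):
    """theta = sum_k xi_k * bases[k]: task-space coordinates -> 8-D servo parameters."""
    return [sum(x * b[j] for x, b in zip(xi, bases)) for j in range(len(bases[0]))]


def manifold_fd_step(J, xi, bases, eps=0.05, step=0.05):
    """One gradient-ascent step that perturbs only the task-space coordinates xi.

    With perturbations eps * e_k, the finite-difference regression reduces to
    one forward difference per coordinate.
    """
    J0 = J(to_policy(xi, bases))
    grad = []
    for k in range(len(xi)):
        bumped = list(xi)
        bumped[k] += eps
        grad.append((J(to_policy(bumped, bases)) - J0) / eps)
    return [x + step * g for x, g in zip(xi, grad)]
```

For example, with a hypothetical reward that peaks at θ = 2 · LOCOMOTION, a 1-D search over ξ along the locomotion basis converges to ξ ≈ 2 (up to the small bias of the forward difference), while the other seven directions of θ are never explored.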
59. 3: How to teach a robot?
How to teach an animal?
1. Reward basic skills
2. Develop control along useful DOFs
3. Make skill more complex
4. A good solution NOW!
60. Conclusion
Exorcising the Curse of Dimensionality
◮ PGRL works for low-DOF problems.
◮ Task-space dimension < state-space dimension.
◮ Learn f: task-space manifold → policy-space manifold.