In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team in Dota, a game considered to involve a great amount of complexity and strategy. We'll evaluate the role Reinforcement Learning plays in the world of games, taking a look at some of the main achievements and what they look like in terms of implementation. We'll also look at some of the history of AI applied to games and how things have evolved over time.
Reinforcement Learning (RL) approaches deal with finding an optimal reward-based policy for acting in an environment. (Talk in English)
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes, not only in learning to play games but also in surpassing humans at them, along with academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, etc., have demonstrated their value on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
This presentation contains an introduction to reinforcement learning, a comparison with other learning paradigms, an introduction to Q-Learning, and some applications of reinforcement learning in video games.
Deep Reinforcement Learning Talk at PI School, covering the following topics:
1- Deep Reinforcement Learning
2- Q-Learning
3- Deep Q-Learning (DQN)
4- Google DeepMind paper (DQN for Atari)
Deep Reinforcement Learning and Its Applications (Bill Liu)
What is the most exciting AI news in recent years? AlphaGo!
What are key techniques for AlphaGo? Deep learning and reinforcement learning (RL)!
What are application areas for deep RL? A lot! In fact, besides games, deep RL has been making tremendous achievements in diverse areas like recommender systems and robotics.
In this talk, we will introduce deep reinforcement learning, present several applications, and discuss issues and potential solutions for successfully applying deep RL in real life scenarios.
https://www.aicamp.ai/event/eventdetails/W2021042818
Reinforcement Learning In AI Powerpoint Presentation Slide Templates Complete... (SlideTeam)
Showcase how machines are built to perform intelligent tasks by using our content-ready Reinforcement Learning In AI PowerPoint Presentation Slide Templates Complete Deck. Take advantage of these artificial intelligence PowerPoint visuals, and describe how machine learning models are trained to make sequences of decisions in a complex environment. Showcase the types of artificial intelligence such as deep learning, machine learning. Explain the concept of machine learning which delivers predictive models based on the data fed into machine learning algorithms. Take the assistance of our visually attention-grabbing reinforcement learning PowerPoint templates and discuss the effective uses of artificial intelligence in various areas such as supply chain, human resources, fraud detection, knowledge creation, research, and development, etc. You can also present the usage of AI in healthcare. This includes treatment, diagnosis, training and research, early detection, etc. Explain the working of machine learning by downloading our attention-grabbing supervised learning PowerPoint presentation. https://bit.ly/3kQBnEZ
Introductory presentation to Explainable AI, defending its main motivations and importance. We describe briefly the main techniques available in March 2020 and share many references to allow the reader to continue his/her studies.
YouTube: https://youtu.be/LzaWrmKL1Z4
** Python Data Science Training: https://www.edureka.co/python **
In this PPT on “Reinforcement Learning Tutorial” you will get an in-depth understanding of how reinforcement learning is used in the real world. I'll cover the following topics in this session:
Introduction to Machine Learning
What is Reinforcement Learning?
Reinforcement Learning with an analogy
Reinforcement Learning process
Reinforcement Learning Counter-Strike example
Reinforcement Learning Definitions
Reinforcement Learning Concepts
Markov Decision Process
Understanding Q-Learning
Demo
Check out our Python Training Playlist: https://goo.gl/Na1p9G
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Slide deck from our seminar about Machine Learning (07/11/2014)
Topics covered:
- What is Machine Learning?
- Techniques (clustering, classification, ...)
- Tools (Mahout, R, Spark MLlib, Weka, ...)
- Practical examples of Machine Learning applications
- How to embed Machine Learning in software development
- Demos
In some applications, the output of the system is a sequence of actions, and a single action is not important on its own. Game playing is an example: a single move by itself is not that important. As the agent acts on its environment, it receives some evaluation of its action (a reinforcement), but it is not told which action is the correct one for achieving its goal.
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
Introduction to machine learning: basics and overview. Linear regression, logistic regression, cost function, gradient descent, sensitivity, specificity, model selection.
Reinforcement Learning 6. Temporal Difference Learning (Seung Jae Lee)
A summary of Chapter 6: Temporal Difference Learning of the book 'Reinforcement Learning: An Introduction' by Sutton and Barto. You can find the full book in Professor Sutton's website: http://incompleteideas.net/book/the-book-2nd.html
Check my website for more slides of books and papers!
https://www.endtoend.ai
Reinforcement Learning 2. Multi-armed Bandits (Seung Jae Lee)
A summary of Chapter 2: Multi-armed Bandits of the book 'Reinforcement Learning: An Introduction' by Sutton and Barto. You can find the full book in Professor Sutton's website: http://incompleteideas.net/book/the-book-2nd.html
Tutorial at 35th Australasian Joint Conference on Artificial Intelligence.
-----
Reinforcement learning (RL) is a branch of artificial intelligence wherein autonomous agents learn to maximise predefined rewards from the environment. Despite immense successes in breaking human records, the current training of RL agents is prohibitively expensive in terms of time, computing resources, and samples. For example, it requires trillions of playing sessions to reach human-level performance on simple video games. The problem of sample inefficiency is exacerbated in stochastic, partially observable, noisy or long-term real-world environments, whereas humans can show excellent performance under these circumstances without much training. That shortcoming of RL agents can be attributed to the lack of efficient human-like memory mechanisms that hasten learning by smartly utilising past observations. This tutorial presents recent advances in memory-based reinforcement learning, where emerging memory systems enable sample-efficient, adaptive and human-like RL agents. The first part of the tutorial covers the basics of RL and raises the sample inefficiency issue. The second part presents a taxonomy of memory mechanisms that recent lean RL employs to reduce the number of training samples and resemble human memory. The subsequent three sections study the benefits that memory can provide to RL agents, which can be categorised as (1) quick access to critical experiences; (2) a better representation of observation contexts; and (3) intrinsic motivation to explore. Finally, the tutorial concludes with a discussion of open challenges and promising future research on memory-based RL.
Halite is an open source artificial intelligence programming competition, created by Two Sigma, where players build bots using the coding language of their choice to battle on a two-dimensional virtual board. Halite II, running on GCP, supported about 6,000 active game players from about 100 countries and 1,000 institutions over a three month period. The presentation surveys the principles needed for a successful AI programming competition and describes the architecture of the game environment, particularly the support that GCP provided for the support of 12 million game executions written in over 20 programming languages. Among other topics, this talk illustrates the approaches taken to security, scalability, and the considerations needed to allow machine learning bots to place in the top 50 results.
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search (Karel Ha)
the presentation of the article "Mastering the game of Go with deep neural networks and tree search" given at the Optimization Seminar 2015/2016
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
My TGC 2018 presentation about Roguelike game design, which talks about design aspects of this genre and how we can utilize some of them in our (not necessarily Roguelike) games.
A grand challenge of AI has fallen - a decade earlier than "experts" predicted. But should we care?
What made AlphaGo, the AI built by DeepMind, so unique?
Dive into AlphaGo's system of deep learning, evaluation, and search algorithms that combined to defeat the reigning Go world champion, and draw your own conclusions.
Tim Riser presented an analysis of "Mastering the Game of Go with Deep Neural Networks & Tree Search", a paper by Google DeepMind to the Boston/Cambridge chapter of Papers We Love, a computer science discussion group on June 28, 2016.
What did AlphaGo do to beat the strongest human Go player? (Tobias Pfeiffer)
This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol. An accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will briefly introduce the game of Go, followed by the techniques and algorithms used by AlphaGo to answer these questions.
Despite immense successes in breaking human records, current training of RL agents is prohibitively expensive in terms of time, GPUs, and samples. For example, it requires hundreds of millions or even billions of steps to reach human-level performance on Atari games. The problem of sample inefficiency is exacerbated in stochastic, partially observable, noisy or long-term real-world environments, whereas humans can show excellent performance under these circumstances without much training. That shortcoming of RL agents can be attributed to the lack of efficient human-like memory mechanisms that hasten learning by smartly utilizing past observations and experiences. This talk presents recent advances in memory-based reinforcement learning where emerging memory systems enable sample-efficient, adaptive and human-like RL agents.
What did AlphaGo do to beat the strongest human Go player? (Tobias Pfeiffer)
This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol. An accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.
Mastering the game of Go with deep neural networks and tree search: Presentation (Karel Ha)
the presentation of the article "Mastering the game of Go with deep neural networks and tree search" given at the Spring School of Combinatorics 2016
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/4njuiaaou1po0y4/AlphaGo.pdf?dl=0
- The corresponding handout is available at http://www.slideshare.net/KarelHa1/mastering-the-game-of-go-with-deep-neural-networks-and-tree-search-handout
- The video is available at https://youtu.be/Lso2kE58JrI
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
Slides for a short lecture (~1 hr) on the foundations of the AlphaGo model developed by Google. Intended for people with little technical background, but with basic familiarity with Deep Learning.
weekly AI tech talk #85: ml-agents, Enabling Learned Behaviors with Reinforceme... (Bill Liu)
https://learn.xnextcon.com/event/eventdetails/W19061910
Behaviors in games---and in the real world---are often difficult to program explicitly. Reinforcement learning (RL) has shown success in learning behaviors based on a simple defined reward function that incentivises correct behavior.
Unity ML-Agents toolkit enables Unity developers to train reinforcement learning models to control behaviors within their games. Once these models are trained, they can be integrated across platforms into a game build via the Unity Inference Engine.
Furthermore, by enabling communication between a Unity build and Python code, ML-Agents enables RL researchers to use Unity games as training environments.
A brief overview of Reinforcement Learning applied to games
1. A brief overview of Reinforcement Learning applied to games
Thomas Paula
August 16, 2018 - #10 Porto Alegre Machine Learning Meetup
2. Who am I?
RL applied to games
Thomas Paula
● Machine Learning Engineer and Researcher @HP
● MSc in Computer Science
● POA Machine Learning Meetup
● @tsp_thomas
● tsp.thomas@gmail.com
3. Why study games?
● Simple rules and deep concepts
● Some of them have been studied for hundreds or thousands of years
● Encapsulate real world issues
● Games are fun :)
Source: David Silver, 2015
4. Agenda
● Introduction
○ Artificial Intelligence
○ Challenge for AI: beat humans in chess
● Reinforcement Learning
● Deep Reinforcement Learning
● Closing thoughts
7. Artificial Intelligence
● "The effort to automate intellectual tasks normally performed by humans"
● Born in the 1950s: people trying to make computers think
● People used to believe human-level artificial intelligence = a hand-crafted set of rules
● 1950s to 1980s: Symbolic AI
8. Why is (was) chess challenging for computers?
Programming a Computer for Playing Chess
● Seminal paper by Claude Shannon, 1950
● Number of possible positions: ~10^120
○ Estimated number of atoms in the known universe: 10^78 to 10^82
● Pure brute force: impossible even for modern computers
9. Why is (was) chess challenging for computers?
● Let’s take tic-tac-toe as an example
O X O
X X
O
Source: https://materiaalit.github.io/intro-to-ai-17/part2/
Game Tree
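The game-tree idea above can be made concrete in a few lines of code. The following sketch (illustrative only, not from the talk) enumerates the full tic-tac-toe game tree by trying every legal move recursively; the well-known total is 255,168 distinct complete games, which shows how even a trivial game produces a surprisingly large tree.

```python
# Brute-force enumeration of the complete tic-tac-toe game tree.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board, player="X"):
    """Count every distinct complete game reachable from this position."""
    if winner(board) or " " not in board:
        return 1  # leaf of the game tree: a win or a draw
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player                                  # make the move
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = " "                                     # undo it
    return total

total_games = count_games([" "] * 9)
print(total_games)  # 255168
```

Chess has no such exhaustive option: with ~10^120 positions, the same brute-force idea is hopeless, which is exactly why pruning and evaluation heuristics were needed.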
11. IBM Deep Blue
● Chess-playing computer developed by IBM
● Won first game against Garry Kasparov on 10 February 1996
● Approach based on Symbolic AI
○ Alpha-beta pruning search algorithm
○ Deep Blue executed it in parallel
● Deep Blue won the six-game 1997 rematch, but Kasparov
accused IBM of cheating
● Results
○ Deep Blue was retired after the match
○ Modern engines such as Stockfish continue the search-based approach
Source: https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)
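Deep Blue's core search was minimax with alpha-beta pruning, combined with a handcrafted evaluation function and massive parallelism. A toy, illustrative version of alpha-beta on tic-tac-toe (not Deep Blue's actual code) looks like this; it exhaustively proves the game is a draw under optimal play:

```python
import math

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha=-math.inf, beta=math.inf):
    """Minimax value from X's point of view: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # draw
    if player == "X":  # maximizing player
        value = -math.inf
        for i in range(9):
            if board[i] == " ":
                board[i] = "X"
                value = max(value, alphabeta(board, "O", alpha, beta))
                board[i] = " "
                alpha = max(alpha, value)
                if alpha >= beta:
                    break  # prune: the opponent will never allow this line
        return value
    else:              # minimizing player
        value = math.inf
        for i in range(9):
            if board[i] == " ":
                board[i] = "O"
                value = min(value, alphabeta(board, "X", alpha, beta))
                board[i] = " "
                beta = min(beta, value)
                if alpha >= beta:
                    break
        return value

print(alphabeta([" "] * 9, "X"))  # 0: perfect play ends in a draw
```

The pruning condition `alpha >= beta` is the whole trick: branches that cannot change the final decision are never expanded, which is what made deep chess search tractable.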
12. Go
Number of possible positions: ~10^170!
Average branching factor: ~250!
13. How can we solve Go?
Reinforcement Learning
to the rescue!
15. What is Reinforcement Learning?
● Trial and error (no supervisor)
● Feedback is delayed, not instantaneous
● Time matters (data is not i.i.d.)
● Actions affect next states
Source: Richard Sutton, 2017
16. Comparison to Supervised/Unsupervised Learning
Supervised Learning
● Set of labeled examples provided by “external supervisor”
● Not applicable to learning from interaction
○ Generally complicated to obtain examples of all situations
Unsupervised Learning
● Usually tries to learn structure/data representation
● Does not exactly match RL: RL wants to maximize a reward
Source: David Silver, 2015
17. Reinforcement Learning Agent
● Policy: a function for the behavior, which maps states to actions
● Value function: how good each state and/or action is
● Model: the agent’s representation of the environment
18. Markov Decision Process (MDP)
In general
● Mathematical framework for modelling decision
making
● States, actions, and rewards
Relationship with RL
● Formally describe an environment for RL, where
the environment is fully-observable
● Almost all RL problems can be formalized as
MDPs
Source: David Silver, 2015
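To make the MDP formalism concrete, here is a small planning sketch (the states, actions, and numbers are invented for this example, not from the talk). Given a known transition model, value iteration repeatedly applies the Bellman optimality backup V(s) ← max_a Σ_{s'} P(s'|s,a)[r + γV(s')] until the values converge, and then reads off a greedy policy:

```python
# A tiny hand-made MDP (all names and numbers are illustrative).
# transitions[s][a] = list of (probability, next_state, reward) tuples.
states = ["cool", "hot"]
actions = ["slow", "fast"]
transitions = {
    "cool": {"slow": [(1.0, "cool", 1.0)],
             "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)]},
    "hot":  {"slow": [(1.0, "cool", 1.0)],
             "fast": [(1.0, "hot", -10.0)]},
}
gamma = 0.9  # discount factor

def backup(V, s, a):
    """One-step Bellman backup: expected reward plus discounted next value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

# Value iteration: repeatedly apply the Bellman optimality operator.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(backup(V, s, a) for a in actions) for s in states}

# The greedy policy with respect to the converged value function.
policy = {s: max(actions, key=lambda a: backup(V, s, a)) for s in states}
print(policy)  # {'cool': 'fast', 'hot': 'slow'}
```

Note that this assumes the model (the `transitions` table) is known; RL methods like Q-learning, covered next, learn good behavior without it.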
19. RL simple example (1)
[Figure: gridworld environment with +1 and -1 terminal states, and one possible policy]
20. RL simple example (2) - Q-learning
Source: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
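As a complement to the interactive demo linked above, here is a minimal tabular Q-learning sketch on an invented five-cell corridor (a toy of my own, not the demo itself). The agent starts on the left, receives +1 for reaching the rightmost cell, and learns with the standard update Q(s,a) ← Q(s,a) + α[r + γ max_{a'} Q(s',a') − Q(s,a)]:

```python
import random

random.seed(0)
N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
LEFT, RIGHT = 0, 1
alpha, gamma, eps = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table: Q[state][action]

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection (ties broken toward RIGHT)
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = RIGHT if Q[s][RIGHT] >= Q[s][LEFT] else LEFT
        s2 = min(max(s + (1 if a == RIGHT else -1), 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: bootstrap from the best action in the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = ["right" if Q[s][RIGHT] >= Q[s][LEFT] else "left" for s in range(N_STATES - 1)]
print(greedy)  # ['right', 'right', 'right', 'right']
```

After training, the learned values decay geometrically with distance from the goal (Q ≈ γ^k for k steps away), and the greedy policy walks straight to it.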
21. Examples of RL success in games (prior DL)
Backgammon
TD-Gammon (1992)
Scrabble
Maven (2000s)
22. What about Atari games?
How can we represent complex games in an RL setting?
Can we use Deep Learning to capture information from raw pixels?
24. Deep Q-Learning (DQN)
● Q-Learning is a tabular method
○ What if it’s the first time we’re visiting a state?
● Can we use a neural network as our Q-function?
○ Yes!
○ However, RL can be unstable or even diverge when using a nonlinear
function approximator (e.g., a neural network)
● DQN introduces clever techniques to address this!
Source: Human-level control through deep reinforcement learning, 2015
25. DQN - Overview
Source: Human-level control through deep reinforcement learning, 2015
26. DQN - Overview
Source: Resource Management with Deep Reinforcement Learning, 2016
27. DQN - Breakout
Source: https://www.youtube.com/watch?v=TmPfTpjtdgg
28. DQN
Source: Human-level control through deep reinforcement learning, 2015
29. DQN
● Single architecture can successfully learn control policies in a range of different
environments
● Deep network architectures and reinforcement learning
○ Experience replay
○ Target network: made algorithm more stable
● Limitations
○ Games that demand more temporally extended strategies remain a great
challenge
Source: Human-level control through deep reinforcement learning, 2015
30. Go
Number of possible positions: ~10^170!
Average branching factor: ~250!
32. AlphaGo - Training Pipeline (simplified)
Source: Mastering the game of Go with deep neural networks and tree search, 2016
33. AlphaGo - Monte Carlo Tree Search (MCTS)
Source: Mastering the game of Go with deep neural networks and tree search, 2016
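The selection step of MCTS is typically driven by an upper-confidence rule. In the generic UCT formulation, at state $s$ the search descends into the action

```latex
a^{*} = \arg\max_{a}\left[\, Q(s,a) + c\,\sqrt{\frac{\ln N(s)}{N(s,a)}} \,\right]
```

where $N(s)$ and $N(s,a)$ are visit counts and $c$ trades off exploration against exploitation. AlphaGo's variant replaces the purely count-based bonus with a prior from the policy network, $u(s,a) \propto P(s,a) / (1 + N(s,a))$, so that rarely visited moves the policy network likes are explored first.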
34. AlphaGo - Results
● Played against Lee Sedol, in
March 2016
● Lee has won 18 world titles
● AlphaGo won the match 4-1
AlphaGo Documentary (Netflix)
35. AlphaZero (as per David Silver’s NIPS talk)
No human data
● Learns through self-play reinforcement learning,
starting from random play
No human features
● Only takes raw board as input
Single neural network
● Policy and Value networks are combined
Simplified search
● No Monte Carlo rollouts; uses the neural network to
evaluate positions
Source: 2017 NIPS Keynote by DeepMind's David Silver
36. AlphaZero (as per David Silver’s NIPS talk)
Source: 2017 NIPS Keynote by DeepMind's David Silver
38. Dota 2
● Real-time strategy (RTS) game
○ More precisely, a specialization called Multiplayer
Online Battle Arena (MOBA)
● Two teams of five players, where each player
controls a hero
● Main goal is to destroy the opposing team’s base
● Lots of challenges for RL
39. Dota 2 - Challenges for RL
● Long time horizons
○ 30 fps for 45 minutes
● Partially-observed state
○ Only part of the map is visible
○ Must make inferences from incomplete data
● High-dimensional, continuous action space
○ Space discretized into 170,000 possible actions
○ ~1,000 valid actions at any given moment
● High-dimensional, continuous observation space
○ State: 20,000 numbers
Source: https://blog.openai.com/openai-five/
40. Dota 2 - OpenAI Five
● Each hero is represented as a 1024-unit
LSTM
● Extracts the game state via Valve’s Bot API
● Learns entirely from self-play
● Uses Proximal Policy Optimization
(PPO) for training
Source: https://blog.openai.com/openai-five/
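For reference, PPO's central idea is a clipped surrogate objective that keeps each policy update close to the previous policy (formula as in the PPO paper, Schulman et al., 2017):

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[\min\!\left(
      r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t
    \right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

where $\hat{A}_t$ is an advantage estimate. Clipping removes the incentive to push the probability ratio $r_t$ outside $[1-\epsilon,\,1+\epsilon]$, which makes large-scale self-play training like OpenAI Five's far more stable.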
41. Dota 2 - OpenAI Five
Source: https://blog.openai.com/openai-five/
● Simplified version of the game (not all heroes, some tactics removed)
● Played against team of 99.95th percentile Dota players
○ Four have played professionally
● 3 games
○ OpenAI Five won 1st and 2nd
○ 3rd: the audience was asked to choose the heroes
■ The AI predicted a 2.9% chance of winning
42. From OpenAI Five to Dexterity
● Robot hand that can manipulate physical objects
● Makes use of the same RL algorithm as OpenAI Five
45. Take home message
● Reinforcement Learning is a hot topic
● The combination of RL and Deep Learning is producing great results
● Games are a great proxy for developing solutions for real-world problems
○ Lots of challenges far from being solved
● What about an RL agent that plays against you and adapts to your
way of playing?
46. Thank you!
August 16, 2018 - #10 Porto Alegre Machine Learning Meetup
Thomas Paula
● @tsp_thomas
● tsp.thomas@gmail.com