
Malmotutorial

Malmo platform tutorial for reinforcement learning in Minecraft


  1. This tutorial
  2. General Game AI Research
  3. Game AI Competitions
  4. Artificial General Intelligence in Games
  5. Human-level Control Through Deep Reinforcement Learning. V. Mnih et al. https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
  6. General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms. Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas. https://arxiv.org/pdf/1802.10363, http://www.gvgai.net
  7. Project Malmo (aka.ms/Malmo, github.com/Microsoft/malmo). The Malmo Platform for Artificial Intelligence Experimentation. Matthew Johnson, Katja Hofmann, Tim Hutton & David Bignell, 2016
  8. Malmo design principles
       • Beyond “narrow AI” with multi-task learning
       • Wired for multi-agent tasks (including human agents)
  9. Use cases and design principles
       • […] into the game through an intuitive yet powerful API, building on existing Minecraft capabilities
       • Built for extensions and novel uses: open source; “plug-and-play” design of observation, command and reward handlers
       • Low entry barrier: provide a cross-language (currently Java, .NET, C/C++, Python, Lua) and cross-platform (Windows, Linux, macOS) API
  10. Malmo = Minecraft Mod + API + tools
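  11. Python example
        import time
        import MalmoPython

        agent_host = MalmoPython.AgentHost()
        with open("my_mission.xml") as f:
            my_mission = MalmoPython.MissionSpec(f.read(), True)
        my_mission_record = MalmoPython.MissionRecordSpec("save.tgz")

        # start a new mission and wait until it has begun
        agent_host.startMission(my_mission, my_mission_record)
        world_state = agent_host.getWorldState()
        while not world_state.has_mission_begun:
            time.sleep(0.1)
            world_state = agent_host.getWorldState()

        while world_state.is_mission_running:
            # interpret world state
            for reward in world_state.rewards:
                print("Summed reward:", reward.getValue())
            for observation in world_state.observations:
                print("Observation:", observation.text)
            # act
            agent_host.sendCommand("move 1")
            agent_host.sendCommand("turn 0.5")
            agent_host.sendCommand("jump 1")
            time.sleep(0.1)
            world_state = agent_host.getWorldState()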
  17. Example: Tabular Q-Learning in Malmo
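The tabular Q-learning code from this slide is not preserved in the transcript. As a hedged illustration only, the standard update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) with epsilon-greedy exploration can be sketched as below. The class name, the action list (modelled on Malmo's discrete movement commands), the state encoding and all hyperparameters are assumptions, not the tutorial's own code.

```python
import random
from collections import defaultdict

# Hypothetical action set, modelled on Malmo's discrete movement commands.
ACTIONS = ["movenorth 1", "movesouth 1", "moveeast 1", "movewest 1"]


class TabularQAgent:
    """Minimal tabular Q-learning agent (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate for epsilon-greedy
        self.rng = random.Random(seed)
        # Q-table: state -> {action: value}, created with zeros on first access
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def act(self, state):
        """Epsilon-greedy action selection over the Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, state, action, reward, next_state, done):
        """One Q-learning step: move Q(s, a) toward the TD target."""
        target = reward
        if not done:  # terminal transitions do not bootstrap
            target += self.gamma * max(self.q[next_state].values())
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a Malmo loop, the state could be, for example, the agent's block coordinates parsed from the observation JSON, and the chosen action string would be passed to `agent_host.sendCommand`; both choices are assumptions for illustration.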
  18. Example: Deep Q-Learning in Malmo
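Deep Q-learning (Mnih et al., cited earlier in the tutorial) replaces the Q-table with a network trained to regress toward the target y = r + gamma * max_a' Q_target(s', a'), where Q_target is a periodically copied target network. The pure-Python sketch below shows only that target and a plain squared-error loss for a batch; it assumes the Q-values have already been produced by some network, and it omits the error clipping used in the original paper. Function names are hypothetical.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float],
                dones: Sequence[bool],
                next_q_target: Sequence[Sequence[float]],
                gamma: float = 0.99) -> List[float]:
    """Compute DQN regression targets y = r + gamma * max_a' Q_target(s', a').

    next_q_target[i] holds the target network's Q-values for every action
    in the i-th next state; terminal transitions do not bootstrap.
    """
    targets = []
    for r, done, q_next in zip(rewards, dones, next_q_target):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def squared_td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) of the taken actions and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)
```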
  19. Malmo design principles
       • Low entry barrier, yet powerful
       • Wired for multi-agent tasks (including human agents)
  20. Example Task (Mission XML)
        <ServerHandlers>
          <FlatWorldGenerator generatorString="3;7,220*..."/>
          <DrawingDecorator>
            [...]
            <DrawCuboid x1="-2" y1="45" z1="-2" x2="7" y2="45" z2="18" type="lava" /> <!-- lava floor -->
            <DrawCuboid x1="1" y1="45" z1="1" [...] type="sandstone" /> <!-- floor of the arena -->
            <DrawBlock x="4" y="45" z="1" type="cobblestone" /> <!-- the starting marker -->
            [...]
        </ServerHandlers>
        <AgentHandlers>
          <ObservationFromFullStats/>
          <DiscreteMovementCommands>
            <ModifierList type="deny-list">
              <command>attack</command>
            </ModifierList>
          </DiscreteMovementCommands>
          <RewardForTouchingBlockType>
            <Block reward="-100.0" type="lava" behaviour="onceOnly"/>
            <Block reward="100.0" type="lapis_block" behaviour="onceOnly"/>
          </RewardForTouchingBlockType>
          <RewardForSendingCommand reward="-1"/>
        </AgentHandlers>
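Because the mission is plain XML, its reward structure can be read programmatically. The sketch below parses a copy of the `<AgentHandlers>` part of this mission (with straight quotes, and without the XML namespace a complete Malmo mission file would declare, an assumption made for brevity) using Python's standard `xml.etree.ElementTree`, then computes an episode's return under the stated rewards. Treating `onceOnly` as "each block type pays out at most once" is our reading of the attribute; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# <AgentHandlers> fragment from the example mission (namespace omitted).
AGENT_HANDLERS = """
<AgentHandlers>
  <DiscreteMovementCommands>
    <ModifierList type="deny-list">
      <command>attack</command>
    </ModifierList>
  </DiscreteMovementCommands>
  <RewardForTouchingBlockType>
    <Block reward="-100.0" type="lava" behaviour="onceOnly"/>
    <Block reward="100.0" type="lapis_block" behaviour="onceOnly"/>
  </RewardForTouchingBlockType>
  <RewardForSendingCommand reward="-1"/>
</AgentHandlers>
"""


def reward_spec(xml_text):
    """Extract the per-command reward, block-touch rewards and denied commands."""
    root = ET.fromstring(xml_text.strip())
    per_command = float(root.find("RewardForSendingCommand").get("reward"))
    blocks = {b.get("type"): float(b.get("reward"))
              for b in root.find("RewardForTouchingBlockType").findall("Block")}
    denied = [c.text for c in root.iter("command")]
    return per_command, blocks, denied


def episode_return(num_commands, touched_blocks, xml_text=AGENT_HANDLERS):
    """Return of one episode; each touched block type is rewarded once."""
    per_command, blocks, _ = reward_spec(xml_text)
    total = num_commands * per_command
    for block_type in set(touched_blocks):
        total += blocks.get(block_type, 0.0)
    return total
```

For example, reaching the lapis block after 5 commands yields 5 * (-1) + 100 = 95, while falling into lava after 3 commands yields -103.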
  21. Creating new tasks is easy: http://sameersingh.org/courses/aiproj/sp17/projects.html
  22. Malmo design principles
       • Low entry barrier, yet powerful
       • Beyond “narrow AI” with multi-task learning
  23. A natural environment for multi-agent learning
  24. Goal: foster research in collaborative AI. Details: https://www.microsoft.com/en-us/research/academic-program/collaborative-ai-challenge
  25. MARLÖ Competition: The Multi-Agent Reinforcement Learning in MalmÖ
  26. Organizers
  27. MARLO: Motivation
       • General-reward settings are the most realistic for many real-world applications, but are also notoriously challenging
       • More research on insights and approaches that generalize beyond individual tasks and opponent types
       • The cost of creating tasks and opponents is amortized, as both can be shared by a large community
  28. Overview
       • Participants develop agents which play tasks on the Malmo platform
       • The agents play in multiple games with different scenarios
       • Each game has a different set of multi-agent tasks for training, validation and the final test
       • Participants use those tasks to train and validate their agents
       • The agents play the final test task in a tournament that determines the winner of MARLO
  29. Competition structure
  30. MARLO Tournament
  31. Evaluation
       • Each league (P players in a group) is played across the same N games, with T repetitions on the private task of each game.
       • Each game has its own leaderboard, which ranks entries and awards points: 25 points for 1st place, 18 for 2nd, then 15, 12, 10, 8, 6, 4, 2 and 1 for 3rd to 10th, and 0 from 11th onwards.
       • The final ranking for each league is determined by summing points across all games.
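The league scoring rule above can be sketched as follows. This assumes each game's leaderboard is already ordered from 1st place downwards (for instance by mean score over the T repetitions, which the slide does not specify); tie-breaking is also not specified, so this sketch simply keeps entries with equal totals in first-seen order. Function names are hypothetical.

```python
from collections import defaultdict

# Points for 1st..10th place on one game's leaderboard; 11th onwards gets 0.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def points_for_rank(rank):
    """Points awarded for a 1-based rank on a single game's leaderboard."""
    return POINTS[rank - 1] if 1 <= rank <= len(POINTS) else 0


def league_ranking(game_leaderboards):
    """Sum points across all games and sort entries by total, highest first.

    game_leaderboards: one list per game, each listing entry names
    ordered from 1st place downwards.
    """
    totals = defaultdict(int)
    for leaderboard in game_leaderboards:
        for rank, entry in enumerate(leaderboard, start=1):
            totals[entry] += points_for_rank(rank)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

With three games ranked A-B-C, B-A-C and B-C-A, the totals are B = 18 + 25 + 25 = 68, A = 25 + 18 + 15 = 58 and C = 15 + 15 + 18 = 48, so B wins the league even though A won the first game.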
  32. Schedule (draft)
       • Same version as the multi-agent tasks, but using bots that run locally
       • The top 32 evaluated teams are invited to the final round
       • Multi-agent games on a remote server for the final tournament
       • Live competition!
  33. Participation: Eligibility
       • A team consists of up to five participants
       • Participants must be 18 years of age or older. If a team member is 18 or older but is considered a minor in their place of residence, they should ask their parent’s or legal guardian’s permission before submitting an entry to the Competition
       • Award: available only to participants affiliated with a university or a non-profit research organization
  34. What you get from the competition
       • Award
           • 1st place: 10,000 USD-equivalent in Azure credit, plus a travel grant to attend a relevant academic conference or workshop
           • 2nd place: 5,000 USD-equivalent in Azure credit
           • 3rd place: 3,000 USD-equivalent in Azure credit
       • Publication
           • The top three entries will be invited as co-authors of a paper summarizing the competition structure, rules, approaches, results and main takeaways
  35. Challenge Games
  36. Mob Chase
  37. Mob Chase (scoring): 1 point; 0.2 points; -0.02 points
  41. Mob Chase [figure: ASCII text-map layouts of several arena variants]
  42. Build Battle
  43. Build Battle (scoring): 1 point; +0.2 points; -0.2 points; -0.02 points
  46. Treasure Hunt
  47. Treasure Hunt (scoring): 0.5 points; 0.25 points; -1 point; -0.02 points
  51. Treasure Hunt [figure: ASCII text-map layouts of several level variants]
  52. Qualifying Task
       • MarLo-FindTheGoal-v0
       • 7x7 room
       • Goal: find the goal ☺ (yellow block)
       • Rewards:
           • -0.01 per command
           • 100 commands max
           • 1.0 for finding the goal
           • -0.1 when out of time
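The qualifying task's reward list implies a simple trade-off between reaching the goal and reaching it quickly. The sketch below computes an episode's return under two assumptions the slide does not state: the command that reaches the goal is also charged the -0.01 penalty, and a failed episode always spends the full 100-command budget before the -0.1 timeout penalty. The function and constant names are hypothetical.

```python
STEP_PENALTY = -0.01     # charged for every command sent
GOAL_REWARD = 1.0        # for finding the goal
TIMEOUT_PENALTY = -0.1   # when the command budget runs out
MAX_COMMANDS = 100       # command budget per episode


def find_the_goal_return(commands_used, found_goal):
    """Episode return for MarLo-FindTheGoal-v0 under the slide's reward list.

    Assumes the goal-reaching command also pays STEP_PENALTY, and that a
    failed episode spends the whole budget before the timeout penalty.
    """
    if found_goal:
        if not 1 <= commands_used <= MAX_COMMANDS:
            raise ValueError("a successful episode uses 1..MAX_COMMANDS commands")
        return commands_used * STEP_PENALTY + GOAL_REWARD
    # Out of time: the full budget was spent without reaching the goal.
    return MAX_COMMANDS * STEP_PENALTY + TIMEOUT_PENALTY


if __name__ == "__main__":
    print(round(find_the_goal_return(6, True), 2))     # quick success
    print(round(find_the_goal_return(100, True), 2))   # success on the last command
    print(round(find_the_goal_return(100, False), 2))  # timeout
```

Under these assumptions a success scores between 0.0 and 0.99, a timeout scores -1.1, and every command saved is worth 0.01, so an agent should first learn to find the goal reliably and then to find it fast.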
  53. Participating in the Competition
  54. Project MARLÖ
       • Multi-Agent Reinforcement Learning in Malmo
       • Reinforcement learning wrapper built on top of Project Malmo
       • Aims to inspire the creation of extremely potent general agents through a multi-agent, multi-game environment
       • Uses the OpenAI Gym format
       • Also on GitHub: https://github.com/crowdAI/marLo
  55. Installation instructions
       • Install Malmo
           • Anaconda (recommended)
           • Pip (+ git)
           • Repack
           • Manual compilation
       • Install Marlo
  60. First MARLO agent
  64. How about multiple agents?
  69. Agents: a semi-technical view
       • Agents in Marlo are simple and work in a very Gym-like format:
           • Start up a Minecraft client on port 10000
           • Use the marlo.make() function to make an environment; this returns a user token
           • Use the user token with marlo.init() to create an instance of the environment for the agent to use
           • Run an agent to play the game
       • We have seen a sample random agent that plays any game it connects to
       • We also provide examples of more complex agents:
           • ChainerRL agents (DQN, PPO)
           • Compatible with TensorBoard-Chainer plotting
       • Other frameworks (TensorFlow, KerasRL, PyBrain) are possible: the only requirement is that they comply with the Gym API
  70. DQN Example
  76. Experiments
       • A simple script that trains an agent over a set number of steps and episodes is provided within the Marlo package
       • The underlying functionality is simple: at the beginning of training, reset the environment
  77. Experiments
       • Main loop with a stopping condition: the episode ends or the maximum number of steps is reached
  78. Experiments
       • Log the results of the episode
       • We include an example of plotting with TensorBoard-Chainer
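The experiment script itself is not reproduced in this transcript. The sketch below only mirrors the three steps described (reset, loop until the episode ends or a step limit is hit, log the episode) using the Gym-style `reset()` / `step()` interface that Marlo follows, where `step` returns `(observation, reward, done, info)`. So that it runs without a Minecraft client, it uses a hypothetical `ToyCorridorEnv` as a stand-in; in a real run, `env` would instead be the environment returned by `marlo.init(...)`. All names and parameters here are illustrative assumptions.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in for a Marlo environment (Gym-style reset/step).

    The agent starts at cell 0 of a corridor and must reach the last cell.
    Actions: 0 = move left, 1 = move right. Rewards loosely mirror the
    qualifying task: -0.01 per command, +1.0 on reaching the goal.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        reward = -0.01 + (1.0 if done else 0.0)
        return self.pos, reward, done, {}


def run_experiment(env, policy, episodes=3, max_steps=50):
    """Reset, step until done or max_steps, then log each episode."""
    history = []
    for episode in range(episodes):
        obs = env.reset()                       # reset at the start of each episode
        episode_return, steps, done = 0.0, 0, False
        while not done and steps < max_steps:   # stopping condition
            action = policy(obs)
            obs, reward, done, _info = env.step(action)
            episode_return += reward
            steps += 1
        # Log the episode; a plotting backend could be fed from here instead.
        print(f"episode={episode} steps={steps} return={episode_return:.2f} done={done}")
        history.append((steps, episode_return, done))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    run_experiment(ToyCorridorEnv(), policy=lambda obs: rng.choice([0, 1]))
```

Because the loop touches the environment only through `reset()` and `step()`, swapping the toy class for a real Marlo environment, or the random policy for a trained agent, leaves `run_experiment` unchanged; this is the "comply with the Gym API" requirement from the agents slide.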
  79. Plotting results (TensorBoard-Chainer)
       • Works much like standard TensorBoard, but abstracted to work with Chainer
       • Can be used to log images, text, audio and histograms
  80. Submission
       1. Create a private repository on gitlab.crowdai.org. It must contain:
           • A Dockerfile that installs the dependencies and sets everything up
           • A crowdai.json file with these mandatory fields:
               • challenge_id: "marLo"
               • grader_id: "marLo"
               • author: name of the author (string); for teams, please also create a field 'authors' containing a list of all authors
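A crowdai.json with the mandatory fields listed above can be generated rather than written by hand. The sketch below uses only Python's standard `json` module. The values "marLo" are copied from the slide (we read the stray leading space in the original grader_id as a typo), and the helper names are hypothetical; check the current MARLO documentation for the exact values the grader expects.

```python
import json


def make_crowdai_config(author, authors=None):
    """Build crowdai.json contents with the fields listed on the slide."""
    config = {
        "challenge_id": "marLo",
        "grader_id": "marLo",
        "author": author,
    }
    if authors:  # teams additionally list every member
        config["authors"] = list(authors)
    return config


def write_crowdai_config(path, config):
    """Write the config as pretty-printed JSON with a trailing newline."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")


if __name__ == "__main__":
    write_crowdai_config("crowdai.json",
                         make_crowdai_config("Alice", ["Alice", "Bob"]))
```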
  81. Submission
       2. Submitting to crowdAI:
           • Create and push a new tag
           • Each tag counts as a new submission
           • You will be able to watch your AI agent play the game and see more details about the evaluation of your submission at: https://gitlab.crowdai.org/<your-crowdAI-user-name>/marLo/issues
           • A video of the game will also be generated and made available from the leaderboard
  82. Follow the project
       • Follow Malmo: @Project_Malmo and website (aka.ms/malmo)
       • People on Twitter: @diego_pliebana, @katjahofmann, @MeMohanty
       • MARLO GitHub: https://github.com/crowdAI/marLo
       • MARLO Documentation: https://marlo.readthedocs.io/en/latest/
       • Competition website: https://www.crowdai.org/challenges/marlo-2018
       • AIIDE 2018 Workshop: https://marlo-ai.github.io/
  83. Hands-On Time
  84. Hands-On Time
       1. Install Malmo and Marlo
       2. Play the games
       3. Execute agents
       Doc: https://marlo.readthedocs.io/en/latest/
       Code: https://github.com/crowdAI/marLo/
       Competition: https://www.crowdai.org/challenges/marlo-2018
       We’re here to help!
