Ciro Continisio, Technical Evangelist, Unity Technologies
Alessia Nigretti, Technical Evangelist, Unity Technologies
Discover how to implement machine learning in Unity, and how to use its power to create the next level of AI.
Slide 15
3D Ball
Goal:
Balance the ball on the platform
Reward:
● +0.1 for every frame the ball remains on the platform
● -1.0 if the ball falls from the platform
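A reward scheme like this maps to very little code. Here is a minimal sketch, assuming the v0.2-era ML-Agents C# API used in this talk (an Agent subclass that overrides AgentStep(float[] act) and sets the public reward and done fields); the class name and the ball/platform references are hypothetical, not the actual example's code:

using UnityEngine;

public class BallBalanceAgent : Agent
{
    // Hypothetical references, assigned in the Inspector
    public Transform ball;
    public Transform platform;

    public override void AgentStep(float[] act)
    {
        // (act[] would tilt the platform here, omitted)

        if (ball.position.y < platform.position.y)
        {
            reward = -1f;   // the ball fell off the platform
            done = true;    // end the episode
        }
        else
        {
            reward = 0.1f;  // small reward for every frame it stays on
        }
    }
}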
Slide 16
Propellers
Goal:
Have the cubes learn to float
Reward:
● +0.1 for each frame the cube floats
● -1.0 for each collision with the floor
Slide 17
Arena
Goal:
Push the crate out of the arena
Rewards:
● +0.2 when closing in on the crate
● +0.5 when the crate gets further from the center
● Negative rewards for delaying or falling
Slide 18
Bounce Ball
Goal:
Bounce ball on top of agent’s head
Reward:
● +0.1 for each frame the ball is closer to the agent
● -0.1 for each frame the ball is further away from the agent
Slide 21
Roguelike Game
Ingredients
• A simple action game
• All entities are Agents, both the player and the enemies
• Establish a common “interaction language”
• The goal is survival, while attacking other entities
Slide 23
Setting up the training
Design and ideas
• What are the game actions?
• What do you want the Agent to learn?
• What's right or wrong (what to reward)?
Slide 24
Discrete vs. Continuous
Discrete means that the States/Actions can only have one value at a time, like an enum: it's either 0, or 1, or 2, or 3, etc.
⬝ Easier: Agents associate actions with rewards more easily
In Roguelike, we use Discrete for Actions. It can have 6 values:
0: Stay still / 1-4: Move in one direction / 5: Attack
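A sketch of how those six values could be consumed, under the same v0.2-era API assumption; Move() and TryAttack() are hypothetical game-specific helpers, and which value maps to which direction is illustrative:

using UnityEngine;

public class RoguelikeAgent : Agent
{
    public override void AgentStep(float[] act)
    {
        // With a Discrete action space, act[0] carries the single chosen value
        int action = Mathf.FloorToInt(act[0]);
        switch (action)
        {
            case 0: break;                       // stay still
            case 1: Move(Vector2.up); break;     // move up
            case 2: Move(Vector2.down); break;   // move down
            case 3: Move(Vector2.left); break;   // move left
            case 4: Move(Vector2.right); break;  // move right
            case 5: TryAttack(); break;          // attack
        }
    }

    void Move(Vector2 direction) { /* game-specific movement */ }
    void TryAttack() { /* game-specific attack */ }
}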
Slide 25
Discrete vs. Continuous
Continuous means you can have multiple States (or Actions), and they all have float values.
⬝ They require more memory for training (hyperparameters)
⬝ Harder to use: they can confuse the Agent
In Roguelike, we use Continuous for States:
health, canAttack, hasTarget, distanceFromTarget, …
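Collecting those states could look like this sketch, assuming the v0.2-era API where CollectState() returns a List<float>; the fields are hypothetical stand-ins for the game's real data:

using System.Collections.Generic;
using UnityEngine;

public class RoguelikeAgent : Agent
{
    float health;
    bool canAttack;
    Transform target;   // null when there is no target

    // Called by ML-Agents to read the Continuous state vector
    public override List<float> CollectState()
    {
        var state = new List<float>();
        state.Add(health);                                   // health
        state.Add(canAttack ? 1f : 0f);                      // canAttack as 0/1
        state.Add(target != null ? 1f : 0f);                 // hasTarget as 0/1
        state.Add(target != null
            ? Vector3.Distance(transform.position, target.position)
            : 0f);                                           // distanceFromTarget
        return state;
    }
}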
Slide 26
Setting up the training
The pseudo-algorithm (AgentStep)
Movement:
If health > 50% then
    If current distance < previous distance then
        Reward
    End
Else
    If current distance > previous distance then
        Reward
    End
End

Attack:
If input is attack then
    If can attack then
        Start attack
    Else
        Punish
    End
Else
    If is not healing and health < max health then
        Start healing
    End
End
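Translated into a hedged C# sketch under the same v0.2-era API assumptions; the 0.1 reward magnitudes and all field names are illustrative, not the talk's actual values:

using UnityEngine;

public class RoguelikeAgent : Agent
{
    // Hypothetical game data
    float health, maxHealth = 100f, previousDistance;
    bool canAttack, isHealing;
    Transform target;

    public override void AgentStep(float[] act)
    {
        // Movement: reward approaching when healthy, retreating when hurt
        float currentDistance = Vector3.Distance(transform.position, target.position);
        if (health > maxHealth * 0.5f)
        {
            if (currentDistance < previousDistance) reward += 0.1f;
        }
        else
        {
            if (currentDistance > previousDistance) reward += 0.1f;
        }
        previousDistance = currentDistance;

        // Attack: punish attack requests the agent can't fulfil
        if (Mathf.FloorToInt(act[0]) == 5)
        {
            if (canAttack) StartAttack();
            else reward -= 0.1f;
        }
        else if (!isHealing && health < maxHealth)
        {
            StartHealing();
        }
    }

    void StartAttack() { /* game-specific */ }
    void StartHealing() { /* game-specific */ }
}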
Slide 28
Tips on rewards
• Rewards can come in the AgentStep function, but also at other times (OnCollisionEnter, etc.)
• Agents will find a way to exploit the rewards!
• Small details in rewards influence the learning process
reward = .2f / (distanceSqr + .01f);   // works better than a flat: reward = .2f;
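For instance, the Propellers example's floor-collision penalty can be applied where the event happens rather than in AgentStep. A sketch, with the class name and the "Floor" tag as assumptions:

using UnityEngine;

public class PropellerAgent : Agent
{
    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Floor"))
        {
            reward = -1f;   // punish the collision the moment it happens
            done = true;    // optionally end the episode too
        }
    }
}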
Slide 29
Training scene
• Position and configure the agent(s)
• Connect them to the relevant Brains
• Configure the Academy
Slide 31
Tips for the training environment
• Different situations in parallel help Agents to learn better
• Heuristic Agents are the perfect training dummies!
• Before launching a 1-hour training:
  • Double-check your logic so you don't make wrong assumptions
  • Launch a 1x speed training to see what's happening
Slide 32
Building and training
• Set the Brain to External
• Build!
• Set up the Python environment and hyperparameters
• Launch training
Slide 34
Training with TensorFlow
• Observe the mean reward
• Stop when it looks stable
• Export the model, import into Unity
• Set the Brain to Internal
• Play!
Slide 37
Tips on hyperparameters
• Beta is the randomisation of actions. If agents corner themselves into a behaviour too quickly, increase beta
• Batch size, Buffer size, Hidden units: they differ a lot between Discrete and Continuous spaces
Read the guide: github.com/Unity-Technologies/ml-agents
Slide 38
Tips and takeaways
Physics
Because ML-Agents runs on FixedUpdate (for stability):
• Remember Rigidbody.position doesn't change mid-frame
• Switch Animators from Normal to Animate Physics if animation is key in the training
• If using interpolation on the Rigidbody, Rigidbody.position and Rigidbody.MovePosition() behave differently
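A sketch of the safe pattern: move the Rigidbody in FixedUpdate via MovePosition(), so interpolation keeps the rendering smooth between physics steps (component and field names are illustrative, not the talk's code):

using UnityEngine;

public class PhysicsMover : MonoBehaviour
{
    public Vector3 velocity = Vector3.forward;
    Rigidbody rb;

    void Awake()
    {
        rb = GetComponent<Rigidbody>();
        rb.interpolation = RigidbodyInterpolation.Interpolate;
    }

    void FixedUpdate()
    {
        // MovePosition cooperates with interpolation; writing rb.position
        // directly teleports the body and skips the interpolated rendering
        rb.MovePosition(rb.position + velocity * Time.fixedDeltaTime);
    }
}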
Slide 39
Tips and takeaways
Build tools
The training process can be long and repetitive.
Make your life easier by building some little tools.
Slide 41
Next
What now?
• Mix Trained AI with Heuristic AI to obtain the final behaviour
Learning from the player
• Gather players' behaviour and train agents (offline) based on the information you obtained
• Coming soon: Imitation Learning!