Save the princess!
Simon Belak

@sbelak

simon@metabase.com
We will build an AI to play a silly little game
by training a policy network defined using
Cortex, with a hot new training algorithm that
we will implement from the paper, first using
Neanderthal and then make massively
parallel using Onyx.
The game
• Find the shortest path to the princess

• Moves: up, down, left, right

• Don’t fall off the edge of the world
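The rules translate directly into a tiny simulator. A minimal Python sketch (the talk's actual code is Clojure; the board size and reward values here are illustrative assumptions):

```python
# Minimal gridworld: the hero must reach the princess without
# stepping off the board. Board size and rewards are assumptions.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 5

def step(hero, princess, move):
    """Apply one move; return (new-position, reward, done)."""
    dr, dc = MOVES[move]
    r, c = hero[0] + dr, hero[1] + dc
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        return hero, -1.0, True          # fell off the edge of the world
    if (r, c) == princess:
        return (r, c), 1.0, True         # saved the princess!
    return (r, c), -0.01, False          # small cost favours the shortest path
```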
Computers playing
computer games
Reinforcement learning
• Interact with the environment [embodied cognition]

• Not a single solution but an action to take given the environment
[model of the world + model of self, consciousness?]

• Learns via positive/negative feedback
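Those three bullets are the standard agent–environment loop; a generic sketch (Python; `env` and `policy` are stand-ins for whatever simulation and network you plug in):

```python
def episode(env, policy):
    """One playthrough: the agent acts on the current state and
    learns from positive/negative feedback (the reward)."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = policy(state)                   # an action given the environment,
        state, reward, done = env.step(action)   # not a single fixed solution
        total += reward
    return total
```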
Reinforcement learning:
how it’s usually done
Train a deep neural network on raw sensor
data, usually pixels (i.e. no feature engineering)
… but there is another way
[Diagram: two loops side by side.
Classic evolutionary algorithm: population → mutate/crossover → next generation, sampled weighted by fitness.
Evolution strategies: solution → populate → jitter … jitter → combine weighted → update.]
Using ES to train a neural
network
Benefits

• highly parallelizable 

• more robust (fewer hyperparameters, more
stable, doesn’t care about the properties of
the reward function)

• can exploit structure

• less computationally expensive

Downsides

• takes longer to converge

• noise must lead to different outcomes
Instead of backpropagation, use ES directly on the weights
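The "ES on weights" idea boils down to: jitter the weight vector with Gaussian noise, score each jittered copy by playing the game, and combine the noise weighted by score — no backpropagation. A numpy sketch (the talk implements this with Neanderthal; `pop_size`, `sigma`, `alpha` are illustrative):

```python
import numpy as np

def es_step(theta, reward_fn, pop_size=50, sigma=0.1, alpha=0.01):
    """One evolution-strategies update on a flat weight vector theta.

    Jitter: sample Gaussian noise around the current weights.
    Combine weighted: move theta towards the noise directions
    that scored well (a score-weighted sum, no backprop)."""
    noise = np.random.randn(pop_size, theta.size)             # jitter ... jitter
    rewards = np.array([reward_fn(theta + sigma * n) for n in noise])
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + alpha / (pop_size * sigma) * noise.T @ advantage  # update
```

Only `reward_fn` touches the game, which is what makes the scheme so easy to parallelise: each jittered evaluation is independent.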
Let’s build it!
1. ES
Neanderthal
• Blazing fast matrix and linear algebra library

• Based on ATLAS and LAPACK

• Runs on CPUs and GPUs

• A study in writing efficient code

• Somewhat terse API (fluokitten helps)
x+y
ax+y
ax+by
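The x+y / ax+y / ax+by forms on the slide are the BLAS "axpy" family of fused vector operations that Neanderthal wraps; in numpy terms (a sketch of the arithmetic, not Neanderthal's API):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
a, b = 2.0, 0.5

x_plus_y = x + y       # x + y
axpy = a * x + y       # ax + y   (BLAS axpy: "a times x plus y")
axpby = a * x + b * y  # ax + by  (both vectors scaled)
```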
1.1 ES parallelized
Onyx
a masterless, cloud-scale, fault-tolerant,
high-performance distributed computation
system
Job = workflow + flow conditions + catalogue

workflow:
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

flow conditions:
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

catalogue:
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
Describing computation
with data
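The "computation as data" idea — a job is just a workflow edge list plus a catalogue of task descriptions — can be sketched as a toy single-process executor (Python; all names are made up, and Onyx itself runs this distributed, batched, and fault tolerant):

```python
def run_job(workflow, catalog, segments):
    """Toy executor: workflow is an edge list [[src, dst], ...],
    catalog maps task name -> function (missing = identity).
    Pushes each segment along every path from "in", collecting
    results at tasks with no outgoing edges."""
    children = {}
    for src, dst in workflow:
        children.setdefault(src, []).append(dst)

    def process(task, seg):
        seg = catalog.get(task, lambda s: s)(seg)
        outs = children.get(task)
        if not outs:
            return [seg]                 # leaf task: emit the segment
        return [o for t in outs for o in process(t, seg)]

    return [out for seg in segments for out in process("in", seg)]
```

Because the job is plain data, it can be inspected, generated, and shipped to the cluster like any other value — the point of the slide.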
[Onyx job graph: in → jitter … jitter → update → out, plus monitor and populate tasks;
populate feeds back on the same channel as in, and update accumulates state :(]
Resilience and handling
state
• Activity log 

• Window and trigger states checkpointed

• Resume points (transfer state from job to job)

• Configurable flux policies (continue/kill/recover)
Computation graphs are
a great way to structure
data processing code
2. Policy network
Cortex
• Neural networks, regression and feature learning

• Clean idiomatic Clojure API

• Computation encoded as data (and makes good use of it)

• Uses core.matrix for heavy lifting
Encode princess = 1, hero = -1
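That encoding might look like the following sketch (Python; the 0 for empty cells and the flattening are assumptions — the talk's code uses Cortex and core.matrix):

```python
import numpy as np

def encode(size, hero, princess):
    """Board as network input: princess = 1, hero = -1, empty = 0."""
    board = np.zeros((size, size))
    board[princess] = 1.0
    board[hero] = -1.0
    return board.ravel()   # flatten for a dense policy network
```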
3. Game
Simulation
• Find the shortest path to the
princess

• Don’t fall off the edge of the world
Reward function
• Play the entire game (planning)

• Collect multiple playthroughs to lessen the effects of
randomness
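A fitness score that plays the entire game several times and averages might look like this sketch (Python; `play` is a stand-in for running one full simulation with the given policy weights):

```python
def fitness(weights, play, n_games=10):
    """Score a weight vector by playing the whole game several
    times and averaging, lessening the effect of randomness."""
    return sum(play(weights) for _ in range(n_games)) / n_games
```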
Takeaways
Explore

Have fun

Go on an adventure!
Questions
Simon Belak

@sbelak

simon@metabase.com
