primitiv: Neural Network Toolkit
Yusuke Oda
2018/3/26 ASTREC, NICT
Agenda
• Basics of neural networks with computation graphs
• Design details and examples of primitiv
• An example usage
Neural Networks with Computation Graphs
• 𝑦 = tanh(𝑊𝑥 + 𝑏) + 𝑥
• → 𝑦 = add(tanh(add(matmul(𝑊, 𝑥), 𝑏)), 𝑥)
• FFNNs can be represented as a DAG = Computation Graph
[Diagram: the expression built up node by node as a computation graph; input nodes 𝑥, 𝑊, 𝑏 feed the operations matmul → add → tanh → add, producing the intermediate values 𝑊𝑥, 𝑊𝑥 + 𝑏, tanh(𝑊𝑥 + 𝑏) and the output 𝑦 = tanh(𝑊𝑥 + 𝑏) + 𝑥]
Forward Calculation (Retrieving Values)
• Once the computation graph is constructed, the actual computation can be performed along the graph in topological order.
[Diagram: with 𝑥 = 3, 𝑊 = 5, 𝑏 = 2, the nodes are evaluated in order (1) matmul → (2) add → (3) tanh → (4) add, yielding 𝑊𝑥 = 15, 𝑊𝑥 + 𝑏 = 17, tanh(𝑊𝑥 + 𝑏) = 0.9…, and 𝑦 = 3.9…]
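As a toy illustration of evaluation in topological order (a minimal sketch, not primitiv's actual internals; the Op struct and forward_all are made-up names), the following evaluates the scalar graph above front to back:

// A minimal sketch: forward evaluation of a scalar computation graph
// stored in topological order.
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

struct Op {
  std::vector<int> args;                               // indices of argument nodes
  std::function<float(const std::vector<float>&)> fw;  // forward rule (empty for leaves)
  float value;                                         // filled in by the forward pass
};

void forward_all(std::vector<Op> &g) {
  for (Op &n : g) {                                    // topological order == storage order
    std::vector<float> in;
    for (int a : n.args) in.push_back(g[a].value);
    if (n.fw) n.value = n.fw(in);                      // leaves (x, W, b) keep preset values
  }
}

int main() {
  std::vector<Op> g(7);
  g[0].value = 3;  // x
  g[1].value = 5;  // W
  g[2].value = 2;  // b
  g[3] = {{1, 0}, [](const std::vector<float> &v) { return v[0] * v[1]; }, 0};      // W*x
  g[4] = {{3, 2}, [](const std::vector<float> &v) { return v[0] + v[1]; }, 0};      // + b
  g[5] = {{4},    [](const std::vector<float> &v) { return std::tanh(v[0]); }, 0};  // tanh
  g[6] = {{5, 0}, [](const std::vector<float> &v) { return v[0] + v[1]; }, 0};      // + x
  forward_all(g);
  std::printf("y = %f\n", g[6].value);  // ~3.99..., the slide's "3.9..."
}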
Backward Calculation (Retrieving Gradients)
• Backpropagation: using the chain rule of derivatives along the same graph.
[Diagram: the same graph traversed in reverse order (4) → (1), with the backward rule attached to each node:
  add:    𝑑𝐸/𝑑𝑥₁ += 𝑑𝐸/𝑑𝑦,  𝑑𝐸/𝑑𝑥₂ += 𝑑𝐸/𝑑𝑦
  tanh:   𝑑𝐸/𝑑𝑥 += (1 − 𝑦²) 𝑑𝐸/𝑑𝑦
  matmul: 𝑑𝐸/𝑑𝑋₁ += (𝑑𝐸/𝑑𝑌) 𝑋₂ᵀ,  𝑑𝐸/𝑑𝑋₂ += 𝑋₁ᵀ (𝑑𝐸/𝑑𝑌)]
Backward Calculation (Retrieving Gradients)
• Backpropagation: using the chain rule of derivatives along the same graph.
[Same diagram using the shorthand 𝑔(·) for gradients:
  add:    𝑔(𝑥₁) += 𝑔(𝑦),  𝑔(𝑥₂) += 𝑔(𝑦)
  tanh:   𝑔(𝑥) += (1 − 𝑦²) 𝑔(𝑦)
  matmul: 𝑔(𝑋₁) += 𝑔(𝑌) 𝑋₂ᵀ,  𝑔(𝑋₂) += 𝑋₁ᵀ 𝑔(𝑌)]
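A matching sketch of the backward pass (again only illustrative; primitiv applies the analogous rules per Tensor on the selected Device). It traverses the same scalar graph in reverse and applies the 𝑔(·) accumulation rules above; smaller numbers than the slide's are used so that tanh is not saturated:

// Minimal sketch of reverse-mode accumulation for y = tanh(W*x + b) + x (scalars).
#include <cmath>
#include <cstdio>

int main() {
  // Forward pass (values are kept because the backward rules need them).
  const float x = 0.5f, W = 0.5f, b = 0.1f;
  const float u = W * x;         // matmul (scalar case)
  const float v = u + b;         // add
  const float h = std::tanh(v);  // tanh
  const float y = h + x;         // add -> y

  // Backward pass: seed g(y) = 1, then apply "g(arg) += local derivative * g(result)"
  // in reverse topological order, exactly as annotated on the slide.
  float gy = 1, gh = 0, gx = 0, gv = 0, gu = 0, gW = 0, gb = 0;
  gh += gy;                      // add:    g(x1) += g(y)
  gx += gy;                      //         g(x2) += g(y)
  gv += (1 - h * h) * gh;        // tanh:   g(x)  += (1 - y^2) g(y)
  gu += gv;                      // add:    g(x1) += g(y)
  gb += gv;                      //         g(x2) += g(y)
  gW += gu * x;                  // matmul: g(X1) += g(Y) X2^T  (scalars here)
  gx += W * gu;                  //         g(X2) += X1^T g(Y)
  std::printf("y=%g  dy/dx=%g  dy/dW=%g  dy/db=%g\n", y, gx, gW, gb);
}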
Function and Variable of Graph
[Diagram: a Function node (𝑓_fw, 𝑓_bw) connects argument Variables (𝑋₁, 𝑔(𝑋₁)), (𝑋₂, 𝑔(𝑋₂)) to a result Variable (𝑌, 𝑔(𝑌))]
• Function: specifies the forward/backward calculation
• Variable: represents actual values and gradients
• Arguments: 0 or more / Results: 1 or more
Forward/backward operations of the Function
• 𝑓_fw: (𝑋₁, …, 𝑋ₙ) ⟼ (𝑌₁, …, 𝑌ₘ)
• 𝑓_bw: (𝑋₁, …, 𝑋ₙ, 𝑌₁, …, 𝑌ₘ, 𝑔(𝑌₁), …, 𝑔(𝑌ₘ)) ⟼ (𝑔(𝑋₁), …, 𝑔(𝑋ₙ))
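In code, this contract could be pictured roughly as follows (a sketch only; the names Value and Function here are illustrative, not primitiv's actual classes):

// Sketch of the f_fw / f_bw contract of a graph Function.
#include <vector>

struct Value { /* a Shape plus actual values / gradients on some Device */ };

class Function {
public:
  virtual ~Function() = default;
  // f_fw: (X1, ..., Xn) -> (Y1, ..., Ym)
  virtual std::vector<Value> forward(const std::vector<Value> &xs) = 0;
  // f_bw: (X1..Xn, Y1..Ym, g(Y1)..g(Ym)) -> accumulates into g(X1)..g(Xn)
  virtual void backward(const std::vector<Value> &xs,
                        const std::vector<Value> &ys,
                        const std::vector<Value> &gys,
                        std::vector<Value> &gxs) const = 0;
};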
Combined Functions
• Any subgraph that starts and ends with Functions can be treated as one Function (see the sketch below).
[Diagram: parameter → 𝑊 → matmul, parameter → 𝑏 → add → 𝑢 → ReLU, mapping (𝑋, 𝑔(𝑋)) to (𝑌, 𝑔(𝑌)); the whole subgraph is wrapped as a single "Linear" function]
• The "Linear" function in some toolkits owns its parameters itself and applies these 2-3 functions.
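A hedged sketch of such a combined function written against primitiv's functional API (make_linear is a made-up helper name; F::relu is assumed to be available as an activation, otherwise substitute F::tanh):

// Wraps parameter -> matmul -> add -> ReLU into a single reusable "Linear" call.
#include <primitiv/primitiv.h>
using namespace primitiv;
namespace F = primitiv::functions;

Node make_linear(const Node &x, Parameter &w, Parameter &b) {
  Node ww = F::parameter<Node>(w);  // parameter node for W
  Node bb = F::parameter<Node>(b);  // parameter node for b
  Node u = F::matmul(ww, x) + bb;   // matmul, then add
  return F::relu(u);                // ReLU activation (assumed name)
}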
3 Strategies to Construct
Computation Graphs
• Difference: when and how to construct the graph and calculate the results.
• Static construction
• Caffe, Torch, TensorFlow, etc.
• Dynamic construction (define-by-run)
• Chainer, PyTorch, etc.
• Dynamic construction with lazy evaluation
• DyNet, PyTorch(partially), primitiv
Static Construction
• Constructs the computation graph before all executions.
• Then executes the "fixed" graph with actual data.
[Diagram: the graph 𝑥, 𝑊, 𝑏 → matmul → add → tanh → add → 𝑦 is fixed into a function 𝑓(𝑥, 𝑊, 𝑏) = 𝑦, which is then executed repeatedly with actual data, e.g. 𝑓(3, 5, 2) = 3.9…]
Dynamic Construction (define-by-run)
• Graph construction and actual calculation are performed simultaneously.
[Diagram: Run 1 builds and evaluates matmul → add → tanh → add step by step with actual values (𝑥 = 3, 𝑊 = 5, 𝑏 = 2 → 15 → 17 → 0.9… → 3.9…); Run 2 rebuilds the graph from scratch with new data (𝑥 = 9, 𝑊 = 2, 𝑏 = −1 → 18 → 17 → 0.9… → 9.9…)]
Dynamic Construction with Lazy Evaluation
• Consists of 2 steps:
  1. Construct the graph using only the types (shapes) of the values.
  2. Perform the actual computation (forward/backward) along the graph.
[Diagram: the graph matmul → add → tanh → add is first built with unknown values ("?"); querying the output then triggers the actual computation (3, 5, 2 → 15 → 17 → 0.9… → 3.9…)]
Pros/cons of each strategy
• Static
  • Capable of strong compile-time optimization
  • Difficult to construct interactive graphs
• Dynamic (define-by-run)
  • Capable of constructing interactive graphs
  • Large overhead and difficult to optimize
• Dynamic + Lazy
  • Also capable of interactive graphs
  • Can apply just-in-time optimization
  • Requires a 2-pass traversal over the graph
    • shapes are always calculated; values are calculated on demand
  • Whole-graph optimization is still difficult
primitiv?
primitiv: Dynamic+Lazy NN Toolkit
• Originally forked from DyNet
• All components have been restructured
• Concepts
• Simple
• Compact
• Device/environment independent
• Implicit minibatching
• Multiple language support
primitiv: Simple
• Consists of only the essential functionality; pointless features are mostly omitted.
• Lower learning cost.
• Even so, code does not become long: an encoder-decoder model can be implemented in about 300 lines of C++ (see the examples in the repository).
primitiv: Compact
• For a minimal installation, you only need GCC/Clang and CMake:

$ git clone https://github.com/primitiv/primitiv
$ cd primitiv
$ cmake .
$ make
$ make install
$ echo "That's all."

• If you need support for specific hardware (e.g. CUDA), all you have to do is add the corresponding build switch:

$ cmake . -DPRIMITIV_USE_CUDA=ON
primitiv: Device/environment Independent
• Device-specific code and the network structure are completely separated.
• Once the model is written, the code can be executed on any (even unknown) hardware with no modification.

#include <primitiv/primitiv.h>
using namespace primitiv;
namespace F = primitiv::functions;

Node predict(Node &x, Parameter &w, Parameter &b) {
  Node ww = F::parameter<Node>(w);
  Node bb = F::parameter<Node>(b);
  return F::tanh(F::matmul(ww, x) + bb) + x;
}

The same predict() runs on CPU, CUDA, OpenCL, or any other (even future) backend.
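For example, switching the backend is a one-line change around the unchanged predict() above. This is a sketch using the default-device mechanism shown later in this deck; the scalar parameter values are purely illustrative, scalar shapes rely on the shape-equivalence rule described below, and the CUDA line assumes the library was built with -DPRIMITIV_USE_CUDA=ON:

#include <primitiv/primitiv.h>
using namespace primitiv;
namespace F = primitiv::functions;

Node predict(Node &x, Parameter &w, Parameter &b);  // as defined above

int main() {
  devices::Naive dev;        // CPU backend
  // devices::CUDA dev(0);   // ...or the CUDA backend on GPU 0 instead
  Graph g;
  Device::set_default(dev);
  Graph::set_default(g);

  Parameter w(Shape({}), {5});              // toy scalar parameters (illustrative)
  Parameter b(Shape({}), {2});
  Node x = F::input<Node>(Shape({}), {3});
  Node y = predict(x, w, b);                // identical network code on any Device
  // y.to_vector() would run the computation on whichever Device was selected.
  return 0;
}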
primitiv: Implicit Minibatching
• Most networks can be used for both single and minibatched data without modification (see the usage sketch below).

Node predict(Node &x, Parameter &w, Parameter &b) {
  Node ww = F::parameter<Node>(w);
  Node bb = F::parameter<Node>(b);
  return F::tanh(F::matmul(ww, x) + bb) + x;
}

Single data: 3 ⟼ 3.9…
3-minibatched data: (3, 4, 5) ⟼ (3.9…, 4.9…, 5.9…)
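As a usage sketch (assuming w, b, predict() and a default Graph/Device exactly as in the previous snippet, with scalar shapes acting as 1×1 matrices under matmul per the equivalence rule below), the only thing that changes between the two cases is the Shape of the input:

// Single datum: minibatch size 1 (omitted from the Shape).
Node x1 = F::input<Node>(Shape({}), {3});
Node y1 = predict(x1, w, b);                 // one result (the slide's 3.9...)

// 3-minibatched data: same network code, only the input Shape differs.
Node x3 = F::input<Node>(Shape({}, 3), {3, 4, 5});
Node y3 = predict(x3, w, b);                 // three results (3.9..., 4.9..., 5.9...)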
primitiv: Multiple Language Support
[Diagram: the primitiv Core Library (C++11) is used directly by C++ apps; a Cython middleware provides primitiv-Python for Python apps; the primitiv C APIs (C99) back further bindings such as the Java binding (primitiv-Java), the Rust binding (primitiv-Rust), etc., each serving their own apps]
Core Components
Core Components of primitiv
• Shape
• Device and Tensor
• Graph and Node
• Parameter and Optimizer
• Other functionalities
Shape
• A Shape represents the volume and the minibatch size of the data.
  • A scalar: Shape({}) (1 value)
  • A column vector: Shape({3}) (3 values)
  • A matrix: Shape({3, 4}) (12 values)
  • 5 matrices (a minibatch of 5): Shape({3, 4}, 5) (60 values)
Shape Equivalence Rule
• Trailing dimensions of size "1" are identical to omitting them:
  Shape({3, 1}) == Shape({3})  (matrix == column vector)
  Shape({1}) == Shape({})  (column vector == scalar)
  Shape({2, 3, 4, 1, 1, 1}) == Shape({2, 3, 4})
• A minibatch size of "1" is identical to single data:
  Shape({2, 3, 4}, 1) == Shape({2, 3, 4})
Minibatch Broadcasting Rule
• Arguments of n-ary (n ≥ 2) functions/ops with minibatch size 1 are implicitly broadcast.

x = data with Shape({2, 2}, 123);
y = data with Shape({2, 2}, 123);
z = data with Shape({2, 2});
w = data with Shape({2, 2}, 42);

F::matmul(x, y);   // Shape({2, 2}, 123): the operation is performed for each minibatch entry separately.
F::matmul(z, w);   // Shape({2, 2}, 42): `z` is implicitly broadcast.
F::sum({x, y, z}); // Shape({2, 2}, 123): `z` is implicitly broadcast.
F::sum({y, z, w}); // Error! Different minibatch sizes (123 vs 42) cannot be combined.
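A concrete sketch of the same rule with Tensors and eager evaluation (following the Tensor snippet style shown later in this deck; dev is assumed to be an initialized Device):

// `z` has minibatch size 1 and is implicitly broadcast against `x` (minibatch 3).
std::vector<float> xd(2 * 2 * 3, 1.0f);  // data for three 2x2 matrices
std::vector<float> zd(2 * 2, 2.0f);      // data for one 2x2 matrix
Tensor x = F::input<Tensor>(Shape({2, 2}, 3), xd, dev);
Tensor z = F::input<Tensor>(Shape({2, 2}), zd, dev);
Tensor y = F::matmul(z, x);              // result shape: Shape({2, 2}, 3)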
Device
• A Device object manages the actual subroutines and the memory management on a specific piece of hardware.
• All hardware-related code (e.g., CUDA) is encapsulated in the Device.
[Diagram: applications talk only to the unified "Device" interface, which dispatches to CPU-specific routines, CUDA-specific routines, or routines for other hardware]
Tensor
• Tensor is the most elementary data interface.
• Each Tensor is bound to a Device, holds Device-specific memory, and has a Shape that describes the appearance of the data.
• Calculation is performed by eager evaluation: results are obtained immediately.
[Diagram: a Tensor holds a reference to its Device, Device-specific memory, and a Shape]
Snippet: Using Device and Tensor
#include <primitiv/primitiv.h>
using namespace primitiv;
// primitiv::functions has many functions for Tensor.
namespace F = primitiv::functions;
devices::Naive dev1; // Initializes CPU device
devices::CUDA dev2(0); // Initializes CUDA on GPU 0
devices::CUDA dev3(1); // Initializes CUDA on GPU 1
// `dev1` -- `dev3` have the same "Device" interface.
Snippet: Using Device and Tensor (continued)
// Making a new Tensor on `dev1`
Shape s({2, 2});
std::vector<float> data {1, 2, 3, 4}; // column-major
Tensor x1 = F::input<Tensor>(s, data, dev1);
// Making a 2-dimensional identity matrix on `dev2`
Tensor x2 = F::identity<Tensor>(2, dev2);
// Move x1 onto `dev2`
Tensor x11 = F::copy(x1, dev2);
// Math
Tensor x3 = x11 + x2; // x3 == {2, 2, 3, 5}
Tensor xe = x1 + x2; // Error: different device
Tensor x4 = F::exp(x1);
std::vector<float> ret = x4.to_vector(); // {2.7,7.4,20.,55.}
Default Device
• The "Device" argument of each function can be
omitted using the default device.
devices::CUDA dev(0);
// Specifies `dev` as the default
Device::set_default(dev);
// Same as F::input<Tensor>(shape, data, dev);
Tensor x = F::input<Tensor>(shape, data);
Graph and Node
• A Graph object represents a computation graph and its states.
• A Node object represents a variable node in the Graph.
[Diagram: a Graph contains input/parameter nodes and operation nodes (matmul, add, tanh, add); each Node holds a reference to its Graph and a variable ID]
Adding new Nodes into the Graph
• Simply apply functions to add a new calculation into the Graph.
• Node has a similar interface to Tensor:
  • Math functions
  • Arithmetic operations

Node x = F::input<Node>(shape, data);  // adds an "input" node
Node y = F::exp(x);                    // adds an "exp" node connected to x
Lazy Evaluation through Nodes
• Unlike Tensor, a Node is just a placeholder for values and does not invoke the actual computation when it is created.
• When the value is explicitly queried, all required calculations are invoked and the results are returned.

std::vector<float> ret = y.to_vector();  // querying `y` invokes the "input" and "exp" nodes
Lazy Evaluation through Nodes
• Once the results are calculated, Nodes cache the values, and they are reused by future queries.
• Unused values are never calculated (see the sketch below).
[Diagram: querying the tanh output invokes matmul, add and tanh (reusing already-cached inputs), while the final add is never invoked because its value was never requested]
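A small sketch of this behavior (assuming a default Graph and Device have been set, as in the earlier snippets):

Node a = F::input<Node>(Shape({2}), {1, 2});
Node b = F::exp(a);                     // placeholder only; nothing is computed yet
Node c = F::tanh(a);                    // also just a placeholder
std::vector<float> v1 = b.to_vector();  // invokes the "input" and "exp" nodes, caches the results
std::vector<float> v2 = b.to_vector();  // reuses the cached value of `b`
// `c` is never queried, so the tanh node is never evaluated.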
Parameter
• A Parameter object represents a trainable parameter of the network.
• Its values can be used as a variable in a Graph, and its gradients are updated through the Graph.
• Initial values can be specified by hand or with an Initializer object.
[Diagram: a Parameter holds a reference to its Device, its values, its cumulative gradients, and other statistics]
Optimizer
• An Optimizer manages an update policy (SGD, Adam, etc.) for Parameters.
• It consumes the gradient information that each Parameter holds to update the values.
• It also registers, on each Parameter, the statistics that the update policy requires.
  • I.e., all statistics about a Parameter are stored in the Parameter object itself; the Optimizer does not hold such information.
Snippet: Initializing Parameter/Optimizer
// Device
devices::CUDA dev(0);
Device::set_default(dev);
// Parameter/Optimizer
Parameter p1(Shape({3}), {1, 2, 3});
Parameter p2(Shape({3}), initializers::Uniform(-1, 1));
// Uses a uniform distribution for the initial values.
// Optimizer
optimizers::SGD opt(0.1); // Initializes SGD with LR=0.1.
opt.add(p1, p2); // Registers `p1` and `p2` to the optimizer.
Backpropagation
• Backpropagation can be performed through Nodes by invoking the backward() function.
  • Tensors cannot perform backpropagation because they manage neither gradients nor computation graphs.
• If the computation graph contains Parameters, their gradients are updated by backward().
Snippet: Backpropagation
Graph g;
Graph::set_default(g); // Make g as the default.
Parameter p(Shape({3}), {1, 2, 3});
optimizers::SGD opt(0.1);
opt.add(p);
Node w = F::parameter(p);
Node x = F::input(Shape({3}), {2, 3, 5});
Node y = w * x; // Elementwise multiplication
y.to_vector(); // {2, 6, 15}
opt.reset_gradients(); // Make all gradients of parameters 0.
y.backward(); // Performs the backpropagation.
p.gradient().to_vector(); // {2, 3, 5}
opt.update(); // Performs the SGD rule:
// {1, 2, 3} - 0.1 * {2, 3, 5}
p.value().to_vector(); // {0.8, 1.7, 2.5}
Example
The Data
• We use synthetic data in this example...

#include <random>
#include <tuple>

class DataSource {
  std::mt19937 rng;
  std::normal_distribution<float> data_dist, noise_dist;

public:
  DataSource(float data_sd, float noise_sd)
    : rng(std::random_device()())
    , data_dist(0, data_sd)
    , noise_dist(0, noise_sd) {}

  std::tuple<float, float, float> operator()() {
    const float x1 = data_dist(rng);
    const float x2 = data_dist(rng);
    return std::make_tuple(
        x1 + noise_dist(rng),
        x2 + noise_dist(rng),
        x1 * x2 >= 0 ? 1 : -1);
  }
};
The Data ... XOR
• Input: 𝒙 := (𝑥1, 𝑥2) ∈ ℝ²
• Output: 𝑦 ∈ {−1, 1} (positive when 𝑥1 and 𝑥2 have the same sign)
• Where: data_sd == 1 and noise_sd == 0.1
The Network
• We use a simple MLP:
  𝑦 = tanh(𝑊_hy 𝒉 + 𝑏_y)
  𝒉 = tanh(𝑊_xh 𝒙 + 𝒃_h)
• Where:
  𝒉 ∈ ℝ^N, 𝑊_xh ∈ ℝ^(N×2), 𝒃_h ∈ ℝ^N, 𝑊_hy ∈ ℝ^(1×N), 𝑏_y ∈ ℝ
Code 1: Initialization
• Including headers and declaring the main function
#include <iostream>
#include <vector>
#include <primitiv/primitiv.h>
using namespace std;
using namespace primitiv;
int main() {
devices::Naive dev; // uses CPU
Graph g;
Device::set_default(dev);
Graph::set_default(g);
// All code will be described here.
return 0;
}
Code 2: Parameter and Optimizer
• We have 4 parameters: 𝑊_hy, 𝑏_y, 𝑊_xh, 𝒃_h.
(in main function)
constexpr unsigned N = 8; // #hidden units
Parameter pw_xh({N, 2}, initializers::XavierUniform());
Parameter pb_h({N}, initializers::Constant(0));
Parameter pw_hy({1, N}, initializers::XavierUniform());
Parameter pb_y({}, initializers::Constant(0));
constexpr float learning_rate = 0.1;
optimizers::SGD opt(learning_rate);
opt.add(pw_xh, pb_h, pw_hy, pb_y);
Code 3: Writing The Network
• Using lambda:
(in main function)
auto feedforward = [&](const Node &x) {
namespace F = primitiv::functions;
const Node w_xh = F::parameter<Node>(pw_xh); // Shape({N, 2})
const Node b_h = F::parameter<Node>(pb_h); // Shape({N})
const Node w_hy = F::parameter<Node>(pw_hy); // Shape({1, N})
const Node b_y = F::parameter<Node>(pb_y); // Shape({})
const Node h = F::tanh(F::matmul(w_xh, x) + b_h); // Shape({N}, B)
return F::tanh(F::matmul(w_hy, h) + b_y); // Shape({}, B)
};
Code 4: Loss Function
• Similar to the main network:
(in main function)
auto squared_loss = [](const Node &y, const Node &t) {
namespace F = primitiv::functions;
const Node diff = y - t; // Shape({}, B)
return F::batch::mean(diff * diff); // Shape({})
};
Code 5: Making The Minibatch
• This part is not specific to the toolkit; it just prepares the data.
(in main function)
constexpr float data_sd = 1.0;
constexpr float noise_sd = 0.1;
DataSource data_source(data_sd, noise_sd);
auto next_data = [&](unsigned minibatch_size) {
std::vector<float> data;
std::vector<float> labels;
for (unsigned i = 0; i < minibatch_size; ++i) {
float x1, x2, t;
std::tie(x1, x2, t) = data_source();
data.emplace_back(x1);
data.emplace_back(x2);
labels.emplace_back(t);
}
namespace F = primitiv::functions;
return std::make_tuple(
F::input<Node>(Shape({2}, minibatch_size), data), // input data `x`
F::input<Node>(Shape({}, minibatch_size), labels)); // label data `t`
};
Code 6: Training Loop
(in main function)
for (unsigned epoch = 0; epoch < 100; ++epoch) {
g.clear();
// Initializes the computation graph
Node x, t;
std::tie(x, t) = next_data(1000); // Obtains the next data
const Node y = feedforward(x); // Calculates the network
const Node loss = squared_loss(y, t); // Calculates the loss
std::cout << epoch << ": train loss=" << loss.to_float() << std::endl;
// Performs backpropagation and updates parameters
opt.reset_gradients();
loss.backward();
opt.update();
}
$ g++ -std=c++11 code.cc -lprimitiv
$ ./a.out
0: loss=1.17221
1: loss=1.07423
2: loss=1.06282
3: loss=1.04641
4: loss=1.00851
5: loss=1.01904
...
Code 7: Testing
(in main function)
for (unsigned epoch = 0; epoch < 100; ++epoch) {
  (Training process written in the previous code block)
  if (epoch % 10 == 9) {
    namespace F = primitiv::functions;
    const vector<float> test_x_data {1, 1, -1, 1, -1, -1, 1, -1};
    const vector<float> test_t_data {1, -1, 1, -1};
    const Node test_x = F::input<Node>(Shape({2}, 4), test_x_data);
    const Node test_t = F::input<Node>(Shape({}, 4), test_t_data);
    const Node test_y = feedforward(test_x);
    const Node test_loss = squared_loss(test_y, test_t);
    std::cout << "test results:";
    for (float val : test_y.to_vector()) {
      std::cout << ' ' << val;
    }
    std::cout << "\ntest loss: " << test_loss.to_float() << std::endl;
  }
}

Expected input/output pairs: (1, 1) ⟼ 1, (−1, 1) ⟼ −1, (−1, −1) ⟼ 1, (1, −1) ⟼ −1

$ g++ -std=c++11 code.cc -lprimitiv
$ ./a.out
...
8: loss=0.933427
9: loss=0.927205
test results: 0.04619 -0.119208 0.0893511 -0.149148
test loss: 0.809695
10: loss=0.916669
11: loss=0.91744
...
18: loss=0.849496
19: loss=0.845048
test results: 0.156536 -0.229959 0.171106 -0.221599
test loss: 0.649342
20: loss=0.839679
21: loss=0.831217
...
Links
• Public repository (components, tests, examples)
• https://github.com/primitiv
• Slack (conversation)
• https://primitiv-forum.slack.com
• Documentation (tutorial, design, reference)
• http://primitiv.readthedocs.io/en/develop
Thanks!
More Related Content

Similar to Neural Network Toolkit: primitiv Explained

Bloat and Fragmentation in PostgreSQL
Bloat and Fragmentation in PostgreSQLBloat and Fragmentation in PostgreSQL
Bloat and Fragmentation in PostgreSQLMasahiko Sawada
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series ForecastingBillTubbs
 
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...MITSUNARI Shigeo
 
UNIT_1-Introduction-to-Computer-Graphics.pdf
UNIT_1-Introduction-to-Computer-Graphics.pdfUNIT_1-Introduction-to-Computer-Graphics.pdf
UNIT_1-Introduction-to-Computer-Graphics.pdfSudarshanSharma43
 
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET Journal
 
Teaching with JupyterHub - lessons learned
Teaching with JupyterHub - lessons learnedTeaching with JupyterHub - lessons learned
Teaching with JupyterHub - lessons learnedMartin Christen
 
Ds03 part i algorithms by jyoti lakhani
Ds03 part i algorithms   by jyoti lakhaniDs03 part i algorithms   by jyoti lakhani
Ds03 part i algorithms by jyoti lakhanijyoti_lakhani
 
Engineering + Programming portfolio
Engineering + Programming portfolioEngineering + Programming portfolio
Engineering + Programming portfolioJosephDonnelly14
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLYuzuko Hosoya
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
Agile: A guide to creating a project burndown chart
Agile: A guide to creating a project burndown chartAgile: A guide to creating a project burndown chart
Agile: A guide to creating a project burndown chartPM Majik
 
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdf
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdfsolve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdf
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdfarihantcomputersddn
 
GPU_based Searching
GPU_based SearchingGPU_based Searching
GPU_based Searchingjpawan33
 
Computer Graphics Practical
Computer Graphics PracticalComputer Graphics Practical
Computer Graphics PracticalNeha Sharma
 
Tower design using etabs- Nada Zarrak
Tower design using etabs- Nada Zarrak Tower design using etabs- Nada Zarrak
Tower design using etabs- Nada Zarrak Nada Zarrak
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 

Similar to Neural Network Toolkit: primitiv Explained (20)

Bloat and Fragmentation in PostgreSQL
Bloat and Fragmentation in PostgreSQLBloat and Fragmentation in PostgreSQL
Bloat and Fragmentation in PostgreSQL
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
Efficient Two-level Homomorphic Encryption in Prime-order Bilinear Groups and...
 
ppt 3 demo.pptx
ppt 3 demo.pptxppt 3 demo.pptx
ppt 3 demo.pptx
 
UNIT_1-Introduction-to-Computer-Graphics.pdf
UNIT_1-Introduction-to-Computer-Graphics.pdfUNIT_1-Introduction-to-Computer-Graphics.pdf
UNIT_1-Introduction-to-Computer-Graphics.pdf
 
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
IRJET-Hardware Co-Simulation of Classical Edge Detection Algorithms using Xil...
 
Cadancesimulation
CadancesimulationCadancesimulation
Cadancesimulation
 
Teaching with JupyterHub - lessons learned
Teaching with JupyterHub - lessons learnedTeaching with JupyterHub - lessons learned
Teaching with JupyterHub - lessons learned
 
Ds03 part i algorithms by jyoti lakhani
Ds03 part i algorithms   by jyoti lakhaniDs03 part i algorithms   by jyoti lakhani
Ds03 part i algorithms by jyoti lakhani
 
3d printer
3d printer3d printer
3d printer
 
Pregel
PregelPregel
Pregel
 
Engineering + Programming portfolio
Engineering + Programming portfolioEngineering + Programming portfolio
Engineering + Programming portfolio
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQL
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Agile: A guide to creating a project burndown chart
Agile: A guide to creating a project burndown chartAgile: A guide to creating a project burndown chart
Agile: A guide to creating a project burndown chart
 
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdf
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdfsolve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdf
solve 6 , quiz 2link of book httpwww.irccyn.ec-nantes.fr~mart.pdf
 
GPU_based Searching
GPU_based SearchingGPU_based Searching
GPU_based Searching
 
Computer Graphics Practical
Computer Graphics PracticalComputer Graphics Practical
Computer Graphics Practical
 
Tower design using etabs- Nada Zarrak
Tower design using etabs- Nada Zarrak Tower design using etabs- Nada Zarrak
Tower design using etabs- Nada Zarrak
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 

More from Yusuke Oda

Neural Machine Translation via Binary Code Prediction
Neural Machine Translation via Binary Code PredictionNeural Machine Translation via Binary Code Prediction
Neural Machine Translation via Binary Code PredictionYusuke Oda
 
ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@Yusuke Oda
 
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳Yusuke Oda
 
Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)Yusuke Oda
 
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...Yusuke Oda
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp TalkYusuke Oda
 
PCFG構文解析法
PCFG構文解析法PCFG構文解析法
PCFG構文解析法Yusuke Oda
 
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...Yusuke Oda
 
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...Yusuke Oda
 
Tree-based Translation Models (『機械翻訳』§6.2-6.3)
Tree-based Translation Models (『機械翻訳』§6.2-6.3)Tree-based Translation Models (『機械翻訳』§6.2-6.3)
Tree-based Translation Models (『機械翻訳』§6.2-6.3)Yusuke Oda
 
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)Yusuke Oda
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Yusuke Oda
 

More from Yusuke Oda (13)

Neural Machine Translation via Binary Code Prediction
Neural Machine Translation via Binary Code PredictionNeural Machine Translation via Binary Code Prediction
Neural Machine Translation via Binary Code Prediction
 
ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@
 
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
 
Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)
 
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp Talk
 
PCFG構文解析法
PCFG構文解析法PCFG構文解析法
PCFG構文解析法
 
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic ...
 
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
 
Tree-based Translation Models (『機械翻訳』§6.2-6.3)
Tree-based Translation Models (『機械翻訳』§6.2-6.3)Tree-based Translation Models (『機械翻訳』§6.2-6.3)
Tree-based Translation Models (『機械翻訳』§6.2-6.3)
 
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3
 
Test
TestTest
Test
 

Recently uploaded

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 

Neural Network Toolkit: primitiv Explained

  • 1. primitiv: Neural Network Toolkit Yusuke Oda 2018/3/26 ASTREC, NICT
  • 2. Agenda • Basics of neural networks with computation graphs • Design details and examples of primitiv • An example usage 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 2
  • 3. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 3
  • 4. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 4
  • 5. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 𝑥 𝑏 𝑊 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 5
  • 6. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 𝑥 * 𝑏 𝑊 matmul 𝑊𝑥 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 6
  • 7. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 𝑥 * 𝑏 𝑊 * matmul add 𝑊𝑥 𝑊𝑥 + 𝑏 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 7
  • 8. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 𝑥 * 𝑏 𝑊 * * matmul add tanh 𝑊𝑥 𝑊𝑥 + 𝑏 tanh 𝑊𝑥 + 𝑏 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 8
  • 9. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 𝑥 * 𝑏 𝑊 * * 𝑦 matmul add tanh add 𝑊𝑥 𝑊𝑥 + 𝑏 tanh 𝑊𝑥 + 𝑏 tanh 𝑊𝑠 + 𝑏 + 𝑥 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 9
  • 10. Neural Networks with Computation Graphs • 𝑦 = tanh 𝑊𝑥 + 𝑏 + 𝑥 • → 𝑦 = add tanh add matmul(𝑊, 𝑥), 𝑏 , 𝑥 • FFNNs can be represented as a DAG = Computation Graph 𝑥 * 𝑏 𝑊 * * 𝑦 matmul add tanh add 𝑊𝑥 𝑊𝑥 + 𝑏 tanh 𝑊𝑥 + 𝑏 tanh 𝑊𝑠 + 𝑏 + 𝑥 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 10
  • 11. Forward Calculation (Retrieving Values) • Once the computation graph constructed, the actual computation can be performed along the graph using the topological order. 3 15 2 5 17 0.9… 3.9… matmul add tanh add 𝑊𝑥 𝑊𝑥 + 𝑏 tanh 𝑊𝑥 + 𝑏 tanh 𝑊𝑠 + 𝑏 + 𝑥 1 2 3 4 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 11
  • 12. Backward Calculation (Retrieving Gradients) • Backpropagation: using the chain rule of derivatives along the same graph. 𝑥 * 𝑏 𝑊 * * 𝑦 matmul add tanh add𝑑𝐸 𝑑𝑥 += 1 − 𝑦2 𝑑𝐸 𝑑𝑦 𝑑𝐸 𝑑𝑥1 += 𝑑𝐸 𝑑𝑦 𝑑𝐸 𝑑𝑥2 += 𝑑𝐸 𝑑𝑦 1 2 3 4 𝑑𝐸 𝑑𝑋1 += 𝑑𝐸 𝑑𝑌 𝑋2 ⊤ 𝑑𝐸 𝑑𝑋2 += 𝑋1 ⊤ 𝑑𝐸 𝑑𝑌 𝑑𝐸 𝑑𝑥1 += 𝑑𝐸 𝑑𝑦 𝑑𝐸 𝑑𝑥2 += 𝑑𝐸 𝑑𝑦 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 12
  • 13. Backward Calculation (Retrieving Gradients) • Backpropagation: using the chain rule of derivatives along the same graph. 𝑥 * 𝑏 𝑊 * * 𝑦 matmul add tanh add 𝑔 𝑥 += 1 − 𝑦2 𝑔 𝑦 𝑔 𝑥1 += 𝑔 𝑦 𝑔 𝑥2 += 𝑔 𝑦 1 2 3 4 𝑔 𝑋1 += 𝑔 𝑌 𝑋2 ⊤ 𝑔 𝑋2 += 𝑋1 ⊤ 𝑔 𝑌 𝑔 𝑥1 += 𝑔 𝑦 𝑔 𝑥2 += 𝑔 𝑦 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 13
  • 14. Function and Variable of Graph 𝑋1, 𝑔 𝑋1 𝑌, 𝑔 𝑌 𝑋2, 𝑔 𝑋2 𝑓𝑓𝑤, 𝑓𝑏𝑤 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 14
  • 15. Function and Variable of Graph 𝑋1, 𝑔 𝑋1 𝑌, 𝑔 𝑌 𝑋2, 𝑔 𝑋2 𝑓𝑓𝑤, 𝑓𝑏𝑤 Function: specifies the forward/backward calculation 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 15
  • 16. Function and Variable of Graph 𝑋1, 𝑔 𝑋1 𝑌, 𝑔 𝑌 𝑋2, 𝑔 𝑋2 𝑓𝑓𝑤, 𝑓𝑏𝑤 Variable: represents actual values and gradients Function: specifies the forward/backward calculation 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 16
  • 17. Function and Variable of Graph 𝑋1, 𝑔 𝑋1 𝑌, 𝑔 𝑌 𝑋2, 𝑔 𝑋2 𝑓𝑓𝑤, 𝑓𝑏𝑤 Arguments: 0 or more Results: 1 or more 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 17
  • 18. Forward/backward operations of the function 𝑋1, 𝑔 𝑋1 𝑌, 𝑔 𝑌 𝑋2, 𝑔 𝑋2 𝑓𝑓𝑤, 𝑓𝑏𝑤 𝒇 𝒇𝒘: 𝑋1 … 𝑋 𝑛 ⟼ 𝑌1 … 𝑌 𝑚 𝒇 𝒃𝒘: 𝑋1 … 𝑋 𝑛, 𝑌1 … 𝑌 𝑚, 𝑔 𝑌1 … 𝑔 𝑌 𝑚 ⟼ 𝑔 𝑋1 … 𝑔 𝑋 𝑛 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 18
  • 19. Combined Functions • Any subgraphs with starts/ends by Function can be as one Function. 𝑋, 𝑔 𝑋 𝑌, 𝑔 𝑌 𝑊 matmul parameter 𝑏 parameter add 𝑢 ReLU 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 19
  • 20. Combined Functions • Any subgraphs with starts/ends by Function can be as one Function. 𝑋, 𝑔 𝑋 𝑌, 𝑔 𝑌 𝑊 matmul “Linear” function in some toolkits owns parameters itself, and applies 2-3 functions. parameter 𝑏 parameter add 𝑢 ReLU “Linear” 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 20
  • 21. 3 Strategies to Construct Computation Graphs • Difference: when/how to construct graph and calculate the results. • Static construction • Caffe, Torch, TensorFlow, etc. • Dynamic construction (define-by-run) • Chainer, PyTorch, etc. • Dynamic construction with lazy evaluation • DyNet, PyTorch(partially), primitiv 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 21
  • 22. Static Construction • Constructs the computation graph before all executions. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 22
  • 23. Static Construction • Constructs the computation graph before all executions. 𝑥 * 𝑏 𝑊 * * 𝑦matmul add tanh add 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 23
  • 24. Static Construction • Constructs the computation graph before all executions. 𝑥 * 𝑏 𝑊 * * 𝑦matmul add tanh add 𝑓 𝑥 𝑏 𝑊 𝑦 Fix 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 24
  • 25. Static Construction • Constructs the computation graph before all executions. • Then executes the “fixed” graph with actual data. 𝑥 * 𝑏 𝑊 * * 𝑦matmul add tanh add 𝑓 𝑥 𝑏 𝑊 𝑦 Fix 𝑓 … … … 𝑦 𝑓 … … … 𝑦 𝑓 … … … 𝑦 𝑓 3 2 5 3.9 Execute with data6/1/2018 Copyright (c) 2018 by Yusuke Oda. 25
  • 26. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 2 5Run 1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 26
  • 27. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 15 2 5 matmulRun 1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 27
  • 28. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 15 2 5 17 matmul add Run 1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 28
  • 29. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 15 2 5 17 0.9… matmul add tanh Run 1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 29
  • 30. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 15 2 5 17 0.9… 3.9…matmul add tanh addRun 1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 30
  • 31. Dynamic Construction (define-by-run) • Graph construction and actual calculation are performed simultaneously. 3 15 2 5 17 0.9… 3.9…matmul add tanh addRun 1 9 18 -1 2 17 0.9… 9.9…matmul add tanh addRun 2 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 31
  • 32. Dynamic Construction with Lazy Evaluation • Consists of 2 steps: 1. Constructing graphs using only types of values. 2. Performs actual computation (forward/backward) along the graph. 3 ? 2 5 ? ? ?matmul add tanh add 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 32
  • 33. Dynamic Construction with Lazy Evaluation • Consists of 2 steps: 1. Construct the graph using only the types of values. 2. Perform the actual computation (forward/backward) along the graph. 3 ? 2 5 ? ? ? matmul add tanh add Query 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 33
  • 34. Dynamic Construction with Lazy Evaluation • Consists of 2 steps: 1. Construct the graph using only the types of values. 2. Perform the actual computation (forward/backward) along the graph. 3 15 2 5 17 0.9… 3.9… matmul add tanh add Query 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 34
  • 35. Pros/cons of each strategy • Static • Capable of strong compile-time optimization • Difficult to construct interactive graphs • Dynamic (define-by-run) • Capable of constructing interactive graphs • Large overhead and difficult to optimize • Dynamic + Lazy • Also capable of interactive graphs • Can apply just-in-time optimization • 2-pass traversal over the graph • shapes are always calculated; values are calculated on demand. • Whole-graph optimization is still difficult 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 35
  • 36. primitiv? 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 36
  • 37. primitiv: Dynamic+Lazy NN Toolkit • Originally forked from DyNet • All components were restructured • Concepts • Simple • Compact • Device/environment independent • Implicit minibatching • Multiple language support 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 37
  • 38. primitiv: Simple • Consists only of essential functionality. • Nonessential features are mostly omitted. • Low learning cost • Even so, the code does not become long: • An encoder-decoder can be implemented in about 300 lines of C++ (see the examples in the repository). 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 38
  • 39. primitiv: Compact • For a minimal installation, you only need GCC/Clang and CMake. • If you need support for specific hardware (e.g. CUDA), all you have to do is add the corresponding build switch. $ git clone https://github.com/primitiv/primitiv $ cd primitiv $ cmake . $ make $ make install $ echo "That's all." $ cmake . -DPRIMITIV_USE_CUDA=ON 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 39
  • 40. primitiv: Device/environment Independent • Device-specific code and network structure are completely separated. • Once the model is written, the code can be executed on any (even unknown) hardware with no modification. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 40
  • 41. primitiv: Device/environment Independent • Device-specific code and network structure are completely separated. • Once the model is written, the code can be executed on any (even unknown) hardware with no modification. #include <primitiv/primitiv.h> using namespace primitiv; namespace F = primitiv::functions; Node predict(Node &x, Parameter &w, Parameter &b) { Node ww = F::parameter<Node>(w); Node bb = F::parameter<Node>(b); return F::tanh(F::matmul(ww, x) + bb) + x; } 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 41
  • 42. primitiv: Device/environment Independent • Device-specific code and network structure are completely separated. • Once the model is written, the code can be executed on any (even unknown) hardware with no modification. #include <primitiv/primitiv.h> using namespace primitiv; namespace F = primitiv::functions; Node predict(Node &x, Parameter &w, Parameter &b) { Node ww = F::parameter<Node>(w); Node bb = F::parameter<Node>(b); return F::tanh(F::matmul(ww, x) + bb) + x; } Run on CPU Run on CUDA Run on OpenCL Run on somewhere 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 42
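  For instance, the predict() above can be driven on different hardware just by changing which Device is set as the default. The following sketch uses the Device/Graph setup shown on later slides; the shapes and input values are arbitrary.

    devices::Naive cpu;            // CPU backend
    // devices::CUDA gpu(0);       // CUDA backend on GPU 0, if built with CUDA
    Device::set_default(cpu);      // switch to `gpu` to run the same code on CUDA

    Graph g;
    Graph::set_default(g);

    Parameter w({2, 2}, initializers::XavierUniform());
    Parameter b({2}, initializers::Constant(0));
    Node x = F::input<Node>(Shape({2}), {1, 2});
    Node y = predict(x, w, b);     // predict() itself contains no device-specific code
    std::vector<float> result = y.to_vector();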
  • 43. primitiv: Implicit Minibatching • Most networks can be applied to both single and minibatched data. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 43
  • 44. primitiv: Implicit Minibatching • Most networks can be applied to both single and minibatched data. Node predict( Node &x, Parameter &w, Parameter &b) { Node ww = F::parameter<Node>(w); Node bb = F::parameter<Node>(b); return F::tanh(F::matmul(ww, x) + bb) + x; } 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 44
  • 45. primitiv: Implicit Minibatching • Most networks can be applied to both single and minibatched data. Node predict( Node &x, Parameter &w, Parameter &b) { Node ww = F::parameter<Node>(w); Node bb = F::parameter<Node>(b); return F::tanh(F::matmul(ww, x) + bb) + x; } 3 3.9 Single data 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 45
  • 46. primitiv: Implicit Minibatching • Most networks can be applied to both single and minibatched data. Node predict( Node &x, Parameter &w, Parameter &b) { Node ww = F::parameter<Node>(w); Node bb = F::parameter<Node>(b); return F::tanh(F::matmul(ww, x) + bb) + x; } 3 3.9 3 3.9 4 5 4.9 5.9 3-minibatched data Single data 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 46
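  A sketch of the same predict() fed with single and 3-minibatched data, continuing the setup from the previous sketch (the input values are arbitrary):

    // Single data: one 2-dimensional input; the result has Shape({2}).
    Node x1 = F::input<Node>(Shape({2}), {3, 1});
    Node y1 = predict(x1, w, b);

    // 3-minibatched data: three 2-dimensional inputs, same network code;
    // the result has Shape({2}, 3).
    Node x3 = F::input<Node>(Shape({2}, 3), {3, 1,  4, 0,  5, 2});
    Node y3 = predict(x3, w, b);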
  • 47. primitiv: Multiple Language Support primitiv Core Library (C++11) 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 47
  • 48. primitiv: Multiple Language Support primitiv Core Library (C++11) C++ Apps 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 48
  • 49. primitiv: Multiple Language Support primitiv Core Library (C++11) Cython Middleware primitiv-Python Python Apps C++ Apps 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 49
  • 50. primitiv: Multiple Language Support primitiv Core Library (C++11) Cython Middleware primitiv C APIs (C99) primitiv-Python Java Binding Rust Binding etc. primitiv- Java primitiv- Rust etc. Apps Apps Apps Python Apps C++ Apps 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 50
  • 51. Core Components 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 51
  • 52. Core Components of primitiv • Shape • Device and Tensor • Graph and Node • Parameter and Optimizer • Other functionalities 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 52
  • 53. Shape • A Shape represents the volume and the minibatch size of the data. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 53
  • 54. Shape • A Shape represents the volume and the minibatch size of the data. A scalar Shape({}) 1 value 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 54
  • 55. Shape • A Shape represents the volume and the minibatch size of the data. A scalar Shape({}) 1 value A column vector Shape({3}) 3 values 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 55
  • 56. Shape • A Shape represents the volume and the minibatch size of the data. A scalar Shape({}) 1 value A column vector Shape({3}) 3 values A matrix Shape({3, 4}) 12 values 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 56
  • 57. Shape • A Shape represents the volume and the minibatch size of the data. A scalar Shape({}) 1 value A column vector Shape({3}) 3 values A matrix Shape({3, 4}) 12 values 5 matrices Shape({3, 4}, 5) 60 values ×5 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 57
  • 58. Shape Equivalence Rule • Trailing dimensions of size 1 are identical to omitting them: • A minibatch size of 1 is identical to single data: Shape({3, 1}) == Shape({3}) Matrix = Column vector Shape({1}) == Shape({}) Column vector = Scalar Shape({2, 3, 4, 1, 1, 1}) == Shape({2, 3, 4}) Shape({2, 3, 4}, 1) == Shape({2, 3, 4}) 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 58
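  The same rules written as code (a minimal sketch; it assumes Shape's operator== behaves as shown on this slide):

    #include <cassert>
    #include <primitiv/primitiv.h>
    using namespace primitiv;

    int main() {
      assert(Shape({3, 1}) == Shape({3}));                   // trailing 1s are dropped
      assert(Shape({1}) == Shape({}));                       // 1-element vector == scalar
      assert(Shape({2, 3, 4, 1, 1, 1}) == Shape({2, 3, 4}));
      assert(Shape({2, 3, 4}, 1) == Shape({2, 3, 4}));       // minibatch 1 == single data
      return 0;
    }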
  • 59. Minibatch Broadcasting Rule • Arguments of n-ary (n ≥ 2) functions/ops with minibatch size 1 are implicitly broadcast. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 59 x = data with Shape({2, 2}, 123); y = data with Shape({2, 2}, 123); z = data with Shape({2, 2}); w = data with Shape({2, 2}, 42); F::matmul(x, y); Shape({2, 2}, 123) The operation is performed for each minibatch element separately. F::matmul(z, w); Shape({2, 2}, 42) `z` is implicitly broadcast. F::sum({x, y, z}); Shape({2, 2}, 123) `z` is implicitly broadcast. F::sum({y, z, w}); Error! Different minibatch sizes (123 vs 42) cannot be combined.
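  The broadcasting rule in runnable form, using small minibatch sizes (3 and 1) and the Tensor interface introduced on the following slides (a sketch; the values are arbitrary):

    devices::Naive dev;
    Device::set_default(dev);

    // Three 2x2 matrices (minibatch size 3) and one 2x2 matrix (minibatch size 1).
    std::vector<float> a_data(2 * 2 * 3, 1.0f);
    std::vector<float> b_data(2 * 2, 2.0f);
    Tensor a = F::input<Tensor>(Shape({2, 2}, 3), a_data);
    Tensor b = F::input<Tensor>(Shape({2, 2}), b_data);

    // `b` is implicitly broadcast over the minibatch; the result has Shape({2, 2}, 3).
    Tensor c = F::matmul(a, b);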
  • 60. Device • A Device object manages the actual subroutines and memory management for a specific piece of hardware. • All hardware-related code (e.g., CUDA) is encapsulated in the Device. CPU CUDA Other Hardware 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 60
  • 61. Device • A Device object manages the actual subroutines and memory management for a specific piece of hardware. • All hardware-related code (e.g., CUDA) is encapsulated in the Device. CPU CUDA Other Hardware Unified "Device" Interface CPU-specific Routines CUDA-specific Routines Other Routines 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 61
  • 62. Device • A Device object manages the actual subroutines and memory management for a specific piece of hardware. • All hardware-related code (e.g., CUDA) is encapsulated in the Device. CPU CUDA Other Hardware Unified "Device" Interface Application Application CPU-specific Routines CUDA-specific Routines Other Routines 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 62
  • 63. Tensor • Tensor is the most elementary data interface. • Each Tensor is associated with a Device, holds Device-specific memory, and has a Shape that represents the layout of the data. • Calculation is performed by eager evaluation: results are obtained immediately. Tensor Reference to the Device Device-specific Memory Shape 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 63
  • 64. Snippet: Using Device and Tensor #include <primitiv/primitiv.h> using namespace primitiv; // primitiv::functions has many functions for Tensor. namespace F = primitiv::functions; devices::Naive dev1; // Initializes CPU device devices::CUDA dev2(0); // Initializes CUDA on GPU 0 devices::CUDA dev3(1); // Initializes CUDA on GPU 1 // `dev1` -- `dev3` have the same "Device" interface. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 64
  • 65. Snippet: Using Device and Tensor // Making a new Tensor on `dev1` Shape s({2, 2}); std::vector<float> data {1, 2, 3, 4}; // column-major Tensor x1 = F::input<Tensor>(s, data, dev1); // Making a 2-dimensional identity matrix on `dev2` Tensor x2 = F::identity<Tensor>(2, dev2); // Copy x1 onto `dev2` Tensor x11 = F::copy(x1, dev2); // Math Tensor x3 = x11 + x2; // x3 == {2, 2, 3, 5} Tensor xe = x1 + x2; // Error: different device Tensor x4 = F::exp(x1); std::vector<float> ret = x4.to_vector(); // {2.7,7.4,20.,55.} 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 65
  • 66. Default Device • The "Device" argument of each function can be omitted using the default device. devices::CUDA dev(0); // Specifies `dev` as the default Device::set_default(dev); // Same as F::input<Tensor>(shape, data, dev); Tensor x = F::input<Tensor>(shape, data); 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 66
  • 67. Graph and Node • Graph object represents a computation graph and its states. * * * * * * *matmul add tanh add parameter parameter input Graph 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 67
  • 68. Graph and Node • Graph object represents a computation graph and its states. • Node object represents a variable node in the Graph. * * * * * * *matmul add tanh add parameter parameter input GraphNode Reference to Graph Variable ID 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 68
  • 69. Adding new Nodes into Graph • Simply apply functions to add new calculations to the Graph. • Node has a similar interface to Tensor: • Math functions • Arithmetic operations x Node x = F::input<Node>(shape, data); input 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 69
  • 70. Adding new Nodes into Graph • Simply apply functions to add new calculations to the Graph. • Node has a similar interface to Tensor: • Math functions • Arithmetic operations x y exp x Node x = F::input<Node>(shape, data); Node y = F::exp(x); input input 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 70
  • 71. Lazy Evaluation through Nodes • Unlike Tensor, a Node is just a placeholder for values and does not invoke actual computation when it is created. • When the value is explicitly queried, all required calculations are invoked. ? ? exp std::vector<float> ret = y.to_vector(); input y Query 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 71
  • 72. Lazy Evaluation through Nodes • Unlike Tensor, a Node is just a placeholder for values and does not invoke actual computation when it is created. • When the value is explicitly queried, all required calculations are invoked. Val ? exp std::vector<float> ret = y.to_vector(); input Invoke! y 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 72
  • 73. Lazy Evaluation through Nodes • Unlike Tensor, a Node is just a placeholder for values and does not invoke actual computation when it is created. • When the value is explicitly queried, all required calculations are invoked. Val Val exp std::vector<float> ret = y.to_vector(); input Invoke! y Return 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 73
  • 74. Lazy Evaluation through Nodes • Once results are calculated, Nodes cache the values, and they are reused by future queries. • Unused values are never calculated. Cached Cached ? Cached ? ? ? matmul add tanh add Query 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 74
  • 75. Lazy Evaluation through Nodes • Once results are calculated, Nodes cache the values, and they are reused by future queries. • Unused values are never calculated. Cached Cached Val Cached Val Val ? matmul add tanh add Invoked Invoked Invoked Not Invoked 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 75
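  A small sketch of this behavior: nothing is computed while the Nodes are built; querying y evaluates and caches its subgraph, a later query on z reuses the cached values, and a never-queried Node is never computed (values are arbitrary):

    devices::Naive dev;
    Device::set_default(dev);
    Graph g;
    Graph::set_default(g);

    Node x = F::input<Node>(Shape({2}), {1, 2});
    Node h = F::tanh(x);        // not evaluated yet
    Node y = F::exp(h);         // not evaluated yet
    Node z = h + h;             // not evaluated yet
    Node unused = F::exp(y);    // never queried, so never computed

    std::vector<float> vy = y.to_vector();  // invokes input, tanh, exp and caches the results
    std::vector<float> vz = z.to_vector();  // reuses the cached tanh value; only `+` runs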
  • 76. Parameter • A Parameter object represents a trainable parameter in the network. • Its values can be used as a variable in a Graph, and its gradients are updated by the Graph. • Initial values can be specified by hand or by using an Initializer object. Parameter Reference to the Device Values Cumulative Gradients Other Statistics 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 76
  • 77. Optimizer • An Optimizer manages an update policy (SGD, Adam, etc.) for Parameters. • It consumes the gradient information held by each Parameter to update the values. • It also registers in each Parameter the statistics that the update policy requires. • I.e., all statistics about a Parameter are stored in the Parameter object itself; the Optimizer does not hold such information. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 77
  • 78. Snippet: Initializing Parameter/Optimizer // Device devices::CUDA dev(0); Device::set_default(dev); // Parameter/Optimizer Parameter p1(Shape({3}), {1, 2, 3}); Parameter p2(Shape({3}), initializers::Uniform(-1, 1)); // Uses a uniform distribution for the initial values. // Optimizer optimizers::SGD opt(0.1); // Initializes SGD with LR=0.1. opt.add(p1, p2); // Registers `p1` and `p2` with the optimizer. 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 78
  • 79. Backpropagation • Backpropagation can be performed through Nodes by invoking the backward() function. • Tensors cannot perform backpropagation because they do not manage gradients or computation graphs. • If the computation graph contains Parameters, their gradients are updated by backward(). 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 79
  • 80. Snippet: Backpropagation Graph g; Graph::set_default(g); // Makes g the default. Parameter p(Shape({3}), {1, 2, 3}); optimizers::SGD opt(0.1); opt.add(p); Node w = F::parameter(p); Node x = F::input(Shape({3}), {2, 3, 5}); Node y = w * x; // Elementwise multiplication y.to_vector(); // {2, 6, 15} opt.reset_gradients(); // Makes all parameter gradients 0. y.backward(); // Performs the backpropagation. p.gradient().to_vector(); // {2, 3, 5} opt.update(); // Performs the SGD rule: // {1, 2, 3} - 0.1 * {2, 3, 5} p.value().to_vector(); // {0.8, 1.7, 2.5} 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 80
  • 81. Example 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 81
  • 82. The Data • We use synthetic data in this example... #include <random> #include <tuple> class DataSource { std::mt19937 rng; std::normal_distribution<float> data_dist, noise_dist; public: DataSource(float data_sd, float noise_sd) : rng(std::random_device()()) , data_dist(0, data_sd) , noise_dist(0, noise_sd) {} std::tuple<float, float, float> operator()() { const float x1 = data_dist(rng); const float x2 = data_dist(rng); return std::make_tuple( x1 + noise_dist(rng), x2 + noise_dist(rng), x1 * x2 >= 0 ? 1 : -1); } }; 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 82
  • 83. The Data • We use synthetic data in this example... #include <random> #include <tuple> class DataSource { std::mt19937 rng; std::normal_distribution<float> data_dist, noise_dist; public: DataSource(float data_sd, float noise_sd) : rng(std::random_device()()) , data_dist(0, data_sd) , noise_dist(0, noise_sd) {} std::tuple<float, float, float> operator()() { const float x1 = data_dist(rng); const float x2 = data_dist(rng); return std::make_tuple( x1 + noise_dist(rng), x2 + noise_dist(rng), x1 * x2 >= 0 ? 1 : -1); } }; 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 83
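  A short usage sketch for the DataSource defined above:

    DataSource source(1.0 /* data_sd */, 0.1 /* noise_sd */);
    float x1, x2, t;
    std::tie(x1, x2, t) = source();  // one noisy 2-D point and its label (+1 or -1)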
  • 84. The Data ... XOR Where: • data_sd == 1 • noise_sd == 0.1 Input: 𝒙 := (𝑥₁, 𝑥₂) ∈ ℝ² Output: 𝑦 ∈ ℝ 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 84
  • 85. The Network • We use a simple MLP: 𝑦 = tanh(𝑊_hy 𝒉 + 𝑏_y), 𝒉 = tanh(𝑊_xh 𝒙 + 𝒃_h) • Where: 𝒉 ∈ ℝ^N, 𝑊_xh ∈ ℝ^(N×2), 𝒃_h ∈ ℝ^N, 𝑊_hy ∈ ℝ^(1×N), 𝑏_y ∈ ℝ 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 85
  • 86. Code 1: Initialization • Including headers and declaring the main function #include <iostream> #include <vector> #include <primitiv/primitiv.h> using namespace std; using namespace primitiv; int main() { devices::Naive dev; // uses CPU Graph g; Device::set_default(dev); Graph::set_default(g); // All code will be described here. return 0; } 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 86
  • 87. Code 2: Parameter and Optimizer • We have 4 parameters: 𝑊𝒉𝑦, 𝑏 𝑦, 𝑊𝒙𝒉, 𝒃 𝒉. (in main function) constexpr unsigned N = 8; // #hidden units Parameter pw_xh({N, 2}, initializers::XavierUniform()); Parameter pb_h({N}, initializers::Constant(0)); Parameter pw_hy({1, N}, initializers::XavierUniform()); Parameter pb_y({}, initializers::Constant(0)); constexpr float learning_rate = 0.1; optimizers::SGD opt(learning_rate); opt.add(pw_xh, pb_h, pw_hy, pb_y); 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 87
  • 88. Code 3: Writing The Network • Using lambda: (in main function) auto feedforward = [&](const Node &x) { namespace F = primitiv::functions; const Node w_xh = F::parameter<Node>(pw_xh); // Shape({N, 2}) const Node b_h = F::parameter<Node>(pb_h); // Shape({N}) const Node w_hy = F::parameter<Node>(pw_hy); // Shape({1, N}) const Node b_y = F::parameter<Node>(pb_y); // Shape({}) const Node h = F::tanh(F::matmul(w_xh, x) + b_h); // Shape({N}, B) return F::tanh(F::matmul(w_hy, h) + b_y); // Shape({}, B) }; 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 88
  • 89. Code 4: Loss Function • Similar to the main network: (in main function) auto squared_loss = [](const Node &y, const Node &t) { namespace F = primitiv::functions; const Node diff = y - t; // Shape({}, B) return F::batch::mean(diff * diff); // Shape({}) }; 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 89
  • 90. Code 5: Making The Minibatch • This section is out of the toolkit, just up to the data. (in main function) constexpr float data_sd = 1.0; constexpr float noise_sd = 0.1; DataSource data_source(data_sd, noise_sd); auto next_data = [&](unsigned minibatch_size) { std::vector<float> data; std::vector<float> labels; for (unsigned i = 0; i < minibatch_size; ++i) { float x1, x2, t; std::tie(x1, x2, t) = data_source(); data.emplace_back(x1); data.emplace_back(x2); labels.emplace_back(t); } namespace F = primitiv::functions; return std::make_tuple( F::input<Node>(Shape({2}, minibatch_size), data), // input data `x` F::input<Node>(Shape({}, minibatch_size), labels)); // label data `t` }; 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 90
  • 91. Code 6: Training Loop (in main function) for (unsigned epoch = 0; epoch < 100; ++epoch) { g.clear(); // Initializes the computation graph Node x, t; std::tie(x, t) = next_data(1000); // Obtains the next data const Node y = feedforward(x); // Calculates the network const Node loss = squared_loss(y, t); // Calculates the loss std::cout << epoch << ": train loss=" << loss.to_float() << std::endl; // Performs backpropagation and updates parameters opt.reset_gradients(); loss.backward(); opt.update(); } 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 91
  • 92. Code 6: Training Loop (in main function) for (unsigned epoch = 0; epoch < 100; ++epoch) { g.clear(); // Initializes the computation graph Node x, t; std::tie(x, t) = next_data(1000); // Obtains the next data const Node y = feedforward(x); // Calculates the network const Node loss = squared_loss(y, t); // Calculates the loss std::cout << epoch << ": train loss=" << loss.to_float() << std::endl; // Performs backpropagation and updates parameters opt.reset_gradients(); loss.backward(); opt.update(); } $ g++ -std=c++11 code.cc -lprimitiv $ ./a.out 0: loss=1.17221 1: loss=1.07423 2: loss=1.06282 3: loss=1.04641 4: loss=1.00851 5: loss=1.01904 ... 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 92
  • 93. Code 7: Testing (in main function) for (unsigned epoch = 0; epoch < 100; ++epoch) { (Training process written in the previous code block) if (epoch % 10 == 9) { namespace F = primitiv::functions; const vector<float> test_x_data {1, 1, -1, 1, -1, -1, 1, -1}; const vector<float> test_t_data {1, -1, 1, -1}; const Node test_x = F::input<Node>(Shape({2}, 4), test_x_data); const Node test_t = F::input<Node>(Shape({}, 4), test_t_data); const Node test_y = feedforward(test_x); const Node test_loss = squared_loss(test_y, test_t); std::cout << "test results:"; for (float val : test_y.to_vector()) { std::cout << ' ' << val; } std::cout << "\ntest loss: " << test_loss.to_float() << std::endl; } } 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 93
  • 94. Code 7: Testing (in main function) for (unsigned epoch = 0; epoch < 100; ++epoch) { (Training process written in the previous code block) if (epoch % 10 == 9) { namespace F = primitiv::functions; const vector<float> test_x_data {1, 1, -1, 1, -1, -1, 1, -1}; const vector<float> test_t_data {1, -1, 1, -1}; const Node test_x = F::input<Node>(Shape({2}, 4), test_x_data); const Node test_t = F::input<Node>(Shape({}, 4), test_t_data); const Node test_y = feedforward(test_x); const Node test_loss = squared_loss(test_y, test_t); std::cout << "test results:"; for (float val : test_y.to_vector()) { std::cout << ' ' << val; } std::cout << "\ntest loss: " << test_loss.to_float() << std::endl; } } 1, 1 ⟼ 1 −1, 1 ⟼ −1 −1, −1 ⟼ 1 1, −1 ⟼ −1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 94
  • 95. Code 7: Testing (in main function) for (unsigned epoch = 0; epoch < 100; ++epoch) { (Training process written in the previous code block) if (epoch % 10 == 9) { namespace F = primitiv::functions; const vector<float> test_x_data {1, 1, -1, 1, -1, -1, 1, -1}; const vector<float> test_t_data {1, -1, 1, -1}; const Node test_x = F::input<Node>(Shape({2}, 4), test_x_data); const Node test_t = F::input<Node>(Shape({}, 4), test_t_data); const Node test_y = feedforward(test_x); const Node test_loss = squared_loss(test_y, test_t); std::cout << "test results:"; for (float val : test_y.to_vector()) { std::cout << ' ' << val; } std::cout << "\ntest loss: " << test_loss.to_float() << std::endl; } } $ g++ -std=c++11 code.cc -lprimitiv $ ./a.out ... 8: loss=0.933427 9: loss=0.927205 test results: 0.04619 -0.119208 0.0893511 -0.149148 test loss: 0.809695 10: loss=0.916669 11: loss=0.91744 ... 18: loss=0.849496 19: loss=0.845048 test results: 0.156536 -0.229959 0.171106 -0.221599 test loss: 0.649342 20: loss=0.839679 21: loss=0.831217 ... 1, 1 ⟼ 1 −1, 1 ⟼ −1 −1, −1 ⟼ 1 1, −1 ⟼ −1 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 95
  • 96. Links • Public repository (components, tests, examples) • https://github.com/primitiv • Slack (conversation) • https://primitiv-forum.slack.com • Documentation (tutorial, design, reference) • http://primitiv.readthedocs.io/en/develop 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 96
  • 97. Thanks! 6/1/2018 Copyright (c) 2018 by Yusuke Oda. 97