Tensor Models and Other Dreams...
Andres Mendez-Vazquez
January 26, 2018
1 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
2 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
3 / 64
Tensors are this way...
As words defining an important moment in life
Without you
All the stars we steal from the night sky
Will never be enough
Never be enough
These hands could hold the world
but it’ll
Never be enough...
- Justin Paul / Benj Pasek, The Greatest Showman
4 / 64
Tensors are like such words...
They are generalizations that embody our dreams...
In Data Sciences...
5 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
6 / 64
Document Representation
Imagine the following...
You have a bunch of documents... There are hundreds of thousands of them...
7 / 64
Then, we have an Opportunity or a Terrible Problem
How do you represent them in a way that is easy to handle?
After all, we want to
Search them
Compare them
Rank them
What about using vectors?
word 1 | word 2 | word 3 | word 4 | · · · | word d
x_1    | x_2    | x_3    | x_4    | · · · | x_d
(each x_i is the counter for word i)
8 / 64
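To make the counter-vector idea concrete, here is a minimal sketch in plain Python (the toy dictionary and document are made up for illustration; in practice d would be tens of thousands of words):

```python
from collections import Counter

# Hypothetical toy dictionary; in practice this would be the whole vocabulary
dictionary = ["tensor", "matrix", "data", "dream"]

def document_to_vector(text, dictionary):
    """Return the bag-of-words counter vector (x_1, ..., x_d) for one document."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

doc = "the dream of the tensor is a dream about data"
print(document_to_vector(doc, dictionary))  # [1, 0, 1, 2]
```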
The Matrix at the Center of Everything!!!
The Vector/Matrix Representation
They are basically an N × d matrix like this
$$A = \begin{pmatrix}
(x_1)_1 & \cdots & (x_1)_j & \cdots & (x_1)_d \\
\vdots  &        & \vdots  &        & \vdots  \\
(x_i)_1 & \cdots & (x_i)_j & \cdots & (x_i)_d \\
\vdots  &        & \vdots  &        & \vdots  \\
(x_N)_1 & \cdots & (x_N)_j & \cdots & (x_N)_d
\end{pmatrix}$$
A is a matrix where...
N is the number of documents, in the hundreds of thousands...
d is the number of words in the dictionary, in the tens of thousands.....
9 / 64
A Small Problem
The matrix alone consumes... so much...
You have 2 bytes per memory cell
If we have N = 10^6 and d = 50,000
We have
2 × N × d = 10^11 bytes = 100 Gigabytes
10 / 64
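A quick back-of-the-envelope check of that figure (a sketch; the 2 bytes per cell is the slide's own assumption):

```python
N = 10**6            # number of documents
d = 50_000           # dictionary size
bytes_per_cell = 2   # assumed storage per counter

total_bytes = bytes_per_cell * N * d
print(total_bytes / 10**9, "GB")  # 100.0 GB
```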
Danger, Will Robinson!!!
Lost in Space
11 / 64
We have a trick!!!
Something Notable
The Matrix is Highly SPARSE
12 / 64
Therefore
If you are smart enough
You start representing the matrix information using sparse techniques
[Figure: a 5×5 sparse matrix, showing the few numeric elements versus the many empty elements]
13 / 64
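A minimal sketch of that idea with SciPy's CSR format (assuming SciPy is available; the 5×5 example matrix is invented for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-empty 5x5 matrix: only the numeric (nonzero) elements need storing
dense = np.array([
    [0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2],
    [0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0],
])
sparse = csr_matrix(dense)

# CSR keeps only the nonzero values plus two small index arrays
print(sparse.nnz, "stored values instead of", dense.size)
print(sparse.data, sparse.indices, sparse.indptr)
```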
Then
If you are quite smart....
You discover that only a few of the singular values carry most of the information...
Every Matrix has a Singular Value Decomposition
A = UΣV^T
The columns of U are an orthonormal basis for the column space.
The columns of V are an orthonormal basis for the row space.
The matrix Σ is diagonal and the entries on its diagonal, σ_i = Σ_ii, are positive
real numbers, called the singular values of A.
14 / 64
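Here is a hedged sketch of the truncated-SVD idea with NumPy (a small random matrix stands in for the document matrix; on a real sparse N × d matrix one would use something like scipy.sparse.linalg.svds instead of the dense solver):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 500))   # stand-in for the N x d document matrix
k = 30                        # keep only the k largest singular values

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of A

# Relative error of the compressed representation
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```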
How much compression can we get?
The Matrix Sparse Representation
It Achieves 90% Compression - We go from 100 Gigabytes to 10
Gigabytes
From 50,000 dimensions/words we go to 300 dimensions
Using the Singular Value Decomposition
Making it possible to go from 100 Gigabytes to
2 × N × 300 = 0.6 Gigabytes
15 / 64
IMAGINE!!!!
We have a crazy moment!!!
All the stars we steal from the night sky
Will never be enough
Never be enough
Towers of gold are still too little
These hands could hold the world
but it’ll
Never be enough
Never be enough
For me
16 / 64
Then
You get ambitious!!! You add a new dimension representing feelings!!!
[Figure: the document matrix extended along a third axis, the feeling dimensionality, forming a 3-way tensor]
17 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
18 / 64
They have a somewhat short history!!!
First of all
They are abstract entities invariant under coordinate transformations.
They were first mentioned by Woldemar Voigt in 1898
A German physicist, who taught at the Georg August University of
Göttingen.
He mentioned tensors in a study about the physical properties of
crystals.
But Before That
The Great Riemann introduced the concept of a manifold...
the beginning of the dream...
Through a quadratic line element to study its properties...
$$ds^2 = g_{ij}\,dx^i\,dx^j$$
19 / 64
Then
Gregorio Ricci-Curbastro and Tullio Levi-Civita
They wrote a paper in the Mathematische Annalen, Vol. 54 (1901),
entitled "Méthodes de calcul différentiel absolu"
A Monster Came Around
20 / 64
“Every Genius has stood on the Shoulders of Giants” -
Newton
Einstein adopted the concepts in the paper
And the Theory of General Relativity was born
He renamed the entire field from “calcul absolu” to
TENSOR CALCULUS
21 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
22 / 64
First Principles...
Imagine a linear coordinate system
23 / 64
We define
A Coordinate System
We define vectors in terms of a basis
$$v = v_x e_1 + v_y e_2 = \begin{pmatrix} v_x \\ v_y \end{pmatrix} \in \mathbb{R}^2$$
$$\|v\| = \|v\|_2 = \left(v_x^2 + v_y^2\right)^{1/2}$$
Note: this is important, a vector is always the same object no matter the coordinate system
24 / 64
Therefore
Imagine representing a vector in a new basis $e'_1, e'_2$ in terms of the old basis $e_1, e_2$
$$e'_1 \cdot v = v'_x = e'_1 \cdot v_x e_1 + e'_1 \cdot v_y e_2$$
$$e'_2 \cdot v = v'_y = e'_2 \cdot v_x e_1 + e'_2 \cdot v_y e_2$$
Where
$e'_i \cdot e_j$ = the projection of $e_j$ onto $e'_i$
25 / 64
Using a Little bit of Notation
We need a notation that is more compact
Let the indices i, j represent the numbers 1, 2 corresponding to the
coordinates x, y
Write the components of v as $v_i$ and $v'_i$ in the two coordinate systems
Then define
$$a_{ij} = e'_i \cdot e_j$$
Note: this defines the “ROTATION”
In fact, the $a_{ij}$ are individually just the cosines of the angles
between one axis and another
26 / 64
Therefore
We can rewrite the entire transformation
$$v'_i = \sum_{j=1}^{2} a_{ij} v_j$$
We will agree that whenever an index appears twice, we have a sum
$$v'_i = a_{ij} v_j$$
27 / 64
We have then...
We can do the following
$$\begin{pmatrix} v'_1 \\ v'_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$$
Then, we compress our notation more
$$v' = a v$$
28 / 64
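A small numerical sketch of this transformation (the 30-degree rotation and the test vector are arbitrary examples):

```python
import numpy as np

theta = np.deg2rad(30)                             # example rotation of the axes
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])    # a_ij = e'_i . e_j

v = np.array([2.0, 1.0])

# v'_i = a_ij v_j, written as a matrix product and with explicit indices
v_prime = a @ v
v_prime_einsum = np.einsum("ij,j->i", a, v)
print(np.allclose(v_prime, v_prime_einsum))        # True

# Orthogonality of the rotation: a_ij a_ik = delta_jk
print(np.allclose(a.T @ a, np.eye(2)))             # True
```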
Then, we can redefine our dot product
The Basis of Projecting into other vectors
$$v \cdot w = v_i w_i = v'_i w'_i = a_{ij} a_{ik} v_j w_k$$
Using the Kronecker Delta
$$\delta_{jk} = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k \end{cases}$$
Therefore, we have
$$a_{ij} a_{ik} = \delta_{jk}$$
29 / 64
Proving the Invariance of the dot product
Therefore
$$v'_i w'_i = \delta_{jk} v_j w_k = v_j w_j$$
30 / 64
Then, we have
A scalar is a number K
It has the same value in different coordinate systems.
A vector is a set of numbers $v_i$
They transform according to
$$v'_i = a_{ij} v_j$$
A (Second Rank) Tensor is a set of numbers $T_{ij}$
They transform according to
$$T'_{ij} = a_{ik} a_{jl} T_{kl}$$
31 / 64
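Continuing the same sketch, here is the second-rank transformation rule and a check that the dot product really is a scalar (the tensor T and the vectors are arbitrary examples):

```python
import numpy as np

theta = np.deg2rad(30)
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

v = np.array([2.0, 1.0])
w = np.array([-1.0, 3.0])
T = np.array([[1.0, 2.0],
              [0.5, -1.0]])                       # an arbitrary second-rank tensor

# Scalar: v . w has the same value in both coordinate systems
print(np.isclose(v @ w, (a @ v) @ (a @ w)))       # True

# Second-rank tensor: T'_ij = a_ik a_jl T_kl, i.e. T' = a T a^T
T_prime = np.einsum("ik,jl,kl->ij", a, a, T)
print(np.allclose(T_prime, a @ T @ a.T))          # True
```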
Then you can go higher
For example, tensors of rank 3
32 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
33 / 64
Once, we have an idea of Tensor
Do we have decompositions similar to the ones in the SVD?
We have them......!!!
A Little Bit of History
Tensor decompositions originated with Hitchcock in 1927
An American mathematician and physicist known for his formulation of
the transportation problem in 1941.
A multiway model is attributed to Cattell in 1944
A British and American psychologist, known for his psychometric
research into intrapersonal psychological structure.
But it was not until Ledyard R. Tucker
“Some mathematical notes on three-mode factor analysis,”
Psychometrika, 31 (1966), pp. 279–311.
34 / 64
The Dream has been expanding beyond Physics
In the last ten years
1 Signal Processing
2 Numerical Linear Algebra
3 Computer Vision
4 Data Mining
5 Graph analysis
6 Neurosciences
7 etc
And we are going further
The Dream of Representation is at full speed when dealing with BIG
DATA!!!
35 / 64
Decomposition of Tensors
Hitchcock Proposed such decomposition first... then the deluge
Name | Proposed by
Polyadic form of a tensor | Hitchcock, 1927
Three-mode factor analysis | Tucker, 1966
PARAFAC (parallel factors) | Harshman, 1970
CANDECOMP or CAND (canonical decomposition) | Carroll and Chang, 1970
Topographic components model | Möcks, 1988
CP (CANDECOMP/PARAFAC) | Kiers, 2000
36 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
37 / 64
Look at the most modern one, from 17 years ago...
The CP decomposition factorizes a tensor into a sum of component
rank-one tensors (Vectors!!!)
$$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r \quad \text{with } \mathcal{X} \in \mathbb{R}^{I \times J \times K}$$
Where
R is a positive integer
$a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$, $c_r \in \mathbb{R}^{K}$
38 / 64
Then, Point Wise
We have the following
$$x_{ijk} \approx \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}$$
Graphically
39 / 64
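A sketch of that pointwise formula in NumPy (random factor vectors stand in for the a_r, b_r, c_r; everything here is illustrative):

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.random((I, R))   # columns are the a_r
B = rng.random((J, R))   # columns are the b_r
C = rng.random((K, R))   # columns are the c_r

# x_ijk = sum_r a_ir * b_jr * c_kr, i.e. a sum of R rank-one tensors
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# The same tensor built term by term from outer products
X_check = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
              for r in range(R))
print(np.allclose(X, X_check))   # True
```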
Therefore
The rank of a tensor X, rank(X)
It is defined as the smallest number of rank-one tensors that generate X
as their sum!!!
Problem!!!
The problem is NP-hard
But that has not stopped us because
We can use many of the methods in optimization to try to figure out
the magical number R!!!
From Approximation Techniques...
To Branch and Bound...
Even Naive techniques...
40 / 64
Why so much effort?
A Big Difference with SVD
A matrix decomposition is never unique unless we have orthogonality between the columns or rows of the factor matrices.
We have then
That tensor decompositions are way more general and less prone to such problems!!!
41 / 64
Now
We introduce a little bit more notation
$$\mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r = \llbracket A, B, C \rrbracket$$
CP decomposes the tensor using the following optimization
$$\min_{\hat{\mathcal{X}}} \left\| \mathcal{X} - \hat{\mathcal{X}} \right\| \quad \text{s.t.} \quad \hat{\mathcal{X}} = \sum_{r=1}^{R} \lambda_r\, a_r \circ b_r \circ c_r = \llbracket \lambda; A, B, C \rrbracket$$
42 / 64
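A minimal alternating least squares (ALS) sketch for that optimization, written in plain NumPy under the usual unfolding/Khatri-Rao conventions (an illustrative toy, not the production algorithm; libraries such as TensorLy ship tuned implementations):

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (I x R) and V (J x R), giving (I*J) x R."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def cp_als(X, R, n_iter=200, seed=0):
    """Fit X ~ [[A, B, C]] on a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover a synthetic rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 3)), rng.random((7, 3)), rng.random((8, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C = cp_als(X, R=3)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # should be a small residual
```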
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
43 / 64
Here is why...
Here is an example produced by direct numerical simulation
It can easily produce 100 GB to 1000 GB per DAY
The data came from (CIRCA 2016)
It is called S3D, a massively parallel compressible reacting flow solver
developed at Sandia National Laboratories...
For example, data came from
1 An autoignitive premixture of air and ethanol in Homogeneous Charge Compression Ignition (HCCI)
   Each time step requires 111 MB of storage, and the entire dataset is 70 GB.
2 A temporally-evolving planar slot jet flame with DME (dimethyl ether) as the fuel
   Each time step requires 32 GB of storage, so the entire dataset is 520 GB
44 / 64
Even in Machines like
A Cray XC30 supercomputer
5,576 dual-socket 12-core Intel “Ivy Bridge” (2.4 GHz) compute
nodes.
The peak flop rate of each core is 19.2 GFLOPS.
Each node has 64 GB of memory.
These machines will go down
Because the data representation is not efficient...
45 / 64
Using the Tucker Decomposition
46 / 64
Furthermore...
We have that, for 550 Gigabytes, compression ratios such as
1 5 times, down to about 100 Gigs
2 16 times, down to about 34 Gigs
3 55 times, down to about 10 Gigs
4 etc
Improving Running times like crazy... from 3 seconds to 70 seconds
when processing 15 TB of data...
47 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
48 / 64
We have a huge problem in Deep Neural Networks
Modern Architectures
They are consuming from 89% to 100% of the memory on the host GPUs and machines
Depending on where the calculations are done!!!
49 / 64
Problem with such Architectures
Recent studies show
The weight matrix of the fully-connected layer is highly redundant.
If you reduce the number of parameters, you could achieve
A similar predictive power
Possibly making them less prone to over-fitting or under-fitting
50 / 64
Thus
In the Paper
Novikov, A., Podoprikhin, D., Osokin, A. and Vetrov, D.P., 2015.
Tensorizing neural networks. In Advances in Neural Information
Processing Systems (pp. 442-450).
They Proposed the TT-Representation
Where, for a d-dimensional array (tensor) $\mathcal{A}$
If for each dimension $k = 1, \ldots, d$ and each possible value of the k-th
dimension index $j_k = 1, \ldots, n_k$
There exists a matrix $G_k[j_k]$ such that all the elements of $\mathcal{A}$ can be
computed as a product of such matrices.
51 / 64
Then
The TT-Representation
$$\mathcal{A}(j_1, j_2, \ldots, j_d) = G_1[j_1]\, G_2[j_2] \cdots G_d[j_d]$$
All matrices $G_k[j_k]$ related to the same dimension k are restricted to
be of the same size $r_{k-1} \times r_k$.
52 / 64
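A minimal sketch of reading one element out of such a representation (the cores are random and the sizes made up; the boundary ranks are 1, so the product of matrices collapses to a scalar):

```python
import numpy as np

rng = np.random.default_rng(0)
n = [4, 5, 6]        # mode sizes n_1, n_2, n_3
r = [1, 3, 2, 1]     # TT ranks r_0, ..., r_3 (with r_0 = r_3 = 1)

# Core k stores the matrices G_k[j_k], each of size r_{k-1} x r_k, one per value of j_k
cores = [rng.random((n[k], r[k], r[k + 1])) for k in range(3)]

def tt_element(cores, index):
    """A(j_1, ..., j_d) = G_1[j_1] G_2[j_2] ... G_d[j_d], a 1x1 matrix."""
    result = np.eye(1)
    for core, j in zip(cores, index):
        result = result @ core[j]
    return result.item()

print(tt_element(cores, (2, 0, 4)))
```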
Here a problem, we do not have a unique representation
We then go for the lowest rank
$$\mathcal{A}(j_1, j_2, \ldots, j_d) = \sum_{\alpha_0, \ldots, \alpha_d} G_1[j_1](\alpha_0, \alpha_1) \cdots G_d[j_d](\alpha_{d-1}, \alpha_d)$$
Where
$G_k[j_k](\alpha_{k-1}, \alpha_k)$ represents the element of the matrix $G_k[j_k]$ at position
$(\alpha_{k-1}, \alpha_k)$
53 / 64
With Memory Usage
For the full representation
$$\prod_{k=1}^{d} n_k$$
and for the TT-Representation
$$\sum_{k=1}^{d} n_k r_{k-1} r_k$$
54 / 64
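A quick sketch comparing the two counts for invented mode sizes and ranks (the numbers are illustrative only):

```python
import numpy as np

n = [4, 8, 16, 32, 64]     # made-up mode sizes n_k
r = [1, 4, 4, 4, 4, 1]     # made-up TT ranks r_0, ..., r_d

full = int(np.prod(n))                                    # prod_k n_k elements
tt = sum(n[k] * r[k] * r[k + 1] for k in range(len(n)))   # sum_k n_k r_{k-1} r_k
print(full, "vs", tt)   # 1048576 vs 1168
```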
Then
They propose to store the weights of each layer in a TT-Representation W
Where W is the weight matrix of a fully connected layer
Then, the usual affine map of the layer
$$y = Wx + b$$
With $W \in \mathbb{R}^{M \times N}$ and $b \in \mathbb{R}^{M}$
In TT-Representation
$$Y(i_1, i_2, \ldots, i_d) = \sum_{j_1, \ldots, j_d} G_1[i_1, j_1] \cdots G_d[i_d, j_d]\, X(j_1, j_2, \ldots, j_d) + B(i_1, i_2, \ldots, i_d)$$
55 / 64
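Below is a deliberately naive sketch of that contraction (made-up mode sizes, ranks and data, with boundary ranks equal to 1; a practical implementation contracts the cores mode by mode instead of looping over every multi-index):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_out = [2, 3]    # output mode sizes, so M = 6
n_in = [2, 2]     # input mode sizes, so N = 4
r = [1, 3, 1]     # TT ranks of the weight

# Core k holds G_k[i_k, j_k], each an r_{k-1} x r_k matrix
cores = [rng.random((n_out[k], n_in[k], r[k], r[k + 1])) for k in range(2)]
X = rng.random(n_in)    # input vector reshaped into a 2 x 2 tensor
B = rng.random(n_out)   # bias reshaped into a 2 x 3 tensor

Y = np.zeros(n_out)
for i in product(*[range(m) for m in n_out]):
    for j in product(*[range(m) for m in n_in]):
        G = np.eye(1)
        for k in range(len(cores)):
            G = G @ cores[k][i[k], j[k]]
        Y[i] += G.item() * X[j]
Y += B
print(Y.shape)   # (2, 3), i.e. the output vector y reshaped as a tensor
```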
This has the following complexity
The previous representation allows handling a larger number of parameters
Without too much overhead...
With the following complexities
Operation | Time | Memory
FC forward pass | O(MN) | O(MN)
TT forward pass | O(d r^2 m max{M, N}) | O(d r^2 max{M, N})
FC backward pass | O(MN) | O(MN)
TT backward pass | O(d r^2 m max{M, N}) | O(d r^3 max{M, N})
56 / 64
Applications for this
Manage Better
The amount of memory being used in the devices
Increase the size of the Deep Networks
Although I have some thoughts about this...
Implement CNNs on mobile devices
Kim, Yong-Deok, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu
Yang, and Dongjun Shin. "Compression of deep convolutional neural
networks for fast and low power mobile applications." arXiv preprint
arXiv:1511.06530 (2015).
57 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
58 / 64
Given that
Something Notable
Sparse tensors appear in many large-scale applications with
multidimensional and sparse data.
What support do we have for such situations?
Liu, Bangtian, Chengyao Wen, Anand D. Sarwate, and Maryam Mehri
Dehnavi. "A Unified Optimization Approach for Sparse Tensor
Operations on GPUs." arXiv preprint arXiv:1705.09905 (2017).
59 / 64
They pointed out different resources that you have around
Shared memory systems
The Tensor Toolbox [21], [4] and N-way Toolbox [22] are two widely used MATLAB toolboxes
The Cyclops Tensor Framework (CTF) is a C++ library which
provides automatic parallelization for sparse tensor operations.
etc
Distributed memory systems
Gigatensor handles tera-scale tensors using the MapReduce
framework.
Hypertensor is a sparse tensor library for SpMTTKRP on
distributed-memory environments.
etc
60 / 64
And the Grail
GPU
Li proposes a parallel algorithm and implementation on GPUs via
parallelizing certain algorithms on fibers.
TensorFlow... actually supports a certain version of tensor
representation...
Something Notable
Efforts to solve more problems are on the way
The future looks promising
61 / 64
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
62 / 64
As Always
We need people able to dream these new ways of doing stuff...
Therefore, a series of pieces of advice...
Learn more than a simple framework...
Learn the mathematics
And more importantly
Learn how to Model the Reality using such
Mathematical Tools...
63 / 64
Thanks
Any Questions?
I repeat, I am not an expert in Tensor Calculus....
64 / 64
More Related Content

Similar to Tensor models and other dreams by PhD Andres Mendez-Vazquez

Fractals
FractalsFractals
Fractals
guestc5cd98e
 
Parity arguments in problem solving
Parity arguments in problem solvingParity arguments in problem solving
Parity arguments in problem solving
talegari
 
Graph theory
Graph theoryGraph theory
Graph theory
Kumar
 
CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010
Fountain Valley School of Colorado
 
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONSTRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
BRNSS Publication Hub
 
A guide for teachers – Years 11 and 121 23
A guide for teachers – Years 11 and 121  23 A guide for teachers – Years 11 and 121  23
A guide for teachers – Years 11 and 121 23
mecklenburgstrelitzh
 
A guide for teachers – Years 11 and 121 23 .docx
A guide for teachers – Years 11 and 121  23 .docxA guide for teachers – Years 11 and 121  23 .docx
A guide for teachers – Years 11 and 121 23 .docx
makdul
 
1.3 Pythagorean Theorem
1.3 Pythagorean Theorem1.3 Pythagorean Theorem
1.3 Pythagorean Theorem
smiller5
 
Gd 26
Gd 26Gd 26
6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x
math266
 
Archimedes
ArchimedesArchimedes
Archimedes
akshay prabha
 
Large Deviations: An Introduction
Large Deviations: An IntroductionLarge Deviations: An Introduction
Large Deviations: An Introduction
Horacio González Duhart
 
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHIFRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
MILANJOSHIJI
 
HIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATUREHIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATURE
Milan Joshi
 
Hidden dimensions in nature
Hidden dimensions in natureHidden dimensions in nature
Hidden dimensions in nature
Milan Joshi
 
hidden dimension in nature
hidden dimension in naturehidden dimension in nature
hidden dimension in nature
Milan Joshi
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
Christian Robert
 
An FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge ColoringAn FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge Coloring
Neeldhara Misra
 
Dependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural LanguageDependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural Language
Daisuke BEKKI
 
Separation Axioms
Separation AxiomsSeparation Axioms
Separation Axioms
Karel Ha
 

Similar to Tensor models and other dreams by PhD Andres Mendez-Vazquez (20)

Fractals
FractalsFractals
Fractals
 
Parity arguments in problem solving
Parity arguments in problem solvingParity arguments in problem solving
Parity arguments in problem solving
 
Graph theory
Graph theoryGraph theory
Graph theory
 
CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010
 
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONSTRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
 
A guide for teachers – Years 11 and 121 23
A guide for teachers – Years 11 and 121  23 A guide for teachers – Years 11 and 121  23
A guide for teachers – Years 11 and 121 23
 
A guide for teachers – Years 11 and 121 23 .docx
A guide for teachers – Years 11 and 121  23 .docxA guide for teachers – Years 11 and 121  23 .docx
A guide for teachers – Years 11 and 121 23 .docx
 
1.3 Pythagorean Theorem
1.3 Pythagorean Theorem1.3 Pythagorean Theorem
1.3 Pythagorean Theorem
 
Gd 26
Gd 26Gd 26
Gd 26
 
6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x
 
Archimedes
ArchimedesArchimedes
Archimedes
 
Large Deviations: An Introduction
Large Deviations: An IntroductionLarge Deviations: An Introduction
Large Deviations: An Introduction
 
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHIFRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
 
HIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATUREHIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATURE
 
Hidden dimensions in nature
Hidden dimensions in natureHidden dimensions in nature
Hidden dimensions in nature
 
hidden dimension in nature
hidden dimension in naturehidden dimension in nature
hidden dimension in nature
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
 
An FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge ColoringAn FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge Coloring
 
Dependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural LanguageDependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural Language
 
Separation Axioms
Separation AxiomsSeparation Axioms
Separation Axioms
 

More from DataLab Community

Meetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos GenéticosMeetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos Genéticos
DataLab Community
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018
DataLab Community
 
Meetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark FundamentalsMeetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark Fundamentals
DataLab Community
 
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
DataLab Community
 
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña LlamasMetodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
DataLab Community
 
Curse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando GarciaCurse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando Garcia
DataLab Community
 
Quiénes somos - DataLab Community
Quiénes somos - DataLab CommunityQuiénes somos - DataLab Community
Quiénes somos - DataLab Community
DataLab Community
 
Profesiones de la ciencia de datos
Profesiones de la ciencia de datosProfesiones de la ciencia de datos
Profesiones de la ciencia de datos
DataLab Community
 
El arte de la Ciencia de Datos
El arte de la Ciencia de DatosEl arte de la Ciencia de Datos
El arte de la Ciencia de Datos
DataLab Community
 
Presentación de DataLab Community
Presentación de DataLab CommunityPresentación de DataLab Community
Presentación de DataLab Community
DataLab Community
 
De qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data ScienceDe qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data Science
DataLab Community
 

More from DataLab Community (11)

Meetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos GenéticosMeetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos Genéticos
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018
 
Meetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark FundamentalsMeetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark Fundamentals
 
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
 
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña LlamasMetodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
 
Curse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando GarciaCurse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando Garcia
 
Quiénes somos - DataLab Community
Quiénes somos - DataLab CommunityQuiénes somos - DataLab Community
Quiénes somos - DataLab Community
 
Profesiones de la ciencia de datos
Profesiones de la ciencia de datosProfesiones de la ciencia de datos
Profesiones de la ciencia de datos
 
El arte de la Ciencia de Datos
El arte de la Ciencia de DatosEl arte de la Ciencia de Datos
El arte de la Ciencia de Datos
 
Presentación de DataLab Community
Presentación de DataLab CommunityPresentación de DataLab Community
Presentación de DataLab Community
 
De qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data ScienceDe qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data Science
 

Recently uploaded

Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Tensor models and other dreams by PhD Andres Mendez-Vazquez

  • 1. Tensor Models and Other Dreams... Andres Mendez-Vazquez January 26, 2018 1 / 64
  • 2. Outline 1 Introduction The Dream of Tensors A Short Story on Compression A Short History What a Heck are Tensors? 2 The Tensor Models for Data Science Decomposition for Compression CANDECOMP/PARAFAC Decomposition The Dream of Compression and BIG DATA Tensorizing Neural Networks Hardware Support for the Dream 3 Conclusions The Dream Will Follow.... 2 / 64
  • 4. Tensors are this way... As words defining an important moment in life Without you All the stars we steal from the night sky Will never be enough Never be enough These hands could hold the world but it’ll Never be enough... - Justin Paul / Benj Pasek, Greatest Showman 4 / 64
  • 5. Tensors are like such words... They represent generalizations that represent our dreams... In Data Sciences... 5 / 64
  • 7. Document Representation Imagine the following... You have a bunch of documents... They are hundred thousands of them... 7 / 64
  • 8. Then, we have an Opportunity or a Terrible Problem How do you represent them in a easy way to handle them? After all we want to Search them Compare them Rank them What about using vectors? word 1 word 2 word 3 word 4 · · · word d counter counter counter counter counter x1 x2 x3 x4 · · · xd 8 / 64
  • 13. The Matrix at the Center of Everything!!! The Vector/Matrix Representation They are basically a N × d matrix like this A =          (x1)1 · · · (x1)j · · · (x1)d ... ... (xi)1 (xi)j (xi)d ... ... (xN )1 · · · (xN )j · · · (xN )d          A is a matrix with... N represents the thousands of documents... d represents the thousands of words in a dictionary..... 9 / 64
  • 15. A Small Problem The matrix alone consumes... so much... You have 2 bytes per memory cell If we have N = 10^6, d = 50,000 We have 2 × N × d = 100 Gigabytes 10 / 64
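A quick sanity check of the arithmetic on the slide above; a minimal sketch in Python, assuming the 2-byte cell and the N = 10^6, d = 50,000 values quoted there.

```python
# Back-of-the-envelope footprint of the dense document-term matrix.
N = 10**6            # number of documents (value quoted on the slide)
d = 50_000           # dictionary size (value quoted on the slide)
bytes_per_cell = 2   # 2 bytes per counter (value quoted on the slide)

total_bytes = bytes_per_cell * N * d
print(f"{total_bytes / 10**9:.0f} GB")   # prints 100 GB
```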
  • 17. Danger!!! Will Robinson Lost in Space 11 / 64
  • 18. We have a trick!!! Something Notable The Matrix is Highly SPARSE 12 / 64
  • 19. Therefore If you are smart enough You start representing the matrix information using sparse techniques [Figure: a 5×5 sparse matrix, distinguishing numeric elements from empty elements] 13 / 64
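A minimal sketch of the sparse-representation trick using SciPy's CSR format; the 5×5 toy matrix is made up purely for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 5x5 document-term block: most word counters are zero.
dense = np.array([
    [0, 3, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 4, 0, 1],
])

sparse = csr_matrix(dense)                # keep only the non-zero counters
print(sparse.nnz, "non-zeros out of", dense.size)
print(sparse.data)                        # the values actually stored
print(sparse.indices, sparse.indptr)      # column indices and row pointers
```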
  • 20. Then If you are quite smart.... You discover that only a few of the singular values carry most of the information... Every Matrix has a Singular Value Decomposition A = UΣV^T The columns of U are an orthonormal basis for the column space. The columns of V are an orthonormal basis for the row space. Σ is diagonal and the entries on its diagonal σ_i = Σ_{ii} are positive real numbers, called the singular values of A. 14 / 64
  • 24. How much compression can we get? The Matrix Sparse Representation It Achieves 90% Compression - We go from 100 Gigabytes to 10 Gigabytes From 50,000 dimensions/words we go to 300 dimensions Using the Singular Value Decomposition Making it possible to go from 100 Gigabytes to 2 × N × 300 = 0.6 Gigabytes 15 / 64
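A minimal sketch of the truncated-SVD compression behind these numbers, using NumPy on a toy matrix; the shapes and the kept rank k are illustrative stand-ins for the N × d matrix and the 300 dimensions on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 500))      # toy stand-in for the N x d document matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 50                                    # keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_cost = A.size                                    # N * d numbers
low_rank_cost = U[:, :k].size + k + Vt[:k, :].size    # N*k + k + k*d numbers
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(full_cost, low_rank_cost, rel_err)
```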
  • 27. IMAGINE!!!! We have a crazy moment!!! All the stars we steal from the night sky Will never be enough Never be enough Towers of gold are still too little These hands could hold the world but it’ll Never be enough Never be enough For me 16 / 64
  • 28. Then You go ambitious!!! You add a new dimension representing feelings!!! Feeling Dimensionality 17 / 64
  • 30. They have a somewhat short history!!! First Most They are abstract entities invariant under coordinate transformations. They were mentioned first by Woldemar Voigt in 1898 A German physicist, who taught at the Georg August University of Göttingen. He mentioned tensors in a study about the physical properties of crystals. But Before That The Great Riemann introduced the concept of a manifold... the beginning of the dream... Through a quadratic line element to study its properties... ds^2 = g_{ij} dx^i dx^j 19 / 64
  • 35. Then Gregorio Ricci-Curbastro and Tullio Levi-Civita They wrote a paper in the Mathematische Annalen, Vol. 54 (1901), entitled "Méthodes de calcul différentiel absolu" A Monster Came Around 20 / 64
  • 37. “Every Genius has stood on the Shoulders of Giants” - Newton Einstein adopted the concepts in the paper And the Theory of General Relativity was born He renamed the entire field from “calcul absolu” to TENSOR CALCULUS 21 / 64
  • 40. First Principles... Imagine a linear coordinate system 23 / 64
  • 41. We define A Coordinate System We define vectors in terms of a basis v = v_x e_1 + v_y e_2 = (v_x, v_y)^T ∈ R^2, with norm ‖v‖ = ‖v‖_2 = (v_x^2 + v_y^2)^{1/2} Note: This is important, vectors are always the same thing no matter the coordinate system 24 / 64
  • 42. Therefore Imagine representing the vector in a new basis in terms of the old basis e'_1 · v = v'_x = e'_1 · (v_x e_1) + e'_1 · (v_y e_2), e'_2 · v = v'_y = e'_2 · (v_x e_1) + e'_2 · (v_y e_2) Where e'_i · e_j = projection of e'_i onto e_j 25 / 64
  • 44. Using a Little bit of Notation We need a notation that is more compact Let the indices i, j represent the numbers 1, 2 corresponding to the coordinates x, y Write the components of v as v_i and v'_i in the two coordinate systems Then define a_{ij} = e'_i · e_j Note: This defines the “ROTATION” In fact the a_{ij} are individually just the cosines of the angles between one axis and another 26 / 64
  • 46. Therefore We can rewrite the entire transformation v'_i = \sum_{j=1}^{2} a_{ij} v_j We will agree that whenever an index appears twice, we have a sum (the summation convention): v'_i = a_{ij} v_j 27 / 64
  • 48. We have then... We can do the following (v'_1, v'_2)^T = [[a_{11}, a_{12}], [a_{21}, a_{22}]] (v_1, v_2)^T Then, we compress our notation more: v' = a v 28 / 64
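A small numerical check of v' = a v, assuming the change of basis is a plane rotation by an arbitrary angle; it also verifies the orthogonality relation a_{ij} a_{ik} = δ_{jk} that the next slides use.

```python
import numpy as np

theta = 0.7                                       # arbitrary rotation angle (illustrative)
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # a_ij = e'_i . e_j for a rotated basis

v = np.array([3.0, 4.0])
v_prime = a @ v                                   # v'_i = a_ij v_j

print(np.linalg.norm(v), np.linalg.norm(v_prime))  # same length: the vector itself did not change
print(np.round(a.T @ a, 12))                       # a_ij a_ik = delta_jk (identity matrix)
```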
  • 50. Then, we can redefine our dot product The Basis of Projecting into other vectors v · w = v_i w_i = v'_i w'_i = a_{ij} a_{ik} v_j w_k Using the Kronecker Delta δ_{ij} = 0 if i ≠ j, 1 if i = j Therefore, we have a_{ij} a_{ik} = δ_{jk} 29 / 64
  • 53. Proving the Invariance of the dot product Therefore v'_i w'_i = δ_{jk} v_j w_k = v_j w_j 30 / 64
  • 54. Then, we have A scalar is a number K It has the same value in different coordinate systems. A vector is a set of numbers v_i They transform according to v'_i = a_{ij} v_j A (Second Rank) Tensor is a set of numbers T_{ij} They transform according to T'_{ij} = a_{ik} a_{jl} T_{kl} 31 / 64
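A minimal sketch of these transformation rules with numpy.einsum, reusing the same rotation matrix; the tensor components are arbitrary numbers chosen for illustration.

```python
import numpy as np

theta = 0.7
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

v = np.array([3.0, 4.0])
T = np.array([[1.0, 2.0],
              [0.5, 3.0]])                    # components of a second-rank tensor

v_prime = np.einsum('ij,j->i', a, v)          # v'_i = a_ij v_j
T_prime = np.einsum('ik,jl,kl->ij', a, a, T)  # T'_ij = a_ik a_jl T_kl

# Scalars built by full contraction are invariant, e.g. the trace T_ii
print(np.trace(T), np.trace(T_prime))
```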
  • 57. Then you can go higher For Example, tensors in Rank 3 32 / 64
  • 59. Once we have an idea of a Tensor Do we have decompositions similar to the SVD? We have them......!!! A Little Bit of History Tensor decompositions originated with Hitchcock in 1927 An American mathematician and physicist known for his formulation of the transportation problem in 1941. A multiway model is attributed to Cattell in 1944 A British and American psychologist, known for his psychometric research into intrapersonal psychological structure. But it was not until Ledyard R. Tucker's “Some mathematical notes on three-mode factor analysis,” Psychometrika, 31 (1966), pp. 279–311, that the idea really took hold. 34 / 64
  • 63. The Dream has been expanding beyond Physics In the last ten years 1 Signal Processing 2 Numerical Linear Algebra 3 Computer Vision 4 Data Mining 5 Graph analysis 6 Neurosciences 7 etc And we are going further The Dream of Representation is at full speed when dealing with BIG DATA!!! 35 / 64
  • 71. Decomposition of Tensors Hitchcock Proposed such decomposition first... then the deluge
    Name                                            Proposed by
    Polyadic form of a tensor                       Hitchcock, 1927
    Three-mode factor analysis                      Tucker, 1966
    PARAFAC (parallel factors)                      Harshman, 1970
    CANDECOMP or CAND (canonical decomposition)     Carroll and Chang, 1970
    Topographic components model                    Möcks, 1988
    CP (CANDECOMP/PARAFAC)                          Kiers, 2000
    36 / 64
  • 73. Look at the most modern one, 17 years ago... The CP decomposition factorizes a tensor into a sum of component rank-one tensors (Vectors!!!) X ≈ \sum_{r=1}^{R} a_r ◦ b_r ◦ c_r with X ∈ R^{I×J×K} Where R is a positive integer, a_r ∈ R^I, b_r ∈ R^J, c_r ∈ R^K 38 / 64
  • 75. Then, Point Wise We have the following x_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr} Graphically 39 / 64
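A minimal sketch of that point-wise formula in NumPy: it assembles a tensor from R rank-one terms and checks the einsum form against an explicit outer-product sum. The sizes and the factor matrices are invented for illustration.

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))               # columns a_r
B = rng.standard_normal((J, R))               # columns b_r
C = rng.standard_normal((K, R))               # columns c_r

# x_ijk = sum_r a_ir * b_jr * c_kr
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Same tensor built one rank-one term (outer product) at a time
X_check = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
              for r in range(R))
print(np.allclose(X, X_check))                # True
```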
  • 77. Therefore The rank of a tensor X, rank(X) It is defined as the smallest number of rank-one tensors that generate X as their sum!!! Problem!!! The problem is NP-hard But that has not stopped us because We can use many of the methods in optimization to try to figure out the magical number R!!! From Approximation Techniques... To Branch and Bound... Even Naive techniques... 40 / 64
  • 80. Why so much effort? A Big Difference with SVD A matrix decomposition is never unique unless we impose orthogonality between the columns or rows of the matrix. We have then That tensor decompositions are way more general and less prone to this uniqueness problem!!! 41 / 64
  • 82. Now We introduce a little more notation X ≈ \sum_{r=1}^{R} a_r ◦ b_r ◦ c_r = [[A, B, C]] CP decomposes the Tensor using the following Optimization min_{X̂} ‖X − X̂‖ s.t. X̂ = \sum_{r=1}^{R} λ_r a_r ◦ b_r ◦ c_r = [[λ; A, B, C]] 42 / 64
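A hedged sketch of fitting this CP model in practice. It assumes the TensorLy library, whose `parafac` routine implements an alternating least squares solver; the function name, its `rank` argument, and the return format follow recent TensorLy releases and may differ in other versions, so check the library documentation.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac    # ALS-based CP fit (assumed TensorLy API)

rng = np.random.default_rng(0)
X = tl.tensor(rng.standard_normal((10, 12, 14)))   # toy tensor standing in for the data

# The target rank R must be chosen by the user: computing the exact rank is NP-hard.
weights, factors = parafac(X, rank=5)

X_hat = tl.cp_to_tensor((weights, factors))   # rebuild sum_r lambda_r a_r o b_r o c_r
print(tl.norm(X - X_hat) / tl.norm(X))        # relative fitting error
```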
  • 85. Here is why... A direct numerical simulation can easily produce 100 GB to 1000 GB per DAY The data came from (CIRCA 2016) a solver called S3D, a massively parallel compressible reacting flow solver developed at Sandia National Laboratories... For example, data came from 1 Autoignitive premixture of air and ethanol in Homogeneous Charge Compression Ignition (HCCI): each time step requires 111 MB of storage, and the entire dataset is 70 GB. 2 A temporally-evolving planar slot jet flame with DME (dimethyl ether) as the fuel: each time step requires 32 GB of storage, so the entire dataset is 520 GB 44 / 64
  • 89. Even in Machines like a Cray XC30 supercomputer 5,576 dual-socket 12-core Intel “Ivy Bridge” (2.4 GHz) compute nodes. The peak flop rate of each core is 19.2 GFLOPS. Each node has 64 GB of memory. These machines will go down Because the data representation is not efficient... 45 / 64
  • 91. Using the Tucker Decomposition 46 / 64
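A hedged sketch of a Tucker compression of one such snapshot, again assuming TensorLy; the `tucker` routine, its `rank` argument, and the unpacking into a core plus factor matrices follow recent TensorLy releases, and the sizes are toy values rather than the real S3D grids.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker     # assumed TensorLy API

rng = np.random.default_rng(0)
X = tl.tensor(rng.standard_normal((60, 60, 60)))   # toy stand-in for one simulation snapshot

# Tucker model: a small core tensor plus one tall, skinny factor matrix per mode
core, factors = tucker(X, rank=[10, 10, 10])

stored = core.size + sum(f.size for f in factors)
print(X.size, stored, round(X.size / stored, 1))    # rough compression ratio
```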
  • 92. Furthermore... We have that for 550 Gigabytes of data, compression ratios such as 1 5 times, down to 100 Gigs 2 16 times, down to 34 Gigs 3 55 times, down to 10 Gigs 4 etc Improving Running times like crazy... from 3 seconds to 70 seconds when processing 15 TB of data... 47 / 64
  • 94. We have a huge problem in Deep Neural Networks Modern Architectures They are consuming from 89% to 100% of the memory of the host GPU or machine Depending on where the calculations are done!!! 49 / 64
  • 95. Problem with such Architectures Recent studies show The weight matrix of the fully-connected layer is highly redundant. If you reduce the number of parameters, you could achieve A similar predictive power Possibly making them less prone to over-fitting or under-fitting 50 / 64
  • 97. Thus In the Paper Novikov, A., Podoprikhin, D., Osokin, A. and Vetrov, D.P., 2015. Tensorizing neural networks. In Advances in Neural Information Processing Systems (pp. 442-450). They Proposed the TT-Representation Where a d-dimensional array (Tensor) A is in TT-format if, for each dimension k = 1, ..., d and each possible value of the kth dimension index j_k = 1, ..., n_k There exists a matrix G_k[j_k] such that all the elements of A can be computed as a product of such matrices. 51 / 64
  • 100. Then The TT-Representation A(j_1, j_2, · · · , j_d) = G_1[j_1] G_2[j_2] · · · G_d[j_d] All matrices G_k[j_k] related to the same dimension k are restricted to be of the same size r_{k−1} × r_k. 52 / 64
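A minimal sketch of reading one entry out of a TT-representation; the mode sizes, TT-ranks and random cores are invented for illustration, with each core stored as an array whose slice cores[k][j_k] is the matrix G_k[j_k].

```python
import numpy as np

rng = np.random.default_rng(0)
n = [3, 4, 3, 2]            # mode sizes n_1, ..., n_4
r = [1, 2, 3, 2, 1]         # TT-ranks r_0, ..., r_4 (boundary ranks are 1)

# cores[k][j] is the r_k x r_{k+1} matrix G_{k+1}[j]
cores = [rng.standard_normal((n[k], r[k], r[k + 1])) for k in range(len(n))]

def tt_element(cores, index):
    """A(j_1, ..., j_d) = G_1[j_1] G_2[j_2] ... G_d[j_d] (ends up as a 1x1 matrix)."""
    G = cores[0][index[0]]                  # shape (1, r_1)
    for k in range(1, len(cores)):
        G = G @ cores[k][index[k]]          # accumulate the matrix product
    return G[0, 0]

print(tt_element(cores, (2, 1, 0, 1)))
```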
  • 101. Here is a problem, we do not have a unique representation We then go for the lowest ranks A(j_1, j_2, · · · , j_d) = \sum_{α_0,...,α_d} G_1[j_1](α_0, α_1) · · · G_d[j_d](α_{d−1}, α_d) Where G_k[j_k](α_{k−1}, α_k) represents the element of the matrix G_k[j_k] at position (α_{k−1}, α_k) 53 / 64
  • 103. With Memory Usage For the full representation \prod_{k=1}^{d} n_k and for the TT-Representation \sum_{k=1}^{d} n_k r_{k−1} r_k 54 / 64
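A quick count contrasting the two formulas above, with toy mode sizes and TT-ranks chosen for illustration.

```python
import math

n = [4, 8, 8, 4]                 # mode sizes n_k (illustrative)
r = [1, 3, 3, 3, 1]              # TT-ranks r_0, ..., r_d with r_0 = r_d = 1

full_params = math.prod(n)                                        # prod_k n_k
tt_params = sum(n[k] * r[k] * r[k + 1] for k in range(len(n)))    # sum_k n_k r_{k-1} r_k
print(full_params, tt_params)    # 1024 versus 168
```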
  • 105. Then They propose to store each layer in a TT-Representation W Where W is the weight matrix of a fully connected layer Then, using our old forward pass y = Wx + b With W ∈ R^{N×M} and b ∈ R^{M} In TT-Representation Y(i_1, i_2, · · · , i_d) = \sum_{j_1,...,j_d} G_1[i_1, j_1] ... G_d[i_d, j_d] X(j_1, j_2, · · · , j_d) + B(i_1, i_2, · · · , i_d) 55 / 64
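A minimal sketch of why a factored weight matrix saves memory and work, using only the simplest special case of this idea (d = 2 with unit TT-ranks, where W is a Kronecker product of two small matrices); this is not the general algorithm of the paper, just an illustration, and all sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# W = kron(A, B) is never materialized: only the two small factors are stored.
A = rng.standard_normal((16, 24))         # shape (m1, n1)
B = rng.standard_normal((32, 40))         # shape (m2, n2)
x = rng.standard_normal(24 * 40)          # input vector of length n1 * n2
b = rng.standard_normal(16 * 32)          # bias of length m1 * m2

# Factored forward pass: reshape, contract with the small factors, reshape back.
X = x.reshape(24, 40)
y_factored = (A @ X @ B.T).reshape(-1) + b

# Reference computation with the explicit (16*32) x (24*40) matrix.
y_full = np.kron(A, B) @ x + b
print(np.allclose(y_factored, y_full))    # True
```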
  • 108. This has the following complexity The previous representation allows handling a larger number of parameters Without too much overhead... With the following complexities
    Operation            Time                       Memory
    FC forward pass      O(MN)                      O(MN)
    TT forward pass      O(d r^2 m max{M, N})       O(d r^2 max{M, N})
    FC backward pass     O(MN)                      O(MN)
    TT backward pass     O(d r^2 m max{M, N})       O(d r^3 max{M, N})
    56 / 64
  • 110. Applications for this Manage Better The amount of memory being used in the devices Increase the size of the Deep Networks Although I have some thoughts about this... Implement CNN Networks into mobile devices Kim, Yong-Deok, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. "Compression of deep convolutional neural networks for fast and low power mobile applications." arXiv preprint arXiv:1511.06530 (2015). 57 / 64
  • 114. Given that Something Notable Sparse tensors appear in many large-scale applications with multidimensional and sparse data. What support do we have for such situations? Liu, Bangtian, Chengyao Wen, Anand D. Sarwate, and Maryam Mehri Dehnavi. "A Unified Optimization Approach for Sparse Tensor Operations on GPUs." arXiv preprint arXiv:1705.09905 (2017). 59 / 64
  • 116. They pointed out different resources that you have around Shared memory systems The Tensor Toolbox [21], [4] and the N-way Toolbox [22] are two widely used MATLAB toolboxes The Cyclops Tensor Framework (CTF) is a C++ library which provides automatic parallelization for sparse tensor operations. etc Distributed memory systems Gigatensor handles tera-scale tensors using the MapReduce framework. Hypertensor is a sparse tensor library for SpMTTKRP in distributed-memory environments. etc 60 / 64
  • 122. And the Grail GPU Li proposes a parallel algorithm and implementation on GPUs via parallelizing certain algorithms on fibers. TensorFlow... actually supports a certain version of Tensor representation... Something Notable Efforts to solve more problems are on the way The future looks promising 61 / 64
  • 127. As Always We need people able to dream these new ways of doing stuff... Therefore, a few pieces of advice... Learn more than a single framework... Learn the mathematics And more importantly Learn how to Model Reality using such Mathematical Tools... 63 / 64
  • 130. Thanks Any Questions? I repeat I am not an expert in Tensor Calculus.... 64 / 64