Tensor Models and Other Dreams...
Andres Mendez-Vazquez
January 26, 2018
Outline
1 Introduction
The Dream of Tensors
A Short Story on Compression
A Short History
What the Heck are Tensors?
2 The Tensor Models for Data Science
Decomposition for Compression
CANDECOMP/PARAFAC Decomposition
The Dream of Compression and BIG DATA
Tensorizing Neural Networks
Hardware Support for the Dream
3 Conclusions
The Dream Will Follow....
The Dream of Tensors
Tensors are this way...
Like words defining an important moment in life
Without you
All the stars we steal from the night sky
Will never be enough
Never be enough
These hands could hold the world
but it’ll
Never be enough...
- Benj Pasek / Justin Paul, "Never Enough", The Greatest Showman
Tensors are like such words...
They embody the generalizations behind our dreams...
In Data Science...
A Short Story on Compression
Document Representation
Imagine the following...
You have a bunch of documents... There are hundreds of thousands of them...
Then, we have an Opportunity or a Terrible Problem
How do you represent them in a way that is easy to handle?
After all, we want to
Search them
Compare them
Rank them
What about using vectors?
word 1    word 2    word 3    word 4    · · ·    word d
  x1        x2        x3        x4      · · ·      xd
(each xi is the counter for word i in the document)
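A minimal Python sketch of this idea (the toy corpus and vocabulary below are made up for illustration): each document becomes a vector of word counts, and stacking the vectors gives the N × d matrix of the next slide.

```python
from collections import Counter

import numpy as np

# Hypothetical toy corpus and vocabulary (for illustration only).
documents = [
    "tensors generalize matrices",
    "matrices generalize vectors",
    "vectors hold word counts word counts",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})

def count_vector(doc, vocab):
    """Map a document to its vector of word counts x = (x1, ..., xd)."""
    counts = Counter(doc.split())
    return np.array([counts[word] for word in vocab])

# Stack the N count vectors into the N x d document-term matrix A.
A = np.vstack([count_vector(doc, vocabulary) for doc in documents])
print(vocabulary)
print(A)
```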
The Matrix at the Center of Everything!!!
The Vector/Matrix Representation
They are basically an N × d matrix like this

A = \begin{pmatrix}
(x_1)_1 & \cdots & (x_1)_j & \cdots & (x_1)_d \\
\vdots & & \vdots & & \vdots \\
(x_i)_1 & \cdots & (x_i)_j & \cdots & (x_i)_d \\
\vdots & & \vdots & & \vdots \\
(x_N)_1 & \cdots & (x_N)_j & \cdots & (x_N)_d
\end{pmatrix}

A is a matrix with...
N representing the thousands of documents...
d representing the thousands of words in a dictionary.....
A Small Problem
The matrix alone consumes... so much...
You have 2 bytes per memory cell
If we have N = 10^6 and d = 50,000
We have
2 \times N \times d = 10^{11} \text{ bytes} = 100 \text{ Gigabytes}
Danger!!! Will Robinson
Lost in Space
We have a trick!!!
Something Notable
The Matrix is Highly SPARSE
Therefore
If you are smart enough
You start representing the matrix information using sparse techniques
[Figure: a 5x5 sparse matrix, distinguishing numeric elements from empty elements]
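A small sketch of the trick with SciPy, assuming a toy count matrix (sizes here are tiny compared with the 10^6 × 50,000 case): the CSR format stores only the nonzero counters.

```python
import numpy as np
from scipy import sparse

# A toy document-term count matrix: most entries are zero.
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(1000, 2000)) * (rng.random((1000, 2000)) < 0.01)

A_dense = A.astype(np.int16)           # 2 bytes per cell, as in the slide
A_sparse = sparse.csr_matrix(A_dense)  # store only the nonzero entries

dense_bytes = A_dense.nbytes
sparse_bytes = A_sparse.data.nbytes + A_sparse.indices.nbytes + A_sparse.indptr.nbytes
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```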
Then
If you are quite smart....
You discover that a few of the singular values provide most of the information...
Every Matrix has a Singular Value Decomposition
A = U \Sigma V^T
The columns of U are an orthonormal basis for the column space.
The columns of V are an orthonormal basis for the row space.
\Sigma is diagonal and the entries on its diagonal, \sigma_i = \Sigma_{ii}, are nonnegative real numbers, called the singular values of A.
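A minimal sketch of the rank-k truncation this suggests, in NumPy (the slide's numbers use k = 300; here a toy size): keep only the k largest singular values and the matching columns of U and V.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 800))   # stand-in for the N x d document-term matrix
k = 50                       # the slide uses k = 300 for d = 50,000

# Full SVD: A = U diag(s) V^T, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep the k dominant singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("relative error:", np.linalg.norm(A - A_k) / np.linalg.norm(A))

# Each document can now be stored as a k-dimensional vector.
docs_k = U[:, :k] * s[:k]    # N x k coordinates in the reduced space
print(docs_k.shape)
```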
How much compression can we get?
The Matrix Sparse Representation
It achieves 90% compression - we go from 100 Gigabytes to 10 Gigabytes
Using the Singular Value Decomposition
From 50,000 dimensions/words we go to 300 dimensions
Making it possible to go from 100 Gigabytes to
2 \times N \times 300 = 0.6 \text{ Gigabytes}
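The arithmetic behind those figures, as a quick sanity check (a sketch; the 90% sparse figure is the slide's own estimate, not something computed here):

```python
bytes_per_cell = 2
N, d, k = 10**6, 50_000, 300

dense_bytes = bytes_per_cell * N * d   # 1e11 bytes = 100 GB
sparse_bytes = 0.1 * dense_bytes       # 90% compression claimed -> 10 GB
svd_bytes = bytes_per_cell * N * k     # 6e8 bytes = 0.6 GB

for name, b in [("dense", dense_bytes), ("sparse", sparse_bytes), ("rank-300", svd_bytes)]:
    print(f"{name}: {b / 1e9:.1f} GB")
```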
IMAGINE!!!!
We have a crazy moment!!!
All the stars we steal from the night sky
Will never be enough
Never be enough
Towers of gold are still too little
These hands could hold the world
but it’ll
Never be enough
Never be enough
For me
Then
You go ambitious!!! You add a new dimension representing feelings!!!
[Figure: the document matrix extended along a third axis, the "Feeling Dimensionality"]
A Short History
They have a somewhat short history!!!
First and foremost
They are abstract entities invariant under coordinate transformations.
They were mentioned first by Woldemar Voigt in 1898
A German physicist, who taught at the Georg August University of Göttingen.
He mentioned the tensors in a study about the physical properties of crystals.
But Before That
The Great Riemann introduced the concept of a manifold... the beginning of the dream...
Through a quadratic line element to study its properties...
ds^2 = g_{ij} \, dx^i dx^j
Then
Gregorio Ricci-Curbastro and Tullio Levi-Civita
They wrote a paper in the Mathematische Annalen, Vol. 54 (1901), entitled "Méthodes de calcul différentiel absolu"
A Monster Came Around
"Every genius has stood on the shoulders of giants" - Newton
Einstein adopted the concepts from the paper
And the Theory of General Relativity was born
He renamed the entire field from "calcul absolu" to
TENSOR CALCULUS
What the Heck are Tensors?
First Principles...
Imagine a linear coordinate system
We define
A Coordinate System
We define vectors in terms of a basis:
v = v_x e_1 + v_y e_2 = \begin{pmatrix} v_x \\ v_y \end{pmatrix} \in \mathbb{R}^2
\|v\| = \|v\|_2 = \left( v_x^2 + v_y^2 \right)^{1/2}
Note: This is important; vectors are always the same object, no matter the coordinate system.
Therefore
Imagine representing the vector's components in a new basis in terms of the old one
e'_1 \cdot v = v'_x = v_x (e'_1 \cdot e_1) + v_y (e'_1 \cdot e_2)
e'_2 \cdot v = v'_y = v_x (e'_2 \cdot e_1) + v_y (e'_2 \cdot e_2)
Where
e'_i \cdot e_j = the projection of e_j onto e'_i
Using a Little Bit of Notation
We need a notation that is more compact
Let the indices i, j represent the numbers 1, 2, corresponding to the coordinates x, y
Write the components of v as v_i and v'_i in the two coordinate systems
Then define
a_{ij} = e'_i \cdot e_j
Note: This defines the "ROTATION"
Its entries are individually just the cosines of the angle between one axis and another
Therefore
We can rewrite the entire transformation
v'_i = \sum_{j=1}^{2} a_{ij} v_j
We will agree that whenever an index appears twice, we have a sum (the Einstein summation convention):
v'_i = a_{ij} v_j
We have then...
We can do the following
\begin{pmatrix} v'_1 \\ v'_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
Then, we compress our notation more
v' = a v
Then, we can redefine our dot product
The Basis of Projecting onto other vectors
v \cdot w = v_i w_i = v'_i w'_i = a_{ij} a_{ik} v_j w_k
Using the Kronecker Delta
\delta_{jk} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \neq k \end{cases}
Therefore, we have
a_{ij} a_{ik} = \delta_{jk}
Proving the Invariance of the dot product
Therefore
v'_i w'_i = \delta_{jk} v_j w_k = v_j w_j
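A quick numerical check of these formulas in NumPy (the rotation angle is arbitrary; this snippet is an illustration, not part of the original slides): build a_ij from a new orthonormal basis, transform two vectors, and verify the dot product does not change.

```python
import numpy as np

theta = 0.7                                    # an arbitrary rotation angle
# New basis vectors e'_1, e'_2 expressed in the old basis.
e1p = np.array([np.cos(theta), np.sin(theta)])
e2p = np.array([-np.sin(theta), np.cos(theta)])

# a_ij = e'_i . e_j : rows are the new basis vectors in old coordinates.
a = np.vstack([e1p, e2p])

v = np.array([3.0, 1.0])
w = np.array([-2.0, 5.0])
v_p, w_p = a @ v, a @ w                        # v'_i = a_ij v_j

print(np.allclose(a.T @ a, np.eye(2)))         # a_ij a_ik = delta_jk (sum over i)
print(np.dot(v, w), np.dot(v_p, w_p))          # the dot product is invariant
```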
Then, we have
A scalar is a number K
It has the same value in different coordinate systems.
A vector is a set of numbers v_i
They transform according to
v'_i = a_{ij} v_j
A (Second Rank) Tensor is a set of numbers T_{ij}
They transform according to
T'_{ij} = a_{ik} a_{jl} T_{kl}
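The second-rank rule lends itself to the same kind of check (same rotation matrix as above, redefined here so the snippet stands alone; T is an arbitrary example): T'_{ij} = a_{ik} a_{jl} T_{kl} is a double contraction, which np.einsum writes directly.

```python
import numpy as np

theta = 0.7
a = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # same rotation a_ij as above

T = np.array([[1.0, 2.0],
              [0.5, 3.0]])                        # components of some second-rank tensor

# T'_ij = a_ik a_jl T_kl : a double contraction over k and l.
T_prime = np.einsum("ik,jl,kl->ij", a, a, T)

# For a rank-2 tensor this is the familiar matrix form T' = a T a^T.
print(np.allclose(T_prime, a @ T @ a.T))
```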
Then you can go higher
For Example, tensors in Rank 3
Decomposition for Compression
Once we have an idea of a Tensor
Do we have decompositions similar to the SVD?
We have them......!!!
A Little Bit of History
Tensor decompositions originated with Hitchcock in 1927
An American mathematician and physicist known for his formulation of the transportation problem in 1941.
A multiway model is attributed to Cattell in 1944
A British and American psychologist, known for his psychometric research into intrapersonal psychological structure.
But it was not until Ledyard R. Tucker
"Some mathematical notes on three-mode factor analysis," Psychometrika, 31 (1966), pp. 279-311.
The Dream has been expanding beyond Physics
In the last ten years
1 Signal Processing
2 Numerical Linear Algebra
3 Computer Vision
4 Data Mining
5 Graph Analysis
6 Neurosciences
7 etc.
And we are going further
The Dream of Representation is at full speed when dealing with BIG DATA!!!
Decomposition of Tensors
Hitchcock proposed such a decomposition first... then the deluge

Name                                           Proposed by
Polyadic form of a tensor                      Hitchcock, 1927
Three-mode factor analysis                     Tucker, 1966
PARAFAC (parallel factors)                     Harshman, 1970
CANDECOMP or CAND (canonical decomposition)    Carroll and Chang, 1970
Topographic components model                   Möcks, 1988
CP (CANDECOMP/PARAFAC)                         Kiers, 2000
CANDECOMP/PARAFAC Decomposition
Look at the most modern one, 17 years ago...
The CP decomposition factorizes a tensor into a sum of rank-one component tensors (outer products of vectors!!!)
X \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r, \quad \text{with } X \in \mathbb{R}^{I \times J \times K}
Where
R is a positive integer
a_r \in \mathbb{R}^I, \; b_r \in \mathbb{R}^J, \; c_r \in \mathbb{R}^K
Then, Point Wise
We have the following
x_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}
Graphically
[Figure: the tensor X drawn as a sum of R rank-one (outer-product) terms]
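In NumPy the point-wise formula is a single einsum over factor matrices A, B, C whose columns are the a_r, b_r, c_r (toy sizes, for illustration only):

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# x_ijk = sum_r a_ir * b_jr * c_kr  (a sum of R rank-one terms)
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# The same thing spelled out as outer products a_r o b_r o c_r.
X_outer = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r]) for r in range(R)
)
print(np.allclose(X, X_outer))
```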
Therefore
The rank of a tensor X, rank(X)
It is defined as the smallest number of rank-one tensors that generate X as their sum!!!
Problem!!!
Computing it is NP-hard
But that has not stopped us, because
We can use many of the methods in optimization to try to figure out the magical number R!!!
From approximation techniques...
To Branch and Bound...
Even naive techniques...
Why so much effort?
A Big Difference with the SVD
A matrix decomposition into rank-one terms is never unique unless we impose orthogonality between the columns or rows of the factors.
We have then
That tensor decompositions are way more general and less prone to such problems!!!
Now
We introduce a little bit more notation
X \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r = [[A, B, C]]
CP decomposes the tensor using the following optimization:
\min_{\hat{X}} \left\| X - \hat{X} \right\| \quad \text{s.t.} \quad \hat{X} = \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r = [[\lambda; A, B, C]]
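One standard way to attack this optimization is alternating least squares (ALS): fix two factor matrices and solve a linear least-squares problem for the third. Below is a minimal NumPy sketch under one consistent unfolding/Khatri-Rao convention, not the author's implementation; libraries such as TensorLy ship a ready-made parafac routine for the same job.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I*J) x R from I x R and J x R factors."""
    I, R = U.shape
    J, _ = V.shape
    return np.einsum("ir,jr->ijr", U, V).reshape(I * J, R)

def unfold(X, mode):
    """Mode-n unfolding consistent with the khatri_rao ordering above."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, R, n_iter=200, seed=0):
    """Fit a rank-R CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[0], R))
    B = rng.standard_normal((X.shape[1], R))
    C = rng.standard_normal((X.shape[2], R))
    for _ in range(n_iter):
        # Each update solves a least-squares problem with the other two factors fixed.
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Sanity check: recover a synthetic rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((10, 3)), rng.random((12, 3)), rng.random((8, 3))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C = cp_als(X, R=3)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # should be close to zero
```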
The Dream of Compression and BIG DATA
Here is why...
Here is a simulation by direct numerical simulation (DNS)
It can easily produce 100 GB to 1000 GB per DAY
The data came from (CIRCA 2016)
It is called S3D, a massively parallel compressible reacting flow solver developed at Sandia National Laboratories...
For example, data came from
1 An autoignitive premixture of air and ethanol in Homogeneous Charge Compression Ignition (HCCI)
Each time step requires 111 MB of storage, and the entire dataset is 70 GB.
2 A temporally-evolving planar slot jet flame with DME (dimethyl ether) as the fuel
Each time step requires 32 GB of storage, so the entire dataset is 520 GB.
Even in Machines like
A Cray XC30 supercomputer
5,576 dual-socket 12-core Intel "Ivy Bridge" (2.4 GHz) compute nodes.
The peak flop rate of each core is 19.2 GFLOPS.
Each node has 64 GB of memory.
These machines will go down
Because the data representation is not efficient...
Using the Tucker Decomposition
Furthermore...
For the 550 Gigabyte dataset we get compressions such as
1 5 times: down to 100 Gigs
2 16 times: down to 34 Gigs
3 55 times: down to 10 Gigs
4 etc.
Improving running times like crazy... from 3 seconds to 70 seconds when processing 15 TB of data...
Tensorizing Neural Networks
We have a huge problem in Deep Neural Networks
Modern Architectures
They are consuming from 89% to 100% of the memory at the host GPU and machines
Depending on where the calculations are done!!!
Problems with such Architectures
Recent studies show
The weight matrix of the fully-connected layer is highly redundant.
If you reduce the number of parameters, you could achieve
A similar predictive power
Possibly making them less prone to over-fitting or under-fitting
Thus
In the Paper
Novikov, A., Podoprikhin, D., Osokin, A. and Vetrov, D.P., 2015. Tensorizing neural networks. In Advances in Neural Information Processing Systems (pp. 442-450).
They used the TT-Representation (the Tensor Train format)
Where a d-dimensional array (Tensor) A is in TT-format if,
for each dimension k = 1, ..., d and each possible value of the k-th dimension index j_k = 1, ..., n_k,
there exists a matrix G_k[j_k] such that all the elements of A can be computed as a product of matrices.
Then
The TT-Representation
A(j_1, j_2, \ldots, j_d) = G_1[j_1] \, G_2[j_2] \cdots G_d[j_d]
All matrices G_k[j_k] related to the same dimension k are restricted to be of the same size r_{k-1} \times r_k (with r_0 = r_d = 1, so the product is a scalar).
Here is a problem: we do not have a unique representation
We then go for the lowest ranks
A(j_1, j_2, \ldots, j_d) = \sum_{\alpha_0, \ldots, \alpha_d} G_1[j_1](\alpha_0, \alpha_1) \, G_2[j_2](\alpha_1, \alpha_2) \cdots G_d[j_d](\alpha_{d-1}, \alpha_d)
Where
G_k[j_k](\alpha_{k-1}, \alpha_k) represents the element of the matrix G_k[j_k] at position (\alpha_{k-1}, \alpha_k)
With Memory Usage
For the full representation:
\prod_{k=1}^{d} n_k
and for the TT-Representation:
\sum_{k=1}^{d} n_k r_{k-1} r_k
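A tiny NumPy sketch of the TT format (random cores, toy mode sizes, for illustration only): every element of the d-dimensional array is a product of one small matrix per dimension, and the storage count is the sum over cores rather than the full product n_1 n_2 ... n_d.

```python
import numpy as np

rng = np.random.default_rng(0)
n = [4, 5, 6, 7]          # mode sizes n_1, ..., n_d
r = [1, 3, 3, 3, 1]       # TT ranks r_0, ..., r_d (with r_0 = r_d = 1)

# Core G_k stored as an array of shape (n_k, r_{k-1}, r_k); G_k[j_k] is an r_{k-1} x r_k matrix.
cores = [rng.random((n[k], r[k], r[k + 1])) for k in range(len(n))]

def tt_element(cores, idx):
    """A(j_1, ..., j_d) = G_1[j_1] G_2[j_2] ... G_d[j_d] (a 1x1 matrix)."""
    out = np.eye(1)
    for G, j in zip(cores, idx):
        out = out @ G[j]
    return out.item()

print(tt_element(cores, (1, 2, 3, 4)))

full_storage = np.prod(n)
tt_storage = sum(n[k] * r[k] * r[k + 1] for k in range(len(n)))
print(full_storage, tt_storage)   # 840 entries vs. 132 TT parameters for this toy case
```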
Then
They propose to store each layer in a TT-Representation W
Where W are the weights of a fully connected layer
Then, using our old forward and back-propagation machinery on
y = W x + b
With W \in \mathbb{R}^{M \times N}, x \in \mathbb{R}^{N} and b \in \mathbb{R}^{M}
In the TT-Representation
Y(i_1, i_2, \ldots, i_d) = \sum_{j_1, \ldots, j_d} G_1[i_1, j_1] \cdots G_d[i_d, j_d] \, X(j_1, j_2, \ldots, j_d) + B(i_1, i_2, \ldots, i_d)
This has the following complexity
The previous representation allows one to handle a larger number of parameters
Without too much overhead...
With the following complexities

Operation            Time                        Memory
FC forward pass      O(MN)                       O(MN)
TT forward pass      O(d r^2 m max{M, N})        O(d r^2 max{M, N})
FC backward pass     O(MN)                       O(MN)
TT backward pass     O(d r^2 m max{M, N})        O(d r^3 max{M, N})
Applications for this
Manage Better
The amount of memory being used in the devices
Increase the size of the Deep Networks
Although I have some thoughts about this...
Implement CNN networks on mobile devices
Kim, Yong-Deok, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. "Compression of deep convolutional neural networks for fast and low power mobile applications." arXiv preprint arXiv:1511.06530 (2015).
Hardware Support for the Dream
Given that
Something Notable
Sparse tensors appear in many large-scale applications with multidimensional and sparse data.
What support do we have for such situations?
Liu, Bangtian, Chengyao Wen, Anand D. Sarwate, and Maryam Mehri Dehnavi. "A Unified Optimization Approach for Sparse Tensor Operations on GPUs." arXiv preprint arXiv:1705.09905 (2017).
They pointed out different resources that you have around
Shared memory systems
The Tensor Toolbox [21], [4] and the N-way Toolbox [22] are two widely used MATLAB toolboxes.
The Cyclops Tensor Framework (CTF) is a C++ library which provides automatic parallelization for sparse tensor operations.
etc.
Distributed memory systems
GigaTensor handles tera-scale tensors using the MapReduce framework.
HyperTensor is a sparse tensor library for SpMTTKRP on distributed-memory environments.
etc.
And the Grail
GPU
Li proposes a parallel algorithm and GPU implementation that parallelizes certain operations over tensor fibers.
TensorFlow... actually supports a certain version of tensor representation...
Something Notable
Efforts to solve more problems are on the way
The future looks promising
The Dream Will Follow....
As Always
We need people able to dream these new ways of doing stuff...
Therefore, a few pieces of advice...
Learn more than a single framework...
Learn the mathematics
And more importantly
Learn how to model reality using such mathematical tools...
Thanks
Any Questions?
I repeat, I am not an expert in Tensor Calculus....
64 / 64

More Related Content

Similar to Tensor models and other dreams by PhD Andres Mendez-Vazquez

Parity arguments in problem solving
Parity arguments in problem solvingParity arguments in problem solving
Parity arguments in problem solvingtalegari
 
Graph theory
Graph theoryGraph theory
Graph theoryKumar
 
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONSTRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONSBRNSS Publication Hub
 
A guide for teachers – Years 11 and 121 23
A guide for teachers – Years 11 and 121  23 A guide for teachers – Years 11 and 121  23
A guide for teachers – Years 11 and 121 23 mecklenburgstrelitzh
 
A guide for teachers – Years 11 and 121 23 .docx
A guide for teachers – Years 11 and 121  23 .docxA guide for teachers – Years 11 and 121  23 .docx
A guide for teachers – Years 11 and 121 23 .docxmakdul
 
1.3 Pythagorean Theorem
1.3 Pythagorean Theorem1.3 Pythagorean Theorem
1.3 Pythagorean Theoremsmiller5
 
6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii xmath266
 
Hidden dimensions in nature
Hidden dimensions in natureHidden dimensions in nature
Hidden dimensions in natureMilan Joshi
 
hidden dimension in nature
hidden dimension in naturehidden dimension in nature
hidden dimension in natureMilan Joshi
 
HIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATUREHIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATUREMilan Joshi
 
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHIFRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHIMILANJOSHIJI
 
An FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge ColoringAn FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge ColoringNeeldhara Misra
 
Dependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural LanguageDependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural LanguageDaisuke BEKKI
 
Separation Axioms
Separation AxiomsSeparation Axioms
Separation AxiomsKarel Ha
 

Similar to Tensor models and other dreams by PhD Andres Mendez-Vazquez (20)

Fractals
FractalsFractals
Fractals
 
Parity arguments in problem solving
Parity arguments in problem solvingParity arguments in problem solving
Parity arguments in problem solving
 
Graph theory
Graph theoryGraph theory
Graph theory
 
CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010CRMS Calculus May 31, 2010
CRMS Calculus May 31, 2010
 
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONSTRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL  CANTOR FUNCTIONS
TRANSCENDENTAL CANTOR SETS AND TRANSCENDENTAL CANTOR FUNCTIONS
 
A guide for teachers – Years 11 and 121 23
A guide for teachers – Years 11 and 121  23 A guide for teachers – Years 11 and 121  23
A guide for teachers – Years 11 and 121 23
 
A guide for teachers – Years 11 and 121 23 .docx
A guide for teachers – Years 11 and 121  23 .docxA guide for teachers – Years 11 and 121  23 .docx
A guide for teachers – Years 11 and 121 23 .docx
 
1.3 Pythagorean Theorem
1.3 Pythagorean Theorem1.3 Pythagorean Theorem
1.3 Pythagorean Theorem
 
Gd 26
Gd 26Gd 26
Gd 26
 
6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x6 volumes of solids of revolution ii x
6 volumes of solids of revolution ii x
 
Archimedes
ArchimedesArchimedes
Archimedes
 
Large Deviations: An Introduction
Large Deviations: An IntroductionLarge Deviations: An Introduction
Large Deviations: An Introduction
 
Hidden dimensions in nature
Hidden dimensions in natureHidden dimensions in nature
Hidden dimensions in nature
 
hidden dimension in nature
hidden dimension in naturehidden dimension in nature
hidden dimension in nature
 
HIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATUREHIDDEN DIMENSIONS IN NATURE
HIDDEN DIMENSIONS IN NATURE
 
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHIFRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
FRACTAL GEOMETRY AND ITS APPLICATIONS BY MILAN A JOSHI
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
 
An FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge ColoringAn FPT Algorithm for Maximum Edge Coloring
An FPT Algorithm for Maximum Edge Coloring
 
Dependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural LanguageDependent Types and Dynamics of Natural Language
Dependent Types and Dynamics of Natural Language
 
Separation Axioms
Separation AxiomsSeparation Axioms
Separation Axioms
 

More from DataLab Community

Meetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos GenéticosMeetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos GenéticosDataLab Community
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018DataLab Community
 
Meetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark FundamentalsMeetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark FundamentalsDataLab Community
 
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...DataLab Community
 
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña LlamasMetodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña LlamasDataLab Community
 
Curse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando GarciaCurse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando GarciaDataLab Community
 
Quiénes somos - DataLab Community
Quiénes somos - DataLab CommunityQuiénes somos - DataLab Community
Quiénes somos - DataLab CommunityDataLab Community
 
Profesiones de la ciencia de datos
Profesiones de la ciencia de datosProfesiones de la ciencia de datos
Profesiones de la ciencia de datosDataLab Community
 
El arte de la Ciencia de Datos
El arte de la Ciencia de DatosEl arte de la Ciencia de Datos
El arte de la Ciencia de DatosDataLab Community
 
Presentación de DataLab Community
Presentación de DataLab CommunityPresentación de DataLab Community
Presentación de DataLab CommunityDataLab Community
 
De qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data ScienceDe qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data ScienceDataLab Community
 

More from DataLab Community (11)

Meetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos GenéticosMeetup Julio Algoritmos Genéticos
Meetup Julio Algoritmos Genéticos
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018
 
Meetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark FundamentalsMeetup Junio Apache Spark Fundamentals
Meetup Junio Apache Spark Fundamentals
 
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...Procesar e interpretar señales biológicas para hacer predicción de movimiento...
Procesar e interpretar señales biológicas para hacer predicción de movimiento...
 
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña LlamasMetodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
Metodos de kernel en machine learning by MC Luis Ricardo Peña Llamas
 
Curse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando GarciaCurse of dimensionality by MC Ivan Alejando Garcia
Curse of dimensionality by MC Ivan Alejando Garcia
 
Quiénes somos - DataLab Community
Quiénes somos - DataLab CommunityQuiénes somos - DataLab Community
Quiénes somos - DataLab Community
 
Profesiones de la ciencia de datos
Profesiones de la ciencia de datosProfesiones de la ciencia de datos
Profesiones de la ciencia de datos
 
El arte de la Ciencia de Datos
El arte de la Ciencia de DatosEl arte de la Ciencia de Datos
El arte de la Ciencia de Datos
 
Presentación de DataLab Community
Presentación de DataLab CommunityPresentación de DataLab Community
Presentación de DataLab Community
 
De qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data ScienceDe qué hablamos cuando hablamos de Data Science
De qué hablamos cuando hablamos de Data Science
 

Recently uploaded

A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
A Small Problem
The matrix alone consumes... so much...
You have 2 bytes per memory cell
If we have N = 10^6 and d = 50,000
We have 2 × N × d = 10^11 bytes ≈ 100 Gigabytes
10 / 64
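In code, that back-of-the-envelope estimate is just (a tiny Python sketch of the arithmetic above):

N, d = 10**6, 50_000
bytes_per_cell = 2
print(bytes_per_cell * N * d / 10**9)   # 100.0 gigabytes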
Danger!!! Will Robinson
Lost in Space
11 / 64
We have a trick!!!
Something Notable
The Matrix is Highly SPARSE
12 / 64
Therefore
If you are smart enough
You start representing the matrix information using sparse techniques
[Figure: a 5x5 sparse matrix, storing only the few numeric elements and leaving the empty elements out]
13 / 64
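As a minimal sketch of that idea (assuming Python with NumPy and SciPy; the counts are synthetic stand-ins for a real document-term matrix), storing only the nonzero counts in a CSR structure already saves most of the memory:

import numpy as np
from scipy import sparse

# Synthetic bag-of-words counts: roughly 1% of the cells are nonzero.
rng = np.random.default_rng(0)
dense = rng.integers(1, 5, size=(1000, 5000)) * (rng.random((1000, 5000)) < 0.01)

A = sparse.csr_matrix(dense)  # keep only the nonzero entries plus their indices
dense_bytes = dense.nbytes
csr_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(dense_bytes, csr_bytes)  # the sparse form is a small fraction of the dense one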
Then
If you are quite smart....
You discover that only a few of the singular values carry most of the information...
Every Matrix has a Singular Value Decomposition
A = U Σ V^T
The columns of U are an orthonormal basis for the column space.
The columns of V are an orthonormal basis for the row space.
Σ is diagonal and the entries on its diagonal, σ_i = Σ_ii, are nonnegative real numbers, called the singular values of A.
14 / 64
How much compression can we get?
The Sparse Matrix Representation
It achieves about 90% compression - we go from 100 Gigabytes to 10 Gigabytes
Using the Singular Value Decomposition, we go from 50,000 dimensions/words down to about 300 dimensions
Making it possible to go from 100 Gigabytes to 2 × N × 300 = 0.6 Gigabytes
15 / 64
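A minimal truncated-SVD sketch of this idea (NumPy, with a small synthetic matrix standing in for the real N × d document-term matrix; the sizes and the rank are illustrative only):

import numpy as np

# Small synthetic stand-in for the N x d document-term matrix.
rng = np.random.default_rng(0)
A = rng.random((2000, 500))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 50                                   # keep only the top-k singular values
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # best rank-k approximation of A

# Instead of the full N x d matrix we store an N x k and a k x d factor.
print(A.size, U[:, :k].size + Vt[:k, :].size)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))  # relative approximation error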
IMAGINE!!!!
We have a crazy moment!!!
All the stars we steal from the night sky
Will never be enough
Never be enough
Towers of gold are still too little
These hands could hold the world but it’ll
Never be enough
Never be enough
For me
16 / 64
Then
You go ambitious!!!
You add a new dimension representing feelings!!!
[Figure: the document-term matrix extended with a third "feeling" dimension]
17 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
18 / 64
They have a somewhat short history!!!
First and foremost
They are abstract entities invariant under coordinate transformations.
They were first mentioned by Woldemar Voigt in 1898
A German physicist who taught at the Georg August University of Göttingen.
He mentioned tensors in a study about the physical properties of crystals.
But Before That
The great Riemann introduced the concept of a manifold... the beginning of the dream...
Together with a quadratic line element to study its properties...
ds^2 = g_ij dx^i dx^j
19 / 64
Then
Gregorio Ricci-Curbastro and Tullio Levi-Civita
They wrote a paper in the Mathematische Annalen, Vol. 54 (1901), entitled "Méthodes de calcul différentiel absolu"
A Monster Came Around
20 / 64
“Every Genius has stood on the Shoulders of Giants” - Newton
Einstein adopted the concepts in the paper
And the Theory of General Relativity was born
He renamed the entire field from “calcul différentiel absolu” to TENSOR CALCULUS
21 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
22 / 64
First Principles...
Imagine a linear coordinate system
23 / 64
We define
A Coordinate System
We define vectors in terms of a basis
v = v_x e_1 + v_y e_2 = (v_x, v_y)^T ∈ R^2
‖v‖ = ‖v‖_2 = (v_x^2 + v_y^2)^{1/2}
Note: This is important - a vector is always the same object no matter the coordinate system
24 / 64
Therefore
Imagine representing the vector in a new (primed) basis in terms of the old one
e'_1 · v = v'_x = e'_1 · (v_x e_1 + v_y e_2)
e'_2 · v = v'_y = e'_2 · (v_x e_1 + v_y e_2)
Where e'_i · e_j = projection of e'_i onto e_j
25 / 64
Using a Little Bit of Notation
We need a notation that is more compact
Let the indices i, j represent the numbers 1, 2 corresponding to the coordinates x, y
Write the components of v as v_i and v'_i in the two coordinate systems
Then define a_ij = e'_i · e_j
Note: This defines the "ROTATION"
In fact, the a_ij are individually just the cosines of the angles between one axis and another
26 / 64
Therefore
We can rewrite the entire transformation
v'_i = Σ_{j=1}^{2} a_ij v_j
We will agree that whenever an index appears twice, we sum over it (the Einstein summation convention)
v'_i = a_ij v_j
27 / 64
We have then...
We can do the following
[ v'_1 ]   [ a_11  a_12 ] [ v_1 ]
[ v'_2 ] = [ a_21  a_22 ] [ v_2 ]
Then, we compress our notation even more
v' = a v
28 / 64
Then, we can redefine our dot product
The basis of projecting onto other vectors
v · w = v_i w_i, and in the new coordinates v'_i w'_i = a_ij a_ik v_j w_k
Using the Kronecker Delta
δ_jk = 0 if j ≠ k, 1 if j = k
Therefore, for a rotation we have
a_ij a_ik = δ_jk
29 / 64
Proving the Invariance of the Dot Product
Therefore
v'_i w'_i = a_ij a_ik v_j w_k = δ_jk v_j w_k = v_j w_j
30 / 64
Then, we have
A scalar is a number K
It has the same value in different coordinate systems.
A vector is a set of numbers v_i
They transform according to v'_i = a_ij v_j
A (second-rank) tensor is a set of numbers T_ij
They transform according to T'_ij = a_ik a_jl T_kl
31 / 64
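These transformation rules map directly onto einsum expressions. A minimal NumPy sketch (the 30-degree rotation, the vectors and the tensor T are arbitrary examples):

import numpy as np

theta = np.deg2rad(30.0)
a = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # the rotation coefficients a_ij

# a_ij a_ik = delta_jk: the transformation is orthogonal.
print(np.allclose(np.einsum('ij,ik->jk', a, a), np.eye(2)))   # True

v = np.array([1.0, 2.0])
w = np.array([-0.5, 3.0])
T = np.outer(v, w)                            # a simple second-rank tensor

v_new = np.einsum('ij,j->i', a, v)            # v'_i  = a_ij v_j
w_new = np.einsum('ij,j->i', a, w)
T_new = np.einsum('ik,jl,kl->ij', a, a, T)    # T'_ij = a_ik a_jl T_kl

# The scalar v . w has the same value in both coordinate systems.
print(np.allclose(v @ w, v_new @ w_new))      # True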
Then you can go higher
For example, tensors of rank 3
32 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
33 / 64
Once we have an idea of what a tensor is
Do we have decompositions similar to the SVD?
We have them......!!!
A Little Bit of History
Tensor decompositions originated with Hitchcock in 1927
An American mathematician and physicist known for his formulation of the transportation problem in 1941.
A multiway model is attributed to Cattell in 1944
A British and American psychologist, known for his psychometric research into intrapersonal psychological structure.
But they did not take off until Ledyard R. Tucker
"Some mathematical notes on three-mode factor analysis," Psychometrika, 31 (1966), pp. 279-311.
34 / 64
The Dream has been expanding beyond Physics
In the last ten years
1 Signal Processing
2 Numerical Linear Algebra
3 Computer Vision
4 Data Mining
5 Graph Analysis
6 Neurosciences
7 etc.
And we are going further
The Dream of Representation is at full speed when dealing with BIG DATA!!!
35 / 64
Decomposition of Tensors
Hitchcock proposed such a decomposition first... then the deluge

Name                                          Proposed by
Polyadic form of a tensor                     Hitchcock, 1927
Three-mode factor analysis                    Tucker, 1966
PARAFAC (parallel factors)                    Harshman, 1970
CANDECOMP or CAND (canonical decomposition)   Carroll and Chang, 1970
Topographic components model                  Möcks, 1988
CP (CANDECOMP/PARAFAC)                        Kiers, 2000
36 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
37 / 64
Look at the most modern one, from 17 years ago...
The CP decomposition factorizes a tensor into a sum of rank-one component tensors (outer products of vectors!!!)
X ≈ Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r, with X ∈ R^{I×J×K}
Where
R is a positive integer
a_r ∈ R^I, b_r ∈ R^J, c_r ∈ R^K
38 / 64
Then, Pointwise
We have the following
x_ijk = Σ_{r=1}^{R} a_ir b_jr c_kr
[Figure: the tensor X drawn as a sum of R rank-one tensors a_r ◦ b_r ◦ c_r]
39 / 64
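A minimal NumPy sketch of this pointwise formula (the sizes and the rank below are arbitrary):

import numpy as np

I, J, K, R = 4, 5, 6, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))   # columns are the vectors a_r
B = rng.standard_normal((J, R))   # columns are the vectors b_r
C = rng.standard_normal((K, R))   # columns are the vectors c_r

# x_ijk = sum_r a_ir * b_jr * c_kr
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# The same tensor written as an explicit sum of R rank-one (outer-product) tensors.
X_check = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
              for r in range(R))
print(np.allclose(X, X_check))    # True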
Therefore
The rank of a tensor X, rank(X)
It is defined as the smallest number of rank-one tensors that generate X as their sum!!!
Problem!!!
Computing it is NP-hard
But that has not stopped us, because
We can use many of the methods in optimization to try to figure out the magical number R!!!
From approximation techniques...
To branch and bound...
Even naive techniques...
40 / 64
Why so much effort?
A Big Difference with the SVD
A matrix low-rank factorization is never unique unless we impose orthogonality on the columns or rows of the factors.
We have then
That tensor decompositions are way more general and less prone to such problems!!!
41 / 64
Now
We introduce a little bit more notation
X ≈ Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r = [[A, B, C]]
CP decomposes the tensor using the following optimization
min_{X̂} ‖X − X̂‖  s.t.  X̂ = Σ_{r=1}^{R} λ_r a_r ◦ b_r ◦ c_r = [[λ; A, B, C]]
42 / 64
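One standard way to attack this optimization is alternating least squares (ALS): fix two factor matrices and solve a least-squares problem for the third, in turn. A minimal NumPy sketch for a 3-way tensor (this is only an illustrative implementation under those assumptions, not the exact algorithm behind any of the cited results; the λ_r weights are not normalized here):

import numpy as np

def unfold(X, mode):
    # Mode-n unfolding: move the chosen axis to the front and flatten the rest.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(U, V):
    # Column-wise Kronecker product, shape (I*J) x R.
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def cp_als(X, R, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # Least-squares update of each factor with the other two held fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Recover the factors of a synthetic rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (6, 7, 8))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # relative error, should be small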
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
43 / 64
Here is why...
A direct numerical simulation
It can easily produce 100 GB to 1000 GB per DAY
The data comes from (circa 2016)
S3D, a massively parallel compressible reacting flow solver developed at Sandia National Laboratories...
For example, the data came from
1 An autoignitive premixture of air and ethanol in Homogeneous Charge Compression Ignition (HCCI)
  Each time step requires 111 MB of storage, and the entire dataset is 70 GB.
2 A temporally evolving planar slot jet flame with DME (dimethyl ether) as the fuel
  Each time step requires 32 GB of storage, so the entire dataset is 520 GB
44 / 64
Even in Machines like a Cray XC30 supercomputer
5,576 dual-socket 12-core Intel "Ivy Bridge" (2.4 GHz) compute nodes.
The peak flop rate of each core is 19.2 GFLOPS.
Each node has 64 GB of memory.
These machines will still choke
Because the data representation is not efficient...
45 / 64
Using the Tucker Decomposition
46 / 64
Furthermore...
We have, for a 550 Gigabyte dataset, compressions such as
1 5 times, down to about 100 Gigs
2 16 times, down to about 34 Gigs
3 55 times, down to about 10 Gigs
4 etc.
Improving running times dramatically... on the order of 3 to 70 seconds when processing 15 TB of data...
47 / 64
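A minimal Tucker-style compression sketch via a truncated higher-order SVD (plain NumPy; this only illustrates the idea, it is not the parallel implementation used for the S3D data, and the tensor below is a small random stand-in):

import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    # Truncated higher-order SVD: a factor matrix per mode plus a small core.
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(core, U, axes=(mode, 0)), -1, mode)
    return core, factors

def reconstruct(core, factors):
    X = core
    for mode, U in enumerate(factors):
        X = np.moveaxis(np.tensordot(X, U, axes=(mode, 1)), -1, mode)
    return X

rng = np.random.default_rng(0)
X = rng.random((30, 40, 50))
core, factors = hosvd(X, ranks=(10, 12, 15))
X_hat = reconstruct(core, factors)

compression = X.size / (core.size + sum(U.size for U in factors))
error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(compression, error)

For a random tensor this reconstruction error is large; the point of the S3D results is that real simulation fields are far more compressible, so aggressive truncation still reconstructs them accurately.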
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
48 / 64
We have a huge problem in Deep Neural Networks
Modern Architectures
They consume from 89% to 100% of the memory on the host machine or the GPU
Depending on where the calculations are done!!!
49 / 64
Problems with such Architectures
Recent studies show
The weight matrix of a fully-connected layer is highly redundant.
If you reduce the number of parameters, you could achieve
A similar predictive power
Possibly making the networks less prone to over-fitting or under-fitting
50 / 64
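As a rough illustration of that redundancy argument (a NumPy sketch with a hypothetical 1024 x 1024 fully-connected weight matrix; the rank 64 is arbitrary), replacing W by a pair of low-rank factors already cuts the parameter count by an order of magnitude:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))        # stand-in for a trained FC weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64
W1 = U[:, :r] * s[:r]                        # 1024 x 64
W2 = Vt[:r, :]                               # 64 x 1024

x = rng.standard_normal(1024)
y_lowrank = W1 @ (W2 @ x)                    # two thin layers instead of one big one

print(W.size, W1.size + W2.size)             # 1,048,576 vs 131,072 parameters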
Thus
In the Paper
Novikov, A., Podoprikhin, D., Osokin, A. and Vetrov, D.P., 2015. Tensorizing neural networks. In Advances in Neural Information Processing Systems (pp. 442-450).
They used the TT-Representation (Tensor Train format)
Where a d-dimensional array (tensor) A is in the TT-format
If for each dimension k = 1, ..., d and each possible value of the k-th dimension index j_k = 1, ..., n_k
There exists a matrix G_k[j_k] such that all the elements of A can be computed as a product of such matrices.
51 / 64
Then
The TT-Representation
A(j_1, j_2, ..., j_d) = G_1[j_1] G_2[j_2] · · · G_d[j_d]
All matrices G_k[j_k] related to the same dimension k are restricted to be of the same size r_{k-1} × r_k.
52 / 64
Here is a problem: we do not have a unique representation
We then go for the lowest ranks
A(j_1, j_2, ..., j_d) = Σ_{α_0,...,α_d} G_1[j_1](α_0, α_1) · · · G_d[j_d](α_{d-1}, α_d)
Where G_k[j_k](α_{k-1}, α_k) is the element of the matrix G_k[j_k] at position (α_{k-1}, α_k)
53 / 64
With Memory Usage
For the full representation: Π_{k=1}^{d} n_k cells
And for the TT-Representation: Σ_{k=1}^{d} n_k r_{k-1} r_k cells
54 / 64
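A minimal sketch of both formulas in NumPy (the shapes and TT-ranks below are made up; r_0 = r_d = 1 as usual):

import numpy as np

def random_tt_cores(shape, ranks, seed=0):
    # Core G_k is stored as an array of shape (r_{k-1}, n_k, r_k).
    rng = np.random.default_rng(seed)
    ranks = [1] + list(ranks) + [1]
    return [rng.standard_normal((ranks[k], n, ranks[k + 1]))
            for k, n in enumerate(shape)]

def tt_element(cores, index):
    # A(j_1, ..., j_d) = G_1[j_1] G_2[j_2] ... G_d[j_d], a chain of small matrix products.
    out = np.eye(1)
    for G, j in zip(cores, index):
        out = out @ G[:, j, :]
    return out.item()

shape, ranks = (4, 5, 6, 7), (3, 3, 3)
cores = random_tt_cores(shape, ranks)
print(tt_element(cores, (1, 2, 3, 4)))

full_cells = int(np.prod(shape))                  # prod_k n_k
tt_cells = sum(G.size for G in cores)             # sum_k n_k r_{k-1} r_k
print(full_cells, tt_cells)                       # 840 vs 132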
Then
They propose to store each weight matrix W in a TT-Representation
Where W holds the weights of a fully connected layer
Then, the usual affine layer we back-propagate through,
y = Wx + b, with W ∈ R^{N×M}, x ∈ R^M and b ∈ R^N
Becomes, in TT-Representation,
Y(i_1, i_2, ..., i_d) = Σ_{j_1,...,j_d} G_1[i_1, j_1] · · · G_d[i_d, j_d] X(j_1, j_2, ..., j_d) + B(i_1, i_2, ..., i_d)
55 / 64
This has the following complexity
The previous representation allows us to handle a larger number of parameters
Without too much overhead...
With the following complexities

Operation           Time                      Memory
FC forward pass     O(MN)                     O(MN)
TT forward pass     O(d r^2 m max{M, N})      O(d r^2 max{M, N})
FC backward pass    O(MN)                     O(MN)
TT backward pass    O(d r^2 m max{M, N})      O(d r^3 max{M, N})
56 / 64
Applications for this
Better manage the amount of memory being used in the devices
Increase the size of the Deep Networks
Although I have some thoughts about this...
Implement CNNs on mobile devices
Kim, Yong-Deok, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. "Compression of deep convolutional neural networks for fast and low power mobile applications." arXiv preprint arXiv:1511.06530 (2015).
57 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
58 / 64
Given that
Something Notable
Sparse tensors appear in many large-scale applications with multidimensional and sparse data.
What support do we have for such situations?
Liu, Bangtian, Chengyao Wen, Anand D. Sarwate, and Maryam Mehri Dehnavi. "A Unified Optimization Approach for Sparse Tensor Operations on GPUs." arXiv preprint arXiv:1705.09905 (2017).
59 / 64
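Most of these libraries store such tensors in a coordinate (COO) format: one row of indices plus one value per nonzero. A minimal sketch (synthetic entries, plain NumPy):

import numpy as np

# A sparse 3-way tensor of shape (10, 5, 8) with three nonzero entries.
coords = np.array([[0, 1, 2],
                   [3, 0, 5],
                   [7, 4, 1]])            # one (i, j, k) triple per nonzero
values = np.array([1.0, -2.5, 0.7])
shape = (10, 5, 8)

# Only the nonzeros are stored; a dense array would need prod(shape) cells.
print(values.size, int(np.prod(shape)))   # 3 vs 400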
They pointed out different resources that you have around
Shared memory systems
The Tensor Toolbox [21], [4] and the N-way Toolbox [22] are two widely used MATLAB toolboxes.
The Cyclops Tensor Framework (CTF) is a C++ library which provides automatic parallelization for sparse tensor operations.
etc.
Distributed memory systems
GigaTensor handles tera-scale tensors using the MapReduce framework.
HyperTensor is a sparse tensor library for SpMTTKRP on distributed-memory environments.
etc.
60 / 64
And the Grail: the GPU
Li proposes a parallel algorithm and implementation on GPUs via parallelizing certain algorithms over the tensor fibers.
TensorFlow... actually supports a certain version of the tensor representation...
Something Notable
Efforts to solve more problems are on the way
The future looks promising
61 / 64
Outline
1 Introduction: The Dream of Tensors · A Short Story on Compression · A Short History · What a Heck are Tensors?
2 The Tensor Models for Data Science: Decomposition for Compression · CANDECOMP/PARAFAC Decomposition · The Dream of Compression and BIG DATA · Tensorizing Neural Networks · Hardware Support for the Dream
3 Conclusions: The Dream Will Follow....
62 / 64
As Always
We need people able to dream up these new ways of doing stuff...
Therefore, a few pieces of advice...
Learn more than a single framework...
Learn the mathematics
And more importantly
Learn how to model reality using such mathematical tools...
63 / 64
Thanks
Any Questions?
I repeat: I am not an expert in Tensor Calculus....
64 / 64