Hierarchical representation with
hyperbolic geometry
2016-20873 Segwang Kim
Overview
① Embedding Symbolic and Hierarchical Data
② Introduction to Hyperbolic Space
③ Optimization over Hyperbolic Space
④ Toy Experiments
Embedding Symbolic and Hierarchical Data
Symbolic and Hierarchical Data
Symbolic data with implicit hierarchy.
Downstream tasks: link prediction, node classification, community detection, visualization.
(Figures: the WordNet hierarchy and a Twitter social graph, with a "?LINK" link-prediction query and a highlighted community.)
Good Hierarchical Embedding
5
For downstream tasks, symbolic and hierarchical data needs to
be embedded into space.
Good Embedding?
Embeddings of similar symbols should aggregate in some sense.
Symbolic arithmetic exists: v(King)- v(man) + v(woman)=v(Queen)
Hierarchy can be restored from embedded data.
The space should have low dimension.
Introduction to Hyperbolic Space
Limitation of Euclidean Embedding
Embed the graph structure while preserving distances.
Thm) Trees cannot be embedded into Euclidean space with arbitrarily low distortion, for any number of dimensions.

           Graph   Euclidean   ??
D(a,b)     2       0.1         1.889
D(a,c)     2       1           1.902
D(a,d)     2       1.8         1.962

(Figure: a tree with nodes a, b, c, d and its embeddings into the Euclidean plane and into the mystery space "??".)
Representation Tradeoffs for Hyperbolic Embeddings (ICML 2018)
Euclidean Space vs Hyperbolic Space
Give a Riemannian metric. The metric tensor is an inner product on each tangent space.

Euclidean: $(\mathbb{R}^n, g)$, where $M = \mathbb{R}^n$ and $g = dx_1^2 + \cdots + dx_n^2$.
$\langle u, v\rangle_p = u^t g v = dx_1(u)\,dx_1(v) + \cdots + dx_n(u)\,dx_n(v) = u_1 v_1 + \cdots + u_n v_n$
for all $u, v \in T_p \mathbb{R}^n$, where $p \in \mathbb{R}^n$.

Hyperbolic: $\left(D^n, \left(\frac{2}{1-\|x\|^2}\right)^2 g\right)$, where $M = D^n = \{x \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 < 1\}$.
$\langle u, v\rangle_p = u^t \left(\frac{2}{1-\|p\|^2}\right)^2 g\, v = \left(\frac{2}{1-\|p\|^2}\right)^2 (u_1 v_1 + \cdots + u_n v_n)$
for all $u, v \in T_p D^n$, where $p \in D^n$. Unlike the Euclidean case, the inner product depends on the base point $p$.
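To make the base-point dependence concrete, here is a minimal numpy sketch of the two inner products; the function names are mine, not from the slides.

```python
import numpy as np

def euclidean_inner(p, u, v):
    # Same rule at every base point p: <u, v> = u . v
    return u @ v

def poincare_inner(p, u, v):
    # Conformal factor lambda_p = 2 / (1 - ||p||^2) depends on the base point.
    lam = 2.0 / (1.0 - p @ p)
    return lam**2 * (u @ v)

u = np.array([1.0, 0.0])
print(euclidean_inner(np.zeros(2), u, u))          # 1.0, anywhere in R^2
print(poincare_inner(np.zeros(2), u, u))           # 4.0 at the origin
print(poincare_inner(np.array([0.9, 0.0]), u, u))  # ~110.8 near the boundary
```

The last line previews the blow-up on the next slide: the same tangent vector gets ever longer as the base point approaches the boundary of the disk.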
Euclidean Space vs Hyperbolic Space
The inner product $\langle \cdot, \cdot \rangle_p$ on $T_p D^n$ defines:
Length of a curve $\gamma : [0,1] \to D^n$: $L(\gamma) = \int_0^1 \langle \gamma'_t, \gamma'_t \rangle_{\gamma_t}^{1/2} \, dt$
Angle between $w_1, w_2 \in T_p D^n$: $\cos\theta = \frac{\langle w_1, w_2 \rangle_p}{\left(\langle w_1, w_1 \rangle_p \cdot \langle w_2, w_2 \rangle_p\right)^{1/2}}$
The "line" between $p, q \in M$ is the shortest path between them:
$\gamma^* = \underset{\gamma_0 = p,\ \gamma_1 = q}{\operatorname{argmin}} \int_0^1 \langle \gamma'_t, \gamma'_t \rangle_{\gamma_t}^{1/2} \, dt$
Since $\left(\frac{2}{1-\|x\|^2}\right)^2 g \to \infty$ as $\|x\| \to 1$, shortest paths bend toward the center of the disk.
(Figure: the geodesic between $p$ and $q$: a straight segment in Euclidean space, an arc bending toward the center in hyperbolic space.)
Equivalent Hyperbolic Models
We can choose among the equivalent hyperbolic models depending on the purpose.

Poincaré model (for visualization): $\left(D^n,\ \left(\frac{2}{1-\|x\|^2}\right)^2 (dx_1^2 + \cdots + dx_n^2)\right)$, where $D^n = \{x \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 < 1\}$.
Lorentz model (for optimization): $(\mathcal{L}^n,\ -dx_0^2 + dx_1^2 + \cdots + dx_n^2)$.
The two are isometric, via $(x_0, \ldots, x_n) \mapsto \left(\frac{x_1}{1 + x_0}, \ldots, \frac{x_n}{1 + x_0}\right)$.
Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry (ICML 2018)
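A numpy sketch of this isometry; the forward map is read off the slide, while the inverse map is the standard formula and is stated here as an assumption.

```python
import numpy as np

def lorentz_to_poincare(x):
    # The map from the slide: drop x0 and divide the rest by (1 + x0).
    return x[1:] / (1.0 + x[0])

def poincare_to_lorentz(y):
    # Standard inverse map (assumption): x = (1 + |y|^2, 2y) / (1 - |y|^2).
    sq = y @ y
    return np.concatenate(([1.0 + sq], 2.0 * y)) / (1.0 - sq)

# A point on the hyperboloid satisfies x0 = sqrt(1 + x1^2 + x2^2).
x = np.array([np.sqrt(1.0 + 0.3**2 + 0.4**2), 0.3, 0.4])
y = lorentz_to_poincare(x)
print(np.linalg.norm(y) < 1.0)                 # True: lands inside the unit disk
print(np.allclose(poincare_to_lorentz(y), x))  # True: the round trip recovers x
```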
Optimization Techniques
Suggested loss function
An example of a loss function over hyperbolic space.
Fundamentally, the gradient of the loss tells which direction the points should move.
Poincaré Embeddings for Learning Hierarchical Representations (ICML 2017)
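The loss itself did not survive extraction. As a reconstruction from the cited paper (my paraphrase; treat the exact form as an assumption), it is a distance-based softmax over negative samples $\mathcal{N}(u)$, using the Poincaré distance:

$$\mathcal{L}(\Theta) = -\sum_{(u,v)\in\mathcal{D}} \log \frac{e^{-d(u,v)}}{\sum_{v' \in \mathcal{N}(u)} e^{-d(u,v')}}, \qquad d(u,v) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$$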
Gradient Descent Algorithm
Input: $f : L^2 \to \mathbb{R}$, $p_0 \in L^2$, $k = 0$
repeat
    choose a descent direction $v_k \in T_{p_k} L^2$
    choose a retraction $R_{p_k} : T_{p_k} L^2 \to L^2$
    choose a step length $\alpha_k \in \mathbb{R}$
    set $p_{k+1} = R_{p_k}(\alpha_k v_k)$
    $k \leftarrow k + 1$
until $p_{k+1}$ sufficiently minimizes $f$
Nothing differs from the usual gradient descent except for:
the gradient direction
the retraction
Optimization Methods on Riemannian Manifolds and Their Application to Shape Space (SIAM 2012)
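A minimal Python skeleton of this loop, with the gradient and the retraction left as pluggable pieces; the names are mine, not from the cited paper.

```python
import numpy as np

def riemannian_gd(riem_grad, retract, p0, alpha=0.1, steps=100):
    """Riemannian gradient descent, mirroring the pseudocode above.

    riem_grad(p)  -> Riemannian gradient of f at p (a tangent vector in T_p L^2)
    retract(p, v) -> the point on L^2 reached from p along the tangent vector v
    """
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        v = -riem_grad(p)          # descent direction v_k
        p = retract(p, alpha * v)  # p_{k+1} = R_{p_k}(alpha_k * v_k)
    return p
```

Concrete choices for `riem_grad` and `retract` on the hyperboloid follow on the next slides.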
Gradient Descent Algorithm
(Same algorithm as on the previous slide.) What is the gradient on hyperbolic space?
$f : (\mathcal{L}^2,\ -dx_0^2 + dx_1^2 + dx_2^2) \to \mathbb{R}, \qquad \nabla f = \,?$
Hyperboloid model
Setting: $f : (\mathcal{L}^2,\ -dx_0^2 + dx_1^2 + dx_2^2) \to \mathbb{R}$, with
$L^2 = \{p \in \mathbb{R}^3 : \langle p, p\rangle_{\mathcal{L}} = -1,\ p_z > 0\}$ and $T_p L^2 = \{v \in \mathbb{R}^3 : \langle v, p\rangle_{\mathcal{L}} = 0\}$.
First, find $\nabla^{\mathbb{R}^{2:1}} f|_p \in \mathbb{R}^3$ s.t. $\langle \nabla^{\mathbb{R}^{2:1}} f|_p,\ v\rangle_{\mathcal{L}} = df(v)|_p$:
$\nabla^{\mathbb{R}^{2:1}} f|_p = (-dx_0^2 + dx_1^2 + dx_2^2)^{-1} \cdot$ (usual derivative, e.g. from TensorFlow), i.e. flip the sign of the first component of the usual gradient.
Second, project $\nabla^{\mathbb{R}^{2:1}} f|_p$ onto $T_p L^2$:
$\nabla^{L^2} f|_p = \nabla^{\mathbb{R}^{2:1}} f|_p + \langle \nabla^{\mathbb{R}^{2:1}} f|_p,\ p\rangle_{\mathcal{L}}\, p$
The descent direction is then $v_k = -\nabla^{L^2} f|_{p_k}$.
Gradient Descent in Hyperbolic Space (arXiv 2018)
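In numpy the two steps amount to a sign flip and a projection; a minimal sketch (names are mine, assuming the usual gradient of the loss is available):

```python
import numpy as np

def lorentz_inner(u, v):
    # <u, v>_L = -u0*v0 + u1*v1 + u2*v2
    return -u[0] * v[0] + u[1:] @ v[1:]

def riemannian_grad(euclidean_grad, p):
    # Step 1: apply the inverse metric diag(-1, 1, 1), i.e. flip the
    # sign of the first component of the usual (e.g. TensorFlow) gradient.
    h = np.array(euclidean_grad, dtype=float)
    h[0] = -h[0]
    # Step 2: project onto T_p L^2 = {v : <v, p>_L = 0}.
    return h + lorentz_inner(h, p) * p
```

The projection is tangent by construction: $\langle h + \langle h, p\rangle_{\mathcal{L}}\, p,\ p\rangle_{\mathcal{L}} = \langle h, p\rangle_{\mathcal{L}}(1 + \langle p, p\rangle_{\mathcal{L}}) = 0$, since $\langle p, p\rangle_{\mathcal{L}} = -1$ on $L^2$.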
Gradient Descent Algorithm
(Same algorithm as above.) What is the retraction on hyperbolic space?
Hyperboloid model
A retraction tells how the end point of a tangent vector corresponds to a point on the manifold: the naive end point leaves the manifold ($q' \notin L^2$), while its retraction stays on it ($R(q') \in L^2$).
We choose the geodesic as the retraction. At $p \in L^2$ with direction $v \in T_p L^2$:
$\gamma_t = \cosh(\|v\|_{\mathcal{L}}\, t)\, p + \sinh(\|v\|_{\mathcal{L}}\, t)\, \frac{v}{\|v\|_{\mathcal{L}}}$
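A numpy sketch of this geodesic retraction, evaluated at $t = 1$; the guard for a near-zero tangent vector is my addition.

```python
import numpy as np

def lorentz_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def retract(p, v):
    # gamma_t = cosh(||v||_L t) p + sinh(||v||_L t) v / ||v||_L, at t = 1.
    # Tangent vectors of L^2 are spacelike, so <v, v>_L >= 0.
    nv = np.sqrt(max(lorentz_inner(v, v), 0.0))
    if nv < 1e-12:
        return p
    return np.cosh(nv) * p + np.sinh(nv) * v / nv
```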
Gradient Descent Algorithm
(Same algorithm as above.) The next point becomes
$p_{k+1} = R_{p_k}(\alpha_k v_k) = \cosh(\|v_k\|_{\mathcal{L}}\, \alpha_k)\, p_k + \sinh(\|v_k\|_{\mathcal{L}}\, \alpha_k)\, \frac{v_k}{\|v_k\|_{\mathcal{L}}}$
Simple Optimization Task 1
GD with gradients: $p_t = p_{t-1} - \alpha \cdot \nabla^E L(p_{t-1})$
GD with R-gradients: $p_t = p_{t-1} - \alpha \cdot \nabla^R L(p_{t-1})$
R-GD with R-gradients: $p_t = \gamma_\alpha$, where $\gamma_0 = p_{t-1}$ and $\gamma'_0 = -\nabla^R L(p_{t-1})$
Traces over eight iterations:
GD with gradients: 3.3024998, 4.7424998, 4.7859879, 4.8213577, 4.851644, 4.8784704, 4.9028177, 4.9253302
GD with R-gradients: 3.3024998, 3.3081245, 3.3175893, 3.3334663, 3.3599658, 3.403821, 3.4753809, 3.5894651
R-GD with R-gradients: 3.3024998, 3.3025002, 3.3025002, 3.3025002, 3.3025005, 3.3025, 3.3025002, 3.3025005
Simple Optimization Task 2
The "barycenter" of points $x_i$ can be found by minimizing
$L(p) = \sum_i d_{L^2}(p, x_i)^2$
(Figures: the data points $x_i$ and the barycenter found by Riemannian gradient descent.)
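As a hypothetical end-to-end reconstruction of this task (the points $x_i$, the start point, and the step size are mine), using the hyperboloid distance $d(p, x) = \operatorname{arcosh}(-\langle p, x\rangle_{\mathcal{L}})$ and the helpers from the earlier sketches:

```python
import numpy as np

def lorentz_inner(u, v):
    return -u[0] * v[0] + u[1:] @ v[1:]

def riemannian_grad(g, p):
    h = np.array(g, dtype=float); h[0] = -h[0]
    return h + lorentz_inner(h, p) * p

def retract(p, v):
    nv = np.sqrt(max(lorentz_inner(v, v), 0.0))
    return p if nv < 1e-12 else np.cosh(nv) * p + np.sinh(nv) * v / nv

def to_hyperboloid(y1, y2):
    # Lift a point of the plane onto L^2: x0 = sqrt(1 + y1^2 + y2^2).
    return np.array([np.sqrt(1.0 + y1**2 + y2**2), y1, y2])

def loss_egrad(p, xs):
    # Euclidean gradient of L(p) = sum_i arcosh(-<p, x_i>_L)^2 w.r.t. p.
    g = np.zeros(3)
    for x in xs:
        c = max(-lorentz_inner(p, x), 1.0)  # cosh(d) >= 1; clamp for round-off
        d = np.arccosh(c)
        if d > 1e-9:  # the gradient of d^2 vanishes as p -> x_i
            # d/dp of -<p, x>_L is (x0, -x1, -x2); chain rule through arcosh^2.
            g += 2.0 * d / np.sqrt(c * c - 1.0) * np.array([x[0], -x[1], -x[2]])
    return g

xs = [to_hyperboloid(0.5, 0.0), to_hyperboloid(-0.5, 0.0), to_hyperboloid(0.0, 0.8)]
p = to_hyperboloid(0.2, 0.2)  # arbitrary starting point on L^2
for _ in range(300):
    p = retract(p, -0.02 * riemannian_grad(loss_egrad(p, xs), p))
print(p, lorentz_inner(p, p))  # barycenter estimate; the constraint stays ~ -1
```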
Takeaways
Hyperbolic space is promising for representing symbolic and hierarchical datasets.
Geometry determines the path toward optimal points.
Regardless of the optimization technique, the optimal point depends only on the loss function.
Interpretation: can the path carry semantics?
A loss function over hyperbolic space should be chosen carefully.
Is it suitable for the given geometry? Is it differentiable? Which operations does it use?
Unfortunately, we lose simple arithmetic (such as the v(King) − v(Man) + v(Woman) analogy).

Editor's Notes

  1. Good evening. I am Segwang Kim from the Machine Intelligence Lab. My topic is hierarchical representation with hyperbolic geometry. It is the topic I am currently working on, though I have not obtained meaningful results yet. I find it intriguing in that it suggests alternative ways to represent symbolic and hierarchical datasets, which in turn helps with downstream tasks in natural language processing and social network analysis.
  2. This is an overview. The main goal of this talk is to get you acquainted with hyperbolic representation. First, I will introduce the data of interest and the conventional way to embed such datasets. Second, I will go over the shortcomings of conventional embeddings and introduce the gist of hyperbolic space. Third, I will show optimization techniques over hyperbolic space. Toy experiments follow at the end. Recent papers are cited throughout this presentation.
  3. The datasets I am dealing with, such as WordNet or social networks, are symbolic and hierarchical. They are symbolic because words or users have no meaningful numeric values; they are just symbols. On top of that, they are hierarchical, since partial orderings exist between data points: dogs belong to mammals and mammals belong to animals, and when one Twitter user follows another, we get an ordering between them. Typical machine learning problems on such datasets are link prediction, node classification, community detection, and visualization. For example, one might ask: are "sprinkler" and "birdcage" linked? Or: which community does a particular user belong to?
  4. To tackle these problems, we need to parametrize symbolic and hierarchical datasets in numeric form. We call this process embedding. Once data points are embedded into some space, we can apply a machine learning model that works on that space. Even when symbolic data points are represented numerically, it is natural to expect the embedding to agree with our intuition; for instance, two words with similar meanings should be represented as two points close to each other. This two-dimensional figure seems to capture semantic relations. Along these lines, we expect certain properties from a good embedding. Traditionally, we have embedded symbolic data into the most familiar space, Euclidean space.
  5. However, Euclidean embedding has limitations. To illustrate, assume we want to solve a machine learning problem on this bushy, tree-structured dataset, where an edge between two nodes means they have something in common. We therefore want an embedding that preserves the distances among nodes as measured in the graph. Unfortunately, the moment you embed the data points into two-dimensional Euclidean space, you realize that huge distortions have been introduced: while the graph distance between nodes a and b is 2, the Euclidean distance between the corresponding points is far less than 2. To remedy this, researchers have increased the dimensionality of the Euclidean space, but in doing so we lose the opportunity to analyze the data in low dimensions. On top of that, trying to embed trees into Euclidean space is wrong from the beginning: more formally, there is a theorem that trees cannot be embedded into Euclidean space with arbitrarily low distortion. So the main question is: what if we had a space that preserves graph structure well, like this one? What is this mysterious space? Now it is time to introduce hyperbolic space.
  6. Time for a series of math slides. The best analogy for introducing hyperbolic space is Euclidean space. We can define the geometry of a given space, or manifold, by looking at its domain and the inner product structure on its tangent spaces. Before elaborating on why the inner product structure matters, let us define hyperbolic space formally: it is a manifold with constant sectional curvature -1, and five different models are used to describe it. They are in fact the same, because isometries exist among them. I pick one of them, the Poincaré disk model. The domain of the n-dimensional Poincaré disk model is the open n-dimensional unit ball, and the inner product on a tangent space is defined as shown. Unlike Euclidean space, which has the same inner product rule on every tangent space, hyperbolic space has a different inner product structure depending on the point where the tangent space is attached; in mathematical terms, this is called a Riemannian metric. To compare the two spaces, let us compute an inner product. First, attach a tangent plane to a given point p in Euclidean or hyperbolic space, and pick two arbitrary tangent vectors in it. In the Euclidean case, you take the component-wise product and sum; note that the point p has nothing to do with the computation. In the hyperbolic case, however, the highlighted term multiplies the usual inner product, and it depends on the point p. Because of this term, strange things happen.
  7. As I said, the inner product on tangent spaces governs the geometry of the space, because it defines the length, angle, and "line" of the given space. From calculus 101, we know that the length of a path is the line integral of the norm of the instantaneous velocity, which is a tangent vector. Since the norm is defined by the inner product, the Riemannian metric comes into play. Also, the angle between two tangent vectors is governed by the inner product structure, because inner products need to be computed. Finally, if we keep in mind that a line is defined not as a straight path but as the shortest path connecting the start and end points, the shape of a line in hyperbolic space must be different. The shortest path is the optimal solution of a functional equation that seems almost impossible to solve, but mathematicians concluded that a line in hyperbolic space is either a circular arc that intersects the boundary of the n-dimensional ball perpendicularly, or a straight line through the center. Considering that the norm of a tangent vector increases as the base point approaches the boundary, the shortest path is inclined to pass through the region around the center rather than near the boundary, so it is tilted toward the center.
  8. One interesting fact about hyperbolic space is that we can choose one model among the five depending on the situation; fundamentally they are all the same because isometries exist. The cited paper suggests that the Poincaré ball model is more adequate for visualization than the Lorentz model, defined as shown, because the Lorentz model is defined in an ambient space with constraints. The Lorentz model, however, guarantees more computational stability of the gradient than the Poincaré ball model. In the following optimization section, I explain optimization techniques on the Lorentz model, not the Poincaré model.
  9. This is one example of a loss function over hyperbolic space. As you can see, it contains hyperbolic distance terms. Details are omitted, but basically it disperses irrelevant data points and aggregates relevant ones. Because the gradient of the loss tells which direction the data points should move, we need to know how to compute the derivative of the given loss function.
  10. This is the Riemannian gradient descent algorithm. There are only two parts you need to focus on: first, choosing a descent direction; second, choosing a retraction.
  11. Choosing a descent direction needs a little more effort than the usual gradient. Assume we want to minimize a loss function over the two-dimensional Lorentz model; basically, we want to find the gradient of f.
  12. It takes two steps; essentially, we need to turn the naive gradient into a tangent vector. First, once we get a gradient from TensorFlow or any other API, as shown in the blue box, its value is the same no matter which metric tensor we have chosen. If we interpret the differential as a linear map from the tangent space to the real numbers, the Riesz representation theorem implies there is a corresponding vector such that the inner product with that vector realizes the map. To find the vector, the inverse of the metric tensor must be multiplied with the usual derivative, to compensate for the extra terms in the hyperbolic inner product. It is complicated, but the bottom line is: just flip the sign of the first element of the usual gradient. The second step is projection. Because the Lorentz model is defined in an ambient space, we need to project the resulting vector from the first step onto the tangent plane of the model, which takes only a few multiplications and additions. We then get the Riemannian descent direction by flipping the signs of all components of the hyperbolic gradient of the loss.
  13. A retraction tells how a point can be moved in a given direction. When the point is moved to the tip of the direction vector, it escapes the manifold. This is sad.
  14. However, if the point is moved to the tip of the geodesic, it stays on the manifold, and we are happy. The geodesic is the hyperbolic version of a line, and this simple formula is all you need.
  15. The last step is trivial: we just iterate the previous steps until we get sufficiently small errors.