The document discusses challenges in translating mathematical formulae between different representation systems like LaTeX and computer algebra systems. It proposes a multiple-scan approach to understand formulae by combining pattern recognition of the formula structure with analysis of the surrounding context from both near and far fields, similar to how human readers understand formulae. This approach aims to determine the meaning and valid translations of a formula when only presented in a generic LaTeX format without semantic tags.
Automatic Mathematical Information Retrieval to Perform Translations up to Computer Algebra Systems
André Greiner-Petter*
June 6, 2018
University of Konstanz
Germany
*Sponsored by a SIGIR Student Travel Grant. @GreinerPetter
8. Problems of Translations (DLMF 18.3)
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
CAS Maple:
JacobiP(n, alpha, beta, cos(a*Theta))
Potential Problems:
• Differences in syntax,
• Function is not implemented in one system,
• Function has multiple representations in one system,
• Differences in definitions.
10. Problems of Translations (DLMF 18.3)
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
CAS Maple (translation pattern):
JacobiP($2, $0, $1, $3)
Potential Problems:
• Differences in syntax ← solved by translation patterns
• Function is not implemented in one system,
• Function has multiple representations in one system,
• Differences in definitions.
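A minimal sketch of how a translation pattern such as JacobiP($2, $0, $1, $3) could be applied, assuming that $0 to $3 refer to the arguments of the semantic macro in the order they are written. The table MAPLE_PATTERNS and the function fill_pattern are hypothetical names used for illustration, not part of the presented system.

```python
import re

# Hypothetical pattern table: semantic macro name -> Maple template.
# $0..$3 stand for the macro's arguments in the order they are written.
MAPLE_PATTERNS = {"JacobiP": "JacobiP($2, $0, $1, $3)"}


def fill_pattern(macro: str, args: list[str]) -> str:
    """Substitute already-translated arguments into the positional placeholders."""
    template = MAPLE_PATTERNS[macro]
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1))], template)


# Arguments of \JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}, already translated.
print(fill_pattern("JacobiP", ["alpha", "beta", "n", "cos(a*Theta)"]))
```

The pattern only reorders arguments, so it addresses syntax differences and nothing else on the list of potential problems.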
12. Problems of Translations (DLMF 18.3)
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
CAS Maple (DLMF 18.5.7):
\sum_{\ell=0}^{n} \frac{(n+\alpha+\beta+1)_{\ell} (\alpha+\ell+1)_{n-\ell}}{\ell!\,(n-\ell)!} \left(\frac{x-1}{2}\right)^{\ell}
Potential Problems:
• Differences in syntax ← solved by translation patterns
• Function is not implemented in one system,
  translate equivalent presentations
• Function has multiple representations in one system,
• Differences in definitions.
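To illustrate translating an equivalent presentation when the target system lacks the function, the finite sum of DLMF 18.5.7 can be evaluated directly. This is a sketch in Python rather than Maple; the names pochhammer and jacobi_p are chosen here for illustration.

```python
from math import factorial


def pochhammer(a: float, k: int) -> float:
    """Rising factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    result = 1.0
    for i in range(k):
        result *= a + i
    return result


def jacobi_p(n: int, alpha: float, beta: float, x: float) -> float:
    """Evaluate P_n^{(alpha,beta)}(x) via the finite sum of DLMF 18.5.7:

    sum_{l=0}^{n} (n+alpha+beta+1)_l (alpha+l+1)_{n-l} / (l! (n-l)!) * ((x-1)/2)^l
    """
    t = (x - 1.0) / 2.0
    return sum(
        pochhammer(n + alpha + beta + 1, l)
        * pochhammer(alpha + l + 1, n - l)
        / (factorial(l) * factorial(n - l))
        * t**l
        for l in range(n + 1)
    )


# For alpha = beta = 0 the Jacobi polynomial reduces to the Legendre polynomial,
# so the value can be compared against P_2(x) = (3x^2 - 1) / 2.
print(jacobi_p(2, 0.0, 0.0, 0.5), (3 * 0.5**2 - 1) / 2)
```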
14. Problems of Translations (DLMF 18.3)
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
CAS Maple:
JacobiP or Jacobi or JacobiPoly
Potential Problems:
• Differences in syntax ← solved by translation patterns
• Function is not implemented in one system,
  translate equivalent presentations
• Function has multiple representations in one system,
  just pick a valid translation
• Differences in definitions.
16. Problems of Translations (DLMF 18.3)
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
CAS Maple:
JacobiP(n, alpha, beta, cos(a*Theta))
Potential Problems:
• Differences in syntax ← solved by translation patterns
• Function is not implemented in one system,
  translate equivalent presentations
• Function has multiple representations in one system,
  just pick a valid translation
• Differences in definitions. ← wait... What?
18. Problems of Translations (DLMF 4.23.9, Maple Inv. Trig. Functions)
Rendered Version    Semantic LaTeX    CAS Maple
arccot(z)           \acot@{z}         arccot(z)
Maple
Figure 1: (arccot(z)) with branch cut at [−∞i, −i], [i, ∞i].
DLMF & Mathematica
Figure 2: (arccot(z)) with branch cut at [−i, i].
20. Problems of Translations (DLMF 4.23.9, Maple Inv. Trig. Functions)
Rendered Version    Semantic LaTeX    CAS Maple
arccot(z)           \acot@{z}         arctan(1/z)
Maple
Figure 1: (arccot(z)) with branch cut at [−∞i, −i], [i, ∞i].
DLMF & Mathematica
Figure 2: (arccot(z)) with branch cut at [−i, i].
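The effect of the differing definitions can be checked numerically. The sketch below compares arctan(1/z), the DLMF 4.23.9 definition, with π/2 − arctan(z), which is assumed here to be the convention behind Maple's arccot; that assumption and both function names are illustrative and should be checked against the Maple documentation.

```python
import cmath


def arccot_dlmf(z: complex) -> complex:
    """DLMF 4.23.9: arccot z = arctan(1/z), branch cut on [-i, i]."""
    return cmath.atan(1 / z)


def arccot_maple_convention(z: complex) -> complex:
    """Assumed Maple convention: arccot z = pi/2 - arctan z,
    branch cuts on the imaginary axis outside (-i, i)."""
    return cmath.pi / 2 - cmath.atan(z)


# The two conventions agree at z = 2 but differ by pi at z = -2.
for z in (2.0, -2.0):
    print(z, arccot_dlmf(z), arccot_maple_convention(z))
```

A direct name-to-name translation of arccot would therefore change the value for some arguments, which is why the table translates to arctan(1/z) instead.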
22. Problems of Generic LaTeX (DLMF 18.3)
Generic LaTeX:
P_n^{(\alpha,\beta)}(\cos(a\Theta))
Semantics:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
Potential Problems:
• Is P a function, variable, constant?
  Jacobi polynomial or Legendre function or Ferrers function or ...
• Is cos(aΘ) an argument of P or part of a multiplication?
  P(cos(aΘ)) vs P · (cos(aΘ))
• What are α, β, n, a, and Θ?
  Variable or 2nd Feigenbaum constant or ...
27. Multiple-Scan Approach
Rendered LaTeX:
P_n^{(\alpha,\beta)}(\cos(a\Theta))
The Naive Approach
How does a reader understand a mathematical formula?
• The reader knows the symbols and structure,
  knowledge-based pattern recognition
• The formula was previously introduced in the paper (e.g., in definitions, the text, or other referenced publications),
  analyse the context from near to far
• The reader looks the formula up in books or online,
  dictionary-based pattern recognition
34. Multiple-Scan Approach
Generic LaTeX:
P_n^{(\alpha,\beta)}(\cos(a\Theta))
Adopt Human Behavior
Let’s try to adopt the previous steps:
• pattern recognition
  narrow down possible meanings from the structure of the expression
• context analysis
  Near-Field-Analysis (NFA), e.g., extract identifier-definiens pairs from text, analyze definition environments, ...
  Far-Field-Analysis (FFA), e.g., overall topic of the paper, citations, author’s field of interest, ...
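The context-analysis step can be sketched as follows, under strong simplifying assumptions: the near-field scan is a single regular expression over the surrounding sentence, and the far-field information is a plain set of topic words. The candidate list and the names extract_definiens and rank_candidates are hypothetical; the talk does not specify an implementation.

```python
import re

# Hypothetical candidate readings for the symbol P, as a structure-based
# pattern recognition step might produce them.
CANDIDATES = ["Jacobi polynomial", "Legendre function", "Ferrers function"]


def extract_definiens(identifier: str, text: str) -> list[str]:
    """Near-field scan: find '<identifier> is/denotes the <noun phrase>'."""
    pattern = (
        rf"\b{re.escape(identifier)}\b[^.]*?"
        r"\b(?:is|denotes)\s+(?:the|a|an)\s+([^,.;]+)"
    )
    return [m.group(1).strip() for m in re.finditer(pattern, text)]


def rank_candidates(definiens: list[str], far_field_terms: set[str]) -> list[str]:
    """Order candidates: near-field hits weigh more than far-field topic overlap."""

    def score(candidate: str) -> int:
        near = sum(candidate.lower() in d.lower() for d in definiens)
        far = sum(word in far_field_terms for word in candidate.lower().split())
        return 2 * near + far

    return sorted(CANDIDATES, key=score, reverse=True)


text = "where P_n denotes the Jacobi polynomial of degree n, and a is a constant."
found = extract_definiens("P_n", text)
print(found)
print(rank_candidates(found, {"orthogonal", "polynomial"}))
```

Weighting near-field evidence above far-field evidence mirrors the idea of analysing the context from near to far; the factor of 2 is an arbitrary choice for this sketch.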
40. Multiple-Scan Approach
[Diagram: from Generic LaTeX to Semantic LaTeX]
Generic LaTeX:
P_n^{(\alpha, \beta)}(\cos(a\Theta))
MLP
MLP Syntax Tree
Multiple scans of expression and its environment:
• Expression Analysis
  1 subscript
  2 superscripts in parentheses
  1 variable
  The variable is a subexpression
• Near-Field-Analysis
• Far-Field-Analysis
Conclusion
Semantic LaTeX:
\JacobiP{\alpha}{\beta}{n}@{\cos@{a\Theta}}
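The expression-analysis features (subscript, superscripts in parentheses, argument, argument is a subexpression) can be extracted with a small sketch like the one below. It uses a regular expression on a flat input string with LaTeX backslashes, which is an assumption made for brevity; the diagram indicates that the actual scans operate on a syntax tree. The function name analyze_expression is hypothetical.

```python
import re


def analyze_expression(latex: str) -> dict:
    """Collect structural features from a generic LaTeX string.

    Regex sketch for flat expressions of the form X_sub^{(a,b,...)}(arg);
    anything else yields an empty dictionary.
    """
    m = re.fullmatch(
        r"\s*([A-Za-z]+)_(\w|\{[^{}]*\})\^\{\((.*?)\)\}\((.*)\)\s*", latex
    )
    if m is None:
        return {}
    head, sub, sups, arg = m.groups()
    return {
        "head": head,
        "subscripts": [sub.strip("{}")],
        "superscripts": [s.strip() for s in sups.split(",")],
        "argument": arg,
        # The argument counts as a subexpression if it is more than one symbol.
        "argument_is_subexpression": re.fullmatch(r"\\?\w+", arg) is None,
    }


features = analyze_expression(r"P_n^{(\alpha,\beta)}(\cos(a\Theta))")
print(features)
```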
41. Wikipedia Recommender System
A real-time recommender system, included in the editor of Wikipedia articles, that suggests semantic versions of mathematical input.
• real-time recommendations
• ordered from most likely to impossible
• consider the context