Improving the Representation and Conversion
of Mathematical Formulae by Considering
their Textual Context
Moritz Schubotz¹, André Greiner-Petter*¹, Philipp Scharpf*¹,
Norman Meuschke¹, Howard S. Cohl², Bela Gipp¹
June 5, 2018
¹University of Konstanz, Germany
²National Institute of Standards and Technology, USA
*sponsored by SIGIR Student Travel Grant
Motivation
Formats of Mathematical Formulae
Riemann Zeta Function
Rendered Version:
ζ(s) = 0 ⇒ ℜs = 1/2 ∨ ℑs = 0
LaTeX:
\zeta(s) = 0 \Rightarrow \Re s = \frac12 \lor \Im s=0
Mathematica:
Implies[
Equal[Zeta[s], 0],
Or[
Equal[Re[s], Rational[1, 2]],
Equal[Im[s], 0]
]
]
The LaTeX source consists of 18 tokens with a maximum depth of 2; the Mathematica expression consists of 16 tokens with a maximum depth of 5.
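The two counts can be reproduced with a rough Python sketch. The slides do not define "token" or "depth", so the definitions here are assumptions: a LaTeX token is a control sequence or a single non-space character, a Mathematica token is a symbol name or an integer, and depth counts nesting levels with top-level tokens at depth 1. The `ARITY` table, which only knows `\frac`, is a hypothetical simplification of real LaTeX parsing.

```python
import re

# Assumption: a LaTeX token is a control word (\zeta), a control symbol (\,)
# or any single non-space character.
LATEX_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
# Assumption: a Mathematica token is a symbol name or an integer; brackets
# and commas only structure the expression.
MMA_TOKEN = re.compile(r"[A-Za-z]+|\d+|[\[\],]")
# Hypothetical arity table: arguments of these macros sit one level deeper.
ARITY = {r"\frac": 2}


def _item(tokens, i, depth):
    """Consume one token plus its macro arguments; return (next_i, count, max_depth)."""
    if tokens[i] == "{":  # a braced group adds structure but no tokens
        return _seq(tokens, i + 1, depth, closing="}")
    tok, count, deepest, i = tokens[i], 1, depth, i + 1
    for _ in range(ARITY.get(tok, 0)):
        if i >= len(tokens):
            raise ValueError(f"missing argument for {tok}")
        i, c, d = _item(tokens, i, depth + 1)
        count, deepest = count + c, max(deepest, d)
    return i, count, deepest


def _seq(tokens, i, depth, closing=None):
    """Consume tokens until `closing` (or the end); return (next_i, count, max_depth)."""
    count, deepest = 0, depth
    while i < len(tokens) and tokens[i] != closing:
        i, c, d = _item(tokens, i, depth)
        count, deepest = count + c, max(deepest, d)
    return i + 1, count, deepest  # i + 1 skips the closing brace


def latex_stats(src: str) -> tuple[int, int]:
    """Return (token count, maximum depth) of a LaTeX formula."""
    _, count, deepest = _seq(LATEX_TOKEN.findall(src), 0, 1)
    return count, deepest


def mathematica_stats(src: str) -> tuple[int, int]:
    """Return (token count, maximum depth) of a Mathematica FullForm expression."""
    count, depth, deepest = 0, 0, 0
    for tok in MMA_TOKEN.findall(src):
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        elif tok != ",":
            count += 1
            deepest = max(deepest, depth + 1)
    return count, deepest


print(latex_stats(r"\zeta(s) = 0 \Rightarrow \Re s = \frac12 \lor \Im s=0"))  # (18, 2)
print(mathematica_stats(
    "Implies[Equal[Zeta[s], 0], Or[Equal[Re[s], Rational[1, 2]], Equal[Im[s], 0]]]"
))  # (16, 5)
```

Under these definitions the sketch matches both numbers on the slide, which illustrates that the LaTeX string is shallow while the Mathematica expression encodes the nested semantic structure.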
Formats of Mathematical Formulae - MathML
A Combined Format
Find another format that (1) provides both presentation and semantic
information, (2) is easy to parse, and (3) is extensible.
⇒ Mathematical Markup Language 3.0.
Part of the MathML for ζ(s) = 0 ⇒ ℜs = 1/2 ∨ ℑs = 0:
<math><semantics><mrow>. . .
<mo id="5" xref="20">=</mo>
<mn id="5" xref="21">0</mn>
<mo id="7" xref="19">⇒</mo>. . .</mrow>
<annotation-xml encoding="MathML-Content">
<apply><implies id="19" xref="7"/>
<apply><eq id="20" xref="5"/>
<apply><csymbol id="21" xref="1">ζ</csymbol>. . .
</annotation-xml></semantics></math>
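The `id`/`xref` attributes are what tie each presentation element to its content counterpart. A minimal sketch of following those links with Python's standard `xml.etree.ElementTree`; since the fragment above is elided, it uses a hypothetical, fully spelled-out parallel markup for x = 0 instead, and `presentation_to_content` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

MATHML_NS = {"m": "http://www.w3.org/1998/Math/MathML"}

# Hypothetical parallel markup for "x = 0": presentation tree first,
# content tree inside <annotation-xml>, linked through id/xref pairs.
DOC = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <semantics>
    <mrow id="p1" xref="c1">
      <mi id="p2" xref="c3">x</mi>
      <mo id="p3" xref="c2">=</mo>
      <mn id="p4" xref="c4">0</mn>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <apply id="c1" xref="p1">
        <eq id="c2" xref="p3"/>
        <ci id="c3" xref="p2">x</ci>
        <cn id="c4" xref="p4">0</cn>
      </apply>
    </annotation-xml>
  </semantics>
</math>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def presentation_to_content(mathml: str) -> dict[str, str]:
    """Map each presentation element id to the tag of the content element it references."""
    root = ET.fromstring(mathml)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    presentation = root.find("m:semantics/m:mrow", MATHML_NS)
    mapping = {}
    for el in presentation.iter():
        target = by_id.get(el.get("xref"))
        if target is not None:
            mapping[el.get("id")] = local_name(target.tag)
    return mapping


print(presentation_to_content(DOC))
# {'p1': 'apply', 'p2': 'ci', 'p3': 'eq', 'p4': 'cn'}
```

The lookup shows, for example, that the presentation operator `=` (id p3) corresponds to the content element `eq`.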
Contributions
Our Contributions
We present the following three main contributions:
1. MathMLBen, a benchmark for MathML;
2. an evaluation of state-of-the-art translation tools;
3. a new approach that considers textual context.
MathMLBen - Create a MathML
Benchmark Dataset
Create MML Benchmark Dataset mathmlben.wmflabs.org
[Figure: pipeline diagram with the labels Documents, Random Selection, Formula, LaTeXML, Annotated MML, Manual Refinements (multiple cycles), Gold Standard MML, Converter, MML Comparison, Mathematical Language Processor, Identifier & Definiens, POM-Tagger, Dictionaries, Tree Refinements, Semantic Formula, Switch]
Annotated MathML
Content MathML does not provide enough semantic information.
We overcome this issue by manually annotating content MathML
with Wikidata IDs. We used special TeX macros to
1) annotate identifiers with Wikidata IDs, and
2) manipulate the expression tree.
Original LaTeX: W(2, k)
LaTeXML Input: \wf{Q7913892}{W}(2, \w{Q12503}{k})
MathML Output:
<apply id="p1.1.m1.1.13.1.1.cmml" xref="p1.1.m1.1.13.1.2">
<csymbol cd="wikidata" id=". . ." xref=". . .">Q7913892</csymbol>
<cn type="integer" id=". . ." xref=". . .">2</cn>
<csymbol cd="wikidata" id=". . ." xref=". . .">Q12503</csymbol>
</apply>
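The relation between the annotated input, the original LaTeX, and the MathML output can be sketched in Python. This assumes `\w{QID}{symbol}` and `\wf{QID}{symbol}` are the only annotation macros and that their arguments contain no nested braces; `strip_wikidata_macros` and `wikidata_csymbols` are hypothetical helper names, the output snippet drops the elided `id`/`xref` attributes, and https://www.wikidata.org/wiki/QID is Wikidata's standard item URL pattern.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: annotations look like \w{Q...}{symbol} or \wf{Q...}{symbol},
# with no nested braces inside the symbol argument.
WIKIDATA_MACRO = re.compile(r"\\wf?\{(Q\d+)\}\{([^{}]*)\}")


def strip_wikidata_macros(src: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the plain LaTeX and the list of (Wikidata ID, symbol) annotations."""
    annotations = WIKIDATA_MACRO.findall(src)
    return WIKIDATA_MACRO.sub(r"\2", src), annotations


def wikidata_csymbols(content_mathml: str) -> list[str]:
    """Collect Wikidata item URLs from <csymbol cd="wikidata"> elements (no namespace)."""
    root = ET.fromstring(content_mathml)
    return [
        f"https://www.wikidata.org/wiki/{el.text.strip()}"
        for el in root.iter("csymbol")
        if el.get("cd") == "wikidata"
    ]


plain, notes = strip_wikidata_macros(r"\wf{Q7913892}{W}(2, \w{Q12503}{k})")
print(plain)  # W(2, k)
print(notes)  # [('Q7913892', 'W'), ('Q12503', 'k')]

output = """<apply>
  <csymbol cd="wikidata">Q7913892</csymbol>
  <cn type="integer">2</cn>
  <csymbol cd="wikidata">Q12503</csymbol>
</apply>"""
print(wikidata_csymbols(output))
```

Removing the macros recovers the original `W(2, k)`, while the annotations survive as `csymbol` elements that point to Wikidata items.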
MathMLBen Collection
We annotated 305 formulae in total.
• 1 to 100: randomly sampled from Wikipedia; used for the
National Institute of Informatics Testbeds and Community for
Information access Research Project (NTCIR-11).
• 101 to 200: randomly sampled from the sources of the NIST
Digital Library of Mathematical Functions (DLMF), which contains
9,897 labeled formulae.
• 201 to 305: 70% from the NTCIR arXiv dataset and 30% from the
NTCIR-12 Wikipedia dataset.
All data is available at https://mathmlben.wmflabs.org/.
Evaluate State-of-the-Art LaTeX to
MathML Conversion Tools
Benchmarking Conversion Tools
[Figure: the same pipeline diagram as in the MathMLBen section]
Benchmarking Conversion Tools - Accuracy
Tested Conversion Tools
1. LaTeXML: Perl tool used to create the DLMF
2. LaTeX2MathML: small Python project
3. Mathoid: service that can also generate SVG and PNG
4. SnuggleTeX: Java library developed at the University of Edinburgh
5. MathToWeb: Java web application
6. TeXZilla: JavaScript web application
7. Mathematical: Ruby application that can generate SVG/PNG
8. CAS: computer algebra system capable of parsing LaTeX
9. Part-of-Math (POM) Tagger: grammar-based LaTeX parser
that was used to perform translations from LaTeX to CAS.
[Chart: Average of Structural Distances & Successfully Parsed Expressions. Series: Successfully Parsed LaTeX Expressions (left axis, 0 to 300) and Tree Edit Distance, split into Average Distance of Presentation Subtree and Average Distance of Content Subtree (right axis, 0 to 80). Successfully parsed counts shown (out of 305): 305, 305, 295, 305, 288, 229, 290, 305, 305.]
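The slides report average tree edit distances between tool output and the gold standard but do not say which algorithm or cost model was used. Below is a sketch of the classic unit-cost ordered tree edit distance (insert, delete, relabel), using the forest recursion that the Zhang-Shasha algorithm optimizes; it is only memoized, so it suits small trees. Folding leaf text into the node label (`ci:x`) is an assumption, and the two MathML snippets are hypothetical.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

# A tree is (label, children) where children is a tuple of trees; a forest is
# a tuple of trees. Tuples keep everything hashable for lru_cache.


def to_tree(el: ET.Element) -> tuple:
    """Convert an XML element into a (label, children) tree; leaf text joins the label."""
    text = (el.text or "").strip()
    label = el.tag.rsplit("}", 1)[-1] + (f":{text}" if text else "")
    return (label, tuple(to_tree(child) for child in el))


def size(tree: tuple) -> int:
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])


def tree_edit_distance(t1: tuple, t2: tuple) -> int:
    """Unit-cost ordered tree edit distance between two trees."""

    @lru_cache(maxsize=None)
    def forest(f1: tuple, f2: tuple) -> int:
        if not f1:
            return sum(size(t) for t in f2)  # insert everything left in f2
        if not f2:
            return sum(size(t) for t in f1)  # delete everything left in f1
        (l1, kids1), rest1 = f1[-1], f1[:-1]
        (l2, kids2), rest2 = f2[-1], f2[:-1]
        return min(
            forest(rest1 + kids1, f2) + 1,  # delete the rightmost root of f1
            forest(f1, rest2 + kids2) + 1,  # insert the rightmost root of f2
            forest(rest1, rest2) + forest(kids1, kids2) + (l1 != l2),  # match roots
        )

    return forest((t1,), (t2,))


gold = to_tree(ET.fromstring("<apply><eq/><ci>x</ci><cn>0</cn></apply>"))
output = to_tree(ET.fromstring("<apply><eq/><ci>x</ci><ci>0</ci></apply>"))
print(tree_edit_distance(gold, output))  # 1: cn:0 must be relabeled to ci:0
```

Applied separately to the presentation and content subtrees of each converted formula and averaged over the dataset, such a distance would give numbers of the kind plotted in the chart.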
Benchmarking Conversion Tools - Runtime
Performance of tools (duration in seconds):
Tool            Duration (s)
LaTeXML         372.76
Mathoid          29.65
Mathematical     20.77
LaTeX2MathML      9.57
MathToWeb         4.17
POM               3.69
SnuggleTeX        1.79
TeXZilla          1.41
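The slides do not say how these durations were measured (for example, whether process or service start-up is included). The following is a hypothetical harness for this kind of comparison. It assumes each tool is wrapped as a Python function from LaTeX to MathML; `rank_by_duration` and the two stub converters are illustrative stand-ins, not the tools above.

```python
import time
from typing import Callable, Iterable

# Hypothetical interface: each tool is wrapped as a function LaTeX -> MathML.
Converter = Callable[[str], str]


def total_duration(convert: Converter, formulae: Iterable[str]) -> float:
    """Wall-clock seconds needed to convert every formula once, sequentially."""
    start = time.perf_counter()
    for tex in formulae:
        convert(tex)
    return time.perf_counter() - start


def rank_by_duration(tools: dict[str, Converter], formulae: list[str]) -> list[tuple[str, float]]:
    """Time every tool on the same formulae and sort from slowest to fastest."""
    timings = {name: total_duration(convert, formulae) for name, convert in tools.items()}
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


def slow_stub(tex: str) -> str:
    time.sleep(0.01)  # stand-in for an expensive conversion
    return f"<math>{tex}</math>"


def fast_stub(tex: str) -> str:
    return f"<math>{tex}</math>"


ranking = rank_by_duration({"slow": slow_stub, "fast": fast_stub}, ["a+b", "x^2", r"\frac12"])
for name, seconds in ranking:
    print(f"{name:>6} {seconds:8.3f} s")
```

Running every tool on the identical formula list keeps the totals comparable; because the timings span more than two orders of magnitude, a logarithmic axis is the natural way to plot them.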
Improvements for Conversion Tools
by Considering Textual Context
Approach to Improve Conversion Tools
[Figure: the same pipeline diagram as in the MathMLBen section]
Example for item 101: mathmlben.wmflabs.org/101
A function $f(x, y)$ is continuous at a point $(a, b)$ if
$$\lim_{(x,y)\to(a,b)} f(x, y) = f(a, b), \tag{1}$$
that is, for every arbitrarily small positive constant $\epsilon$ there exists $\delta\,(>0)$ such that
$$|f(a + \alpha, b + \beta) - f(a, b)| < \epsilon, \tag{2}$$
for all $\alpha$ and $\beta$ that satisfy $|\alpha|, |\beta| < \delta$.
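The idea of exploiting the surrounding text can be illustrated with a deliberately naive heuristic: take the word immediately preceding an identifier's first mention as its candidate definiens. This is not the Mathematical Language Processor used in the approach, whose internals the slides do not describe; `candidate_definiens` and the plain-text rendering of item 101 are hypothetical.

```python
import re

# Plain-text rendering of item 101 (hypothetical simplification of the formulae).
TEXT = (
    "A function f(x, y) is continuous at a point (a, b) if lim f(x, y) = f(a, b), "
    "that is, for every arbitrarily small positive constant ε there exists δ (> 0) "
    "such that |f(a + α, b + β) − f(a, b)| < ε, for all α and β that satisfy "
    "|α|, |β| < δ."
)


def candidate_definiens(text: str, identifiers: list[str]) -> dict[str, str]:
    """Toy heuristic: the word right before an identifier's first mention."""
    found = {}
    for ident in identifiers:
        # (?!\w) keeps 'f' from matching the start of a longer word such as 'function'.
        match = re.search(r"(\w+)\s+" + re.escape(ident) + r"(?!\w)", text)
        if match:
            found[ident] = match.group(1)
    return found


print(candidate_definiens(TEXT, ["f", "ε", "δ"]))
# {'f': 'function', 'ε': 'constant', 'δ': 'exists'}
```

The sketch gets `f` and `ε` right but returns the verb "exists" for `δ`, which is why a realistic pipeline would need part-of-speech information and dictionaries rather than word adjacency alone.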
Thanks for your attention!

More Related Content

Recently uploaded

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Featured

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 

Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

  • 1. Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
    Moritz Schubotz¹, André Greiner-Petter*¹, Philipp Scharpf*¹, Norman Meuschke¹, Howard S. Cohl², Bela Gipp¹
    June 5, 2018
    ¹University of Konstanz, Germany
    ²National Institute of Standards and Technology, USA
    *sponsored by SIGIR Student Travel Grant
    1/14
  • 7. Formats of Mathematical Formulae
    Riemann Zeta Function
    Rendered version: ζ(s) = 0 ⇒ ℜs = ½ ∨ ℑs = 0
    LaTeX: \zeta(s) = 0 \Rightarrow \Re s = \frac12 \lor \Im s=0   ← 18 tokens with max depth of 2
    Mathematica:
      Implies[
        Equal[Zeta[s], 0],
        Or[
          Equal[Re[s], Rational[1, 2]],
          Equal[Im[s], 0]
        ]
      ]   ← 16 tokens with max depth of 5
    2/14
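The token counts on this slide can be reproduced with a few lines of Python. This is a sketch under assumed definitions, since the slide does not define "token" or "depth": a LaTeX token is a control word such as `\frac` or any single non-space character, and in the Mathematica form every head and every atom is one token, with the outermost head at depth 1. Under these assumptions the sketch yields 18 LaTeX tokens and 16 Mathematica tokens with a maximum depth of 5; the LaTeX depth of 2 (the arguments of `\frac`) is not computed here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A FullForm-style expression node: a head plus its arguments."""
    head: str
    args: list["Node"] = field(default_factory=list)


def latex_tokens(src: str) -> list[str]:
    # Assumed definition: a control word (\frac, \zeta, ...) or any single non-space character.
    return re.findall(r"\\[A-Za-z]+|\S", src)


def parse_fullform(src: str) -> Node:
    """Parse expressions such as Equal[Zeta[s], 0] into a Node tree."""
    tokens = re.findall(r"[A-Za-z0-9]+|[\[\],]", src)
    pos = 0

    def parse() -> Node:
        nonlocal pos
        node = Node(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == "[":
            pos += 1  # consume "["
            while tokens[pos] != "]":
                node.args.append(parse())
                if tokens[pos] == ",":
                    pos += 1  # consume ","
            pos += 1  # consume "]"
        return node

    root = parse()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return root


def count_tokens(node: Node) -> int:
    # Every head and every atom counts as one token.
    return 1 + sum(count_tokens(arg) for arg in node.args)


def max_depth(node: Node) -> int:
    # The outermost head sits at depth 1.
    return 1 + max((max_depth(arg) for arg in node.args), default=0)


LATEX = r"\zeta(s) = 0 \Rightarrow \Re s = \frac12 \lor \Im s=0"
MATHEMATICA = "Implies[Equal[Zeta[s], 0], Or[Equal[Re[s], Rational[1, 2]], Equal[Im[s], 0]]]"

tree = parse_fullform(MATHEMATICA)
print(len(latex_tokens(LATEX)), count_tokens(tree), max_depth(tree))  # 18 16 5
```

The contrast the slide draws is visible in the numbers: the LaTeX string is flat (many tokens, shallow), while the Mathematica form encodes the logical structure explicitly as nesting.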
  • 13. Formats of Mathematical Formulae - MathML
    A Combined Format
    Find another format that (1) provides both presentation and semantic information, (2) is easy to parse, and (3) is extensible.
    ⇒ Mathematical Markup Language (MathML) 3.0.
    Part of the MathML for ζ(s) = 0 ⇒ ℜs = ½ ∨ ℑs = 0:
      <math><semantics><mrow>. . .
        <mo id="5" xref="20">=</mo>
        <mn id="5" xref="21">0</mn>
        <mo id="7" xref="19">⇒</mo>. . .</mrow>
      <annotation-xml encoding="MathML-Content">
        <apply><implies id="19" xref="7"/>
          <apply><eq id="20" xref="5"/>
            <apply><csymbol id="21" xref="1">ζ</csymbol>. . .
      </annotation-xml></semantics></math>
    3/14
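To illustrate how the `id`/`xref` attributes tie the presentation tree and the content tree together, here is a minimal Python sketch using only the standard library. The MathML string is a simplified, hand-written encoding of ζ(s) = 0 made up for this example (it is not LaTeXML output, and the `p…`/`c…` id scheme is hypothetical); the function lists, for each Content MathML element, the presentation element its `xref` points to.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written parallel markup for ζ(s) = 0 (hypothetical ids).
SAMPLE = """<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics>
<mrow id="p4">
  <mrow id="p3"><mi id="p1" xref="c3">ζ</mi><mo>(</mo><mi id="p2" xref="c4">s</mi><mo>)</mo></mrow>
  <mo id="p5" xref="c1">=</mo>
  <mn id="p6" xref="c5">0</mn>
</mrow>
<annotation-xml encoding="MathML-Content">
  <apply id="c0" xref="p4"><eq id="c1" xref="p5"/>
    <apply id="c2" xref="p3"><csymbol id="c3" xref="p1">ζ</csymbol><ci id="c4" xref="p2">s</ci></apply>
    <cn id="c5" xref="p6">0</cn>
  </apply>
</annotation-xml>
</semantics></math>"""


def local_name(tag: str) -> str:
    # Strip the "{namespace}" prefix that ElementTree adds to tag names.
    return tag.rsplit("}", 1)[-1]


def xref_pairs(mathml: str) -> list[tuple[str, str, str, str]]:
    """Return (content tag, content text, presentation tag, presentation text) per xref link."""
    root = ET.fromstring(mathml)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    pairs = []
    for ann in root.iter():
        if local_name(ann.tag) != "annotation-xml":
            continue
        for el in ann.iter():
            target = by_id.get(el.get("xref"))
            if target is not None:
                text = "".join(t.strip() for t in target.itertext())
                pairs.append((local_name(el.tag), (el.text or "").strip(),
                              local_name(target.tag), text))
    return pairs


for pair in xref_pairs(SAMPLE):
    print(pair)
```

Only links from the content side are followed here; the presentation elements carry the reverse `xref`s, so the same lookup table works in the other direction.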
  • 14. Contributions
    Our Contributions
    We present three main contributions:
    1. MathMLBen, a benchmark for MathML;
    2. an evaluation of state-of-the-art translation tools;
    3. a new approach that considers textual context.
    4/14
  • 18. MathMLBen - Create a MathML Benchmark Dataset
  • 19. Create MML Benchmark Dataset mathmlben.wmflabs.org
    [Figure: pipeline overview. Labeled components: Documents, Random Selection, Formula (a+b), Semantic Formula (a+b), Mathematical Language Processor (Identifier & Definiens), POM-Tagger with Dictionaries, Switch, LaTeXML, Converter, Annotated MML, Manual Refinements and Tree Refinements over Multiple Cycles, Gold Standard MML, MML Comparison; example entry MathMLben#422f2d908725a379336f2c6083c5b6edf69157ca.]
    5/14
  • 30. Create MML Benchmark Dataset mathmlben.wmflabs.org
    Annotated MathML
    Content MathML does not provide enough semantic information. We overcome this issue by manually annotating content MathML with Wikidata IDs. We used special TeX macros to (1) annotate identifiers with Wikidata IDs and (2) manipulate the expression tree.
    Original LaTeX: W(2, k)
    LaTeXML input: \wf{Q7913892}{W}(2, \w{Q12503}{k})
    MathML output:
      <apply id="p1.1.m1.1.13.1.1.cmml" xref="p1.1.m1.1.13.1.2">
        <csymbol cd="wikidata" id=". . ." xref=". . .">Q7913892</csymbol>
        <cn type="integer" id=". . ." xref=". . .">2</cn>
        <csymbol cd="wikidata" id=". . ." xref=". . .">Q12503</csymbol>
      </apply>
    6/14
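The mapping from annotated LaTeX to Content MathML can be sketched in a few lines of Python. This is a hypothetical stand-in, not the project's code: in MathMLBen the expansion is done by LaTeXML, and here the `\wf` macro is assumed to annotate a function symbol and `\w` an identifier, as in the LaTeXML input above. The sketch only handles a single flat application `\wf{QID}{F}(arg, ...)` and omits the `id`/`xref` attributes.

```python
import re
import xml.etree.ElementTree as ET

# Assumed macro shapes: \wf{QID}{symbol}(args) for functions, \w{QID}{symbol} for identifiers.
HEAD = re.compile(r"\\wf\{(Q\d+)\}\{[^{}]*\}\((.*)\)")
IDENT = re.compile(r"\\w\{(Q\d+)\}\{[^{}]*\}")


def to_content_mathml(tex: str) -> str:
    """Turn '\\wf{QID}{F}(arg, ...)' into a Content MathML <apply> element (ids/xrefs omitted)."""
    match = HEAD.fullmatch(tex.strip())
    if match is None:
        raise ValueError(r"expected \wf{QID}{symbol}(arguments)")
    head_qid, arg_src = match.groups()
    # Replace each annotated identifier by its Wikidata ID; naive comma split (no nesting).
    args = [a.strip() for a in IDENT.sub(r"\1", arg_src).split(",")]

    apply = ET.Element("apply")
    ET.SubElement(apply, "csymbol", cd="wikidata").text = head_qid
    for arg in args:
        if arg.isdigit():
            ET.SubElement(apply, "cn", type="integer").text = arg
        elif re.fullmatch(r"Q\d+", arg):
            ET.SubElement(apply, "csymbol", cd="wikidata").text = arg
        else:
            ET.SubElement(apply, "ci").text = arg  # unannotated identifier
    return ET.tostring(apply, encoding="unicode")


print(to_content_mathml(r"\wf{Q7913892}{W}(2, \w{Q12503}{k})"))
```

For the slide's example this prints the same three children as the MathML output above (two `csymbol cd="wikidata"` elements around a `cn type="integer"`), just without the generated ids.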
  • 31. Create MML Benchmark Dataset mathmlben.wmflabs.org
    MathMLBen Collection
    We annotated 305 formulae in total:
      • 1 to 100: randomly sampled from Wikipedia; used for NTCIR-11 (National Institute of Informatics Testbeds and Community for Information access Research Project).
      • 101 to 200: randomly sampled from the sources of the NIST Digital Library of Mathematical Functions (DLMF), which contains 9,897 labeled formulae.
      • 201 to 305: 70% from the NTCIR arXiv dataset and 30% from the NTCIR-12 Wikipedia dataset.
    All data is available at https://mathmlben.wmflabs.org/.
    7/14
  • 35. Create MML Benchmark Dataset mathmlben.wmflabs.org 8/14
  • 36. Evaluate State-of-the-Art LaTeX to MathML Conversion Tools
  • 37. Benchmarking Conversion Tools
    [Figure: same pipeline overview as on slide 19 (Documents, LaTeXML, Converter, Annotated MML, Gold Standard MML, MML Comparison, ...).]
    9/14
  • 41. Benchmarking Conversion Tools - Accuracy
    Tested Conversion Tools
    1. LaTeXML: Perl tool used to create the DLMF
    2. LaTeX2MathML: small Python project
    3. Mathoid: service that can also generate SVG and PNG
    4. SnuggleTeX: Java library developed at the University of Edinburgh
    5. MathToWeb: Java web application
    6. TeXZilla: JavaScript web application
    7. Mathematical: Ruby application that can generate SVG/PNG
    8. CAS: computer algebra system capable of parsing LaTeX
    9. Part-of-Math (POM) Tagger: grammar-based LaTeX parser that was used to perform translations from LaTeX to CAS
    10/14
  • 43. Benchmarking Conversion Tools - Accuracy
    [Figure: Average of Structural Distances & Successfully Parsed Expressions. Series: Average Distance of Presentation Subtree, Average Distance of Content Subtree (axis: Tree Edit Distance, 0 to 80) and Successfully Parsed LaTeX Expressions (axis: 0 to 300). Successfully parsed counts shown: 305, 305, 295, 305, 288, 229, 290, 305, 305 of 305; tool labels not recoverable.]
    11/14
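The accuracy chart reports tree edit distances between each tool's output and the gold standard. As an illustration only, here is a minimal Python sketch of unit-cost ordered tree edit distance using the standard recursive forest decomposition on rightmost roots, memoized so it is fine for formula-sized trees (it is not an efficient Zhang–Shasha implementation). The tuple encoding of trees and the unit costs are assumptions; the slides do not state which variant, costs, or normalization the evaluation used.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees; a forest is a tuple of trees.


def node(label: str, *children: tuple) -> tuple:
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost ordered forest edit distance via rightmost-root decomposition."""
    if not f and not g:
        return 0
    options = []
    if f:  # delete the rightmost root of f; its children take its place
        v_label, v_kids = f[-1]
        options.append(forest_distance(f[:-1] + v_kids, g) + 1)
    if g:  # insert the rightmost root of g
        w_label, w_kids = g[-1]
        options.append(forest_distance(f, g[:-1] + w_kids) + 1)
    if f and g:  # map the two rightmost roots onto each other (relabel if needed)
        relabel = 0 if v_label == w_label else 1
        options.append(forest_distance(v_kids, w_kids)
                       + forest_distance(f[:-1], g[:-1]) + relabel)
    return min(options)


def tree_edit_distance(a: tuple, b: tuple) -> int:
    return forest_distance((a,), (b,))


# Content trees for a+b and a-b differ by one relabeling.
plus = node("apply", node("plus"), node("ci:a"), node("ci:b"))
minus = node("apply", node("minus"), node("ci:a"), node("ci:b"))
print(tree_edit_distance(plus, minus))  # 1
```

Computing the distance separately for the presentation subtree and the content subtree, as the chart does, would just mean calling this function once per subtree pair.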
  • 44. Benchmarking Conversion Tools - Runtime
    Performance of Tools (duration in seconds; logarithmic axis from 1 to 1000):
      LaTeXML        372.76
      Mathoid         29.65
      Mathematical    20.77
      LaTeX2MathML     9.57
      MathToWeb        4.17
      POM Tagger       3.69
      SnuggleTeX       1.79
      TeXZilla         1.41
    12/14
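A runtime comparison like this one could be produced with a harness along the following lines. This is a hypothetical Python sketch: each converter is modelled as a callable from a LaTeX string to some output, any exception counts as a failed conversion, and wall-clock time is measured with `time.perf_counter`. The slide does not state how its durations were measured; external tools such as LaTeXML would in practice be wrapped in such a callable (for example via `subprocess`).

```python
import time
from typing import Callable


def time_converters(
    converters: dict[str, Callable[[str], object]], formulae: list[str]
) -> dict[str, tuple[float, int]]:
    """Return {tool name: (total seconds, number of formulae converted without error)}."""
    results = {}
    for name, convert in converters.items():
        converted = 0
        start = time.perf_counter()
        for tex in formulae:
            try:
                convert(tex)
                converted += 1
            except Exception:
                pass  # treat any exception as a failed conversion and keep going
        results[name] = (time.perf_counter() - start, converted)
    return results


# Stand-in "converters": str.strip never fails, int fails on non-numeric input.
report = time_converters({"strip": str.strip, "int": int}, ["a+b", "x^2", "42"])
for name, (seconds, ok) in sorted(report.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name}: {seconds:.6f} s, {ok} converted")
```

Returning the success count next to the duration keeps runtime and the "successfully parsed" figure from the accuracy chart in one report.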
  • 45. Improvements for Conversion Tools by Considering Textual Context
  • 46. Approach to Improve Conversion Tools
    [Figure: same pipeline overview as on slide 19 (Documents, Mathematical Language Processor, POM-Tagger with Dictionaries, Switch, LaTeXML, Converter, Annotated MML, Gold Standard MML, MML Comparison, ...).]
    13/14
  • 50. Approach to Improve Conversion Tools
    Example for item 101 (mathmlben.wmflabs.org/101):
    A function f(x, y) is continuous at a point (a, b) if
      \lim_{(x,y)\to(a,b)} f(x,y) = f(a,b), \tag{1}
    that is, for every arbitrarily small positive constant ε there exists δ (> 0) such that
      |f(a+\alpha, b+\beta) - f(a,b)| < \varepsilon, \tag{2}
    for all α and β that satisfy |α|, |β| < δ.
    14/14
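The slides show the context-based approach only as a figure (Mathematical Language Processor, Identifier & Definiens, POM-Tagger, Dictionaries). The Python sketch below is a deliberately naive stand-in for the identifier–definiens step, not the Mathematical Language Processor's actual method: it pairs each one-letter identifier in the text of item 101 with the word directly before it. The regular expression and the stopword list are assumptions chosen for this example.

```python
import re

# Words that often precede an identifier without defining it (illustrative, not exhaustive).
STOPWORDS = {"a", "an", "the", "and", "all", "at", "exists", "for", "if", "is", "that"}

# A word, whitespace, then a one-letter identifier followed by whitespace, "(", ",", "." or end.
CANDIDATE = re.compile(r"(\w+)\s+([^\W\d_])(?=[\s(,.]|$)")


def definiens_candidates(text: str) -> list[tuple[str, str]]:
    """Pair each one-letter identifier with the word directly before it, skipping stopwords."""
    return [(ident, word) for word, ident in CANDIDATE.findall(text)
            if word.lower() not in STOPWORDS]


# Prose of item 101 with the displayed formulas (1) and (2) removed.
TEXT = ("A function f(x, y) is continuous at a point (a, b) if, that is, for every "
        "arbitrarily small positive constant ε there exists δ(> 0) such that, "
        "for all α and β that satisfy |α|, |β| < δ.")

print(definiens_candidates(TEXT))  # [('f', 'function'), ('ε', 'constant')]
```

Even this crude rule recovers that f is a function and ε a constant, which is the kind of information a plain LaTeX converter never sees. A more realistic approach would filter candidates by part of speech, rank them (for example by distance to the identifier), and map the chosen definiens to Wikidata IDs like those in the annotated MathML of slide 30.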
  • 52. Thanks for your attention!