Bringing Math to LOD

The presentation slides at ISWC 2013

Transcript of "Bringing Math to LOD"

  1. Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics
     Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov, Evgeniy Birialtsev
     Kazan Federal University, Russia
     October 23, 2013
  2. Outline
     1 Introduction
     2 Approach
     3 Use Cases
  3. Our Contribution
     Our prototype is designed to build a semantic graph of mathematical knowledge objects that is extracted from a collection of mathematical scholarly papers and integrated into the LOD cloud.
  4. Research Output: IVM Data Set
     LOD representation of 1 330 scholarly publications of the «Izvestiya Vuzov. Matematika» (IVM) journal
     Covers the semantics of: article metadata, elements of the logical structure, terminology, formulas
     Aligned with DBpedia and CORDIS
     More than 850 000 RDF triples
     SPARQL endpoint: http://cll.niimm.ksu.ru:8890/sparql-auth∗
     ∗ The SPARQL endpoint is secured. Please email the authors for credentials.
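
The slide gives only the endpoint URL and states that access is secured. The Python sketch below shows how a client might build a query request for it. It follows the SPARQL 1.1 Protocol: the query goes in the `query` parameter of a GET request, and JSON results are requested via `Accept`. The credentials are placeholders. The authentication scheme is an assumption, because the slide does not name one, so the hypothetical `build_opener` helper registers both Digest and Basic handlers. Nothing is sent over the network unless `opener.open(request)` is called.

```python
import urllib.parse
import urllib.request

# Endpoint from the slide; real credentials must be requested from the authors.
ENDPOINT = "http://cll.niimm.ksu.ru:8890/sparql-auth"


def build_request(query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request; the query goes in the `query` parameter."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def build_opener(user: str, password: str) -> urllib.request.OpenerDirector:
    """Opener that answers a 401 challenge with Digest or Basic auth (the scheme is an assumption)."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, ENDPOINT, user, password)
    return urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )


if __name__ == "__main__":
    request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    opener = build_opener("USER", "PASSWORD")  # placeholder credentials
    print(request.full_url)
    # opener.open(request) would actually send the query; it needs valid credentials.
```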
  5. Related Work
     Domain-specific languages: OMDoc, MathLang
     Domain models: Cambridge Mathematical Thesaurus, DBpedia (math-related part), ScienceWISE Ontology
     Math-related NLP: mArachna; linguistic modules of arXMLiv
  6. Outline
     1 Introduction
     2 Approach
     3 Use Cases
  7. Key Research Contributions
     a thorough ontological model of the mathematical domain
     an ontology-based, language-independent method for extracting logical structure elements from papers
     an ontology-based method for extracting mathematical named entities from texts in Russian
     a method that connects mathematical named entities to symbolic expressions
  8. Prototype’s Design
  9. Domain Model
  10. Ontology of Structural Elements (1)
      http://cll.niimm.ksu.ru/ontologies/mocassin
      Covers 15 common structural elements
      Defines 9 object properties and 4 datatype properties
  11. Ontology of Structural Elements (2)
      http://cll.niimm.ksu.ru/ontologies/mocassin
      3 cardinality axioms, e.g. Proof ⊑ (= 1 proves.ProvableStatement†)
      2 transitivity axioms, for the hasPart and dependsOn properties
      DL expressivity: SRIN(D)
      † i.e., Claim ∨ Corollary ∨ Lemma ∨ Proposition ∨ Theorem
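
Here is a minimal Python sketch of what the two kinds of axioms imply. It runs over a hypothetical toy triple list; the IDs `proof1`, `theorem1`, `lemma1` and `definition1` are invented. `transitive_closure` derives the implied `dependsOn` pairs, and the same function works for `hasPart`. `proofs_violating_cardinality` is a closed-world check of "each Proof proves exactly one ProvableStatement". An OWL reasoner works under the open-world assumption and would not flag a Proof with no `proves` link, so this check is closer to data validation than to DL reasoning.

```python
from collections import defaultdict

# Hypothetical toy data; IDs are invented, not taken from the IVM data set.
TRIPLES = [
    ("proof1", "rdf:type", "Proof"),
    ("theorem1", "rdf:type", "Theorem"),
    ("lemma1", "rdf:type", "Lemma"),
    ("proof1", "proves", "theorem1"),
    ("theorem1", "dependsOn", "lemma1"),
    ("lemma1", "dependsOn", "definition1"),
]

# The union of classes abbreviated as ProvableStatement on the slide.
PROVABLE = {"Claim", "Corollary", "Lemma", "Proposition", "Theorem"}


def transitive_closure(triples, prop):
    """Return every (a, c) pair connected by a chain of one or more `prop` edges."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            graph[s].add(o)
    closure = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


def proofs_violating_cardinality(triples):
    """Closed-world check: Proofs that do not prove exactly one ProvableStatement."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    violations = []
    for subject, classes in types.items():
        if "Proof" not in classes:
            continue
        # .get avoids inserting new keys into `types` while iterating over it
        targets = {o for s, p, o in triples
                   if s == subject and p == "proves" and types.get(o, set()) & PROVABLE}
        if len(targets) != 1:
            violations.append(subject)
    return violations
```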
  12. Ontology of Mathematical Concepts (1)
      http://cll.niimm.ksu.ru/ontologies/mathematics
      Covers 3 450 mathematical concepts
      Defines commonly used terms as well as terms from the emerging professional vocabulary (e.g. Bitsadze-Samarsky problem)
      Supports Russian/English labels
  13. Ontology of Mathematical Concepts (2)
      http://cll.niimm.ksu.ru/ontologies/mathematics
      Includes two taxonomies:
        taxonomy of mathematical theories‡: number theory, set theory, algebra, analysis, geometry, mathematical logic, discrete mathematics, theory of computation, differential equations, numerical analysis, probability theory and statistics
        taxonomy of mathematical objects
      Covers common scientific concepts, such as Problem, Method, Statement, Formula, etc.
      DL expressivity: ALCHI
      ‡ covers only part of mathematical knowledge
  14. Ontology of Mathematical Concepts (3)
      Object properties:
        belongsTo/contains, e.g. Barycentric Coordinates belongsTo Metric Geometry
        defines/isDefinedBy, e.g. Christoffel Symbol isDefinedBy Connectedness
        seeAlso, e.g. Chebyshev Iterative Method seeAlso Numerical Solution of Linear Equation Systems
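
This sketch reads each slash-separated pair as an inverse-property declaration. That is an interpretation; the slide does not spell it out. Under that reading, a reasoner would derive the reverse fact for every asserted one. The minimal Python sketch below shows that materialization step using the slide's Barycentric Coordinates example. The `INVERSES` table is an assumption built from the slide's property names.

```python
# Inverse pairs from the slide (assumption: "x/y" means y is the inverse of x).
INVERSES = {
    "belongsTo": "contains",
    "contains": "belongsTo",
    "defines": "isDefinedBy",
    "isDefinedBy": "defines",
}


def materialize_inverses(triples):
    """Return the input triples plus the inverse of each triple whose property has one."""
    result = set(triples)
    for s, p, o in triples:
        inverse = INVERSES.get(p)
        if inverse is not None:
            result.add((o, inverse, s))
    return result


if __name__ == "__main__":
    facts = {("Barycentric Coordinates", "belongsTo", "Metric Geometry")}
    for triple in sorted(materialize_inverses(facts)):
        print(triple)
```

Properties without a declared inverse, such as seeAlso, pass through unchanged.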
  15. Ontology of Mathematical Concepts (4)
      Stats:
        3 450 classes
        27% of classes are mapped onto DBpedia
        3 630 subclass-of property instances
        1 140 other object property instances
      Facts about the development:
        it lasted 4 months
        7 professional mathematicians participated as domain experts, guided by the authors
        WebProtégé was used as the collaborative tool
  16. Semantic Annotation
  17. NLP Annotation
      Relies on the OntoIntegrator facilities
      Solves some of the conventional linguistic tasks, such as:
        tokenization
        sentence splitting (∼98% F-measure§)
        morphological analysis
        NP extraction (88% precision)
      Special handling of math symbols, abbreviations, and math expressions as parts of NPs
      Currently supports only Russian
      § the metrics were evaluated on real math texts with the help of domain experts
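
The evaluation figures on this and the following slides are F-measures. The sketch assumes the usual balanced definition, F1, which is the harmonic mean of precision P and recall R: F = 2PR / (P + R). A small helper shows how the score is computed. The `beta` parameter generalizes it to F_beta, which weights recall beta times as much as precision. The slides do not state which variant was used, so treating them as F1 is an assumption.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta: weighted harmonic mean of precision and recall; beta=1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid 0/0; when both are zero, F is defined as zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the evaluation.
    print(round(f_measure(0.9, 0.8), 4))
```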
  18. Mining the Logical Structure
      Supports our ontology of structural elements: elements in real texts are instances of the ontology classes
      Recognizing types of structural elements: a string-similarity-based method gives 89%-100% F-measure, depending on the class
      Recognizing semantic relations between them: a decision tree learner gives 61%-95% F-measure, depending on the relation
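
The slide does not say which string similarity measure was used. As an illustration only, the sketch below substitutes Python's `difflib.SequenceMatcher.ratio()`. It classifies a segment by comparing the first word of its heading (e.g. «Теорема 1.») against a label list. `CLASS_LABELS`, the 0.8 threshold, and the first-word heuristic are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical label dictionary (English and Russian); the ontology's actual labels may differ.
CLASS_LABELS = {
    "Theorem": ["theorem", "теорема"],
    "Lemma": ["lemma", "лемма"],
    "Corollary": ["corollary", "следствие"],
    "Definition": ["definition", "определение"],
    "Proof": ["proof", "доказательство"],
}


def classify_heading(heading: str, threshold: float = 0.8):
    """Return (class, score) for the best-matching label, or (None, score) below threshold."""
    words = heading.strip().split()
    if not words:
        return None, 0.0
    first = words[0].strip(".:").lower()  # e.g. "Теорема 1." -> "теорема"
    best_cls, best_score = None, 0.0
    for cls, labels in CLASS_LABELS.items():
        for label in labels:
            score = SequenceMatcher(None, first, label).ratio()
            if score > best_score:
                best_cls, best_score = cls, score
    if best_score < threshold:
        return None, best_score
    return best_cls, best_score
```

A similarity score rather than exact matching tolerates OCR and typing noise, so a misspelled heading such as "Teorem 3" still maps to Theorem.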
  19. Mathematical Named Entity Extraction
      Supports our ontology of mathematical concepts: assigned NPs are instances of the ontology classes
      Our method employs annotations of the NP structure and Jaccard similarity
      The method achieves 86% F-measure, with parameters tuned for the precision/recall trade-off
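
Here is a minimal sketch of the Jaccard step. It assumes noun phrases and concept labels have already been reduced to lemmatized token sets; for Russian text, lemmatization would come from the morphological analysis listed on the NLP slide. The `CONCEPTS` dictionary and the 0.5 threshold are hypothetical. The NP-structure annotations the slide mentions (for example, head-word constraints) are not modeled here.

```python
# Hypothetical concept labels as lemmatized token strings; real labels come from the ontology.
CONCEPTS = {
    "FiniteGroup": "finite group",
    "AbelianGroup": "abelian group",
    "ChristoffelSymbol": "christoffel symbol",
}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_noun_phrase(np_tokens, threshold=0.5):
    """Return (concept, score) for the best-scoring label, or (None, score) below threshold."""
    tokens = {t.lower() for t in np_tokens}
    best, best_score = None, 0.0
    for concept, label in CONCEPTS.items():
        score = jaccard(tokens, set(label.split()))
        if score > best_score:
            best, best_score = concept, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```

Raising the threshold trades recall for precision, which is the kind of parameter tuning the slide refers to.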
  20. Connecting Named Entities to Formulas
  21. Connecting Named Entities to Formulas
      Parsing mathematical expressions
      Detection of variables
      Proximity-based matching of mathematical variables with noun phrases (68% accuracy)
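
The three steps can be sketched in Python as follows. This is a simplified stand-in, not the authors' parser:

- `detect_variables` uses a hypothetical regular expression. It treats single letters (optionally with a one-character subscript) and a few Greek-letter commands as variables.
- `link_variables` links every variable of an inline `$...$` formula token to the noun phrase whose token span is closest, within a hypothetical window of 3 tokens.
- NP spans are assumed to arrive from the NP extractor as `(start, end, text)`, with `end` exclusive.

```python
import re

# Hypothetical tokenizer: LaTeX commands, or a single letter with an optional one-character subscript.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z](?:_[A-Za-z0-9])?")
# Greek-letter commands treated as variables (a small, hypothetical list).
GREEK = {r"\alpha", r"\beta", r"\gamma", r"\lambda", r"\mu", r"\varphi"}


def detect_variables(formula: str) -> list:
    """Return the distinct variable-like tokens of a LaTeX formula, in order of appearance."""
    found = []
    for tok in TOKEN_RE.findall(formula):
        if tok.startswith("\\") and tok not in GREEK:
            continue  # skip other commands such as \frac or \sin
        if tok not in found:
            found.append(tok)
    return found


def token_distance(start: int, end: int, pos: int) -> int:
    """Tokens between position `pos` and the NP span [start, end); 0 if adjacent or inside."""
    if pos < start:
        return start - pos - 1
    if pos >= end:
        return pos - end
    return 0


def nearest_np(nps, pos, max_distance):
    """Text of the NP closest to `pos` within `max_distance` tokens, or None."""
    best_text, best_dist = None, max_distance + 1
    for start, end, text in nps:
        dist = token_distance(start, end, pos)
        if dist < best_dist:
            best_text, best_dist = text, dist
    return best_text


def link_variables(tokens, nps, max_distance=3):
    """Map each variable in an inline $...$ formula token to its nearest noun phrase."""
    links = {}
    for pos, tok in enumerate(tokens):
        if len(tok) > 1 and tok.startswith("$") and tok.endswith("$"):
            np_text = nearest_np(nps, pos, max_distance)
            if np_text is not None:
                for var in detect_variables(tok.strip("$")):
                    links[var] = np_text
    return links
```

For example, in "the finite group $G$ acts on the set $X$", G is linked to "the finite group" and X to "the set". Pure proximity fails whenever the defining phrase is far from the formula, which is one reason such matching stays well below perfect accuracy.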
  22. Other supported features
  23. Other supported features
      Article metadata extraction (title, author names, publication year, etc.) according to the AKT Portal schema
      Semi-manual interlinking¶ with existing LOD data sets: DBpedia, CORDIS
      Publishing the extracted data as an LOD-compliant RDF data set
      ¶ using the Silk framework
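
Silk link specifications are configurable and can combine several similarity measures. The sketch below substitutes the simplest possible rule, an exact match on normalized labels, just to show the shape of the output: `owl:sameAs` links serialized as N-Triples. The concept IDs `C1`/`C2`, the sample labels, and the choice of `owl:sameAs` as the link predicate are assumptions. In a semi-manual workflow, candidate links like these would be reviewed by a person before publishing.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
MATH_NS = "http://cll.niimm.ksu.ru/ontologies/mathematics#"
DBPEDIA_NS = "http://dbpedia.org/resource/"


def normalize(label: str) -> str:
    """Unicode-normalize, lowercase, treat underscores as spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", label).lower().replace("_", " ")
    return " ".join(text.split())


def link_by_label(local_labels: dict, dbpedia_labels: dict) -> list:
    """Emit N-Triples owl:sameAs links for concepts whose normalized labels match exactly."""
    index = {}
    for resource, label in dbpedia_labels.items():
        index.setdefault(normalize(label), resource)  # keep the first resource per label
    links = []
    for local_id, label in local_labels.items():
        match = index.get(normalize(label))
        if match is not None:
            links.append(f"<{MATH_NS}{local_id}> <{OWL_SAME_AS}> <{DBPEDIA_NS}{match}> .")
    return links


if __name__ == "__main__":
    # Hypothetical inputs: invented concept IDs and a hand-picked DBpedia candidate.
    local = {"C1": "Finite group", "C2": "Bitsadze-Samarsky problem"}
    candidates = {"Finite_group": "Finite group"}
    print("\n".join(link_by_label(local, candidates)))
```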
  24. Outline
      1 Introduction
      2 Approach
      3 Use Cases
  25. Finding DBpedia Entities in Mathematical Formulas
      http://cll.niimm.ksu.ru/iswc-demo
  26. Semantic Search of Theoretical Findings
      Finding articles with theorems about finite groups:

      PREFIX moc: <http://cll.niimm.ksu.ru/ontologies/mocassin#>
      PREFIX math: <http://cll.niimm.ksu.ru/ontologies/mathematics#>
      SELECT ?article WHERE {
        ?article moc:hasSegment ?theorem .
        ?theorem moc:mentions ?entity ;
                 a moc:Theorem .
        ?entity a math:E2183
      }
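
The `WHERE` clause of this query is a basic graph pattern, where `a` abbreviates `rdf:type`. To show what it selects, here is a naive backtracking matcher in Python, run against a hypothetical toy triple set. Only `math:E2183` is taken from the query (presumably the concept class for finite groups); the articles, segments and entities are invented. `select` deduplicates its results, which corresponds to `SELECT DISTINCT` rather than the plain `SELECT` in the query.

```python
# Hypothetical toy data mirroring the query's vocabulary.
TRIPLES = {
    ("article1", "moc:hasSegment", "seg1"),
    ("seg1", "rdf:type", "moc:Theorem"),
    ("seg1", "moc:mentions", "ent1"),
    ("ent1", "rdf:type", "math:E2183"),
    ("article2", "moc:hasSegment", "seg2"),
    ("seg2", "rdf:type", "moc:Lemma"),  # a lemma, so article2 should not match
    ("seg2", "moc:mentions", "ent2"),
    ("ent2", "rdf:type", "math:E2183"),
}

# The query's WHERE clause as triple patterns; "?x" marks a variable.
PATTERN = [
    ("?article", "moc:hasSegment", "?theorem"),
    ("?theorem", "moc:mentions", "?entity"),
    ("?theorem", "rdf:type", "moc:Theorem"),
    ("?entity", "rdf:type", "math:E2183"),
]


def unify(term, value, binding):
    """Bind a variable (or check its existing binding); constants must match exactly."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match_bgp(pattern, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns (naive backtracking)."""
    binding = {} if binding is None else binding
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)  # copy, so a failed partial match is simply discarded
        if all(unify(term, value, candidate) for term, value in zip(first, triple)):
            yield from match_bgp(rest, triples, candidate)


def select(var, pattern, triples):
    """Sorted distinct values of `var` over all solutions."""
    return sorted({b[var] for b in match_bgp(pattern, triples)})


if __name__ == "__main__":
    print(select("?article", PATTERN, TRIPLES))
```

A real triple store would use indexes and join ordering rather than scanning every triple per pattern, but the solutions are the same.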
  27. Conclusion
      We have developed a holistic approach for mining an LOD representation of scholarly papers in mathematics
      We applied the prototype to a collection of over 1 300 real math papers
      We conducted a thorough evaluation of the proposed methods with the help of domain experts
      We provided several use cases to illustrate the utility of the published data
  28. Future Work
      Integrating all the modules into a full-fledged toolkit
      Adding support for English to the NLP module
      Extending our approach to texts in other natural science domains
  29. Thanks for your attention!
      Questions?