The document presents a semantic model and analysis methods for extracting the logical structure of mathematical scholarly papers. It proposes an ontology to represent structural elements like definitions, theorems, and their relations. Methods are described to recognize segment types using LaTeX markup and classify relations between segments using supervised learning. An evaluation shows the ontology covers over 90% of segments in test papers. A prototype demonstrates semantic search and visualization of paper structure.
Logical Structure Analysis of Scientific Publications in Mathematics
1. Logical Structure Analysis of
Scientific Publications in Mathematics
Valery Solovyev, Nikita Zhiltsov
Kazan (Volga Region) Federal University, Russia
1 / 44
2. Overview
LOD Cloud has been growing at 200-300%
per year since 2007∗
Prevalent domains: government (43%),
geographic (22%) and life sciences (9%)
However, it lacks data sets related to
academic mathematics
∗ C. Bizer et al. State of the Web of Data. LDOW WWW’11
3. 1 Background
2 Proposed Semantic Model
3 Analysis Methods
4 Experiments and Evaluation
5 Prototype
4. Mathematical Scholarly Papers
Essential features
Well-structured documents
The presence of mathematical formulae
Peculiar vocabulary (“mathematical
vernacular”)
5. Research Objectives
Current study
Specification of the document logical structure
Methods for extracting structural elements
Long-term goals
A large corpus of semantically annotated papers
Semantic search of mathematical papers
6. Modelling the Structure of Scientific
Publications
ABCDE format
LaTeX-based format to represent the narrative
structure of proceedings and workshop contributions
Sections:
- Annotations (Dublin Core metadata)
- Background (e.g. description of research positioning)
- Contribution (description of the presented work)
- Discussion (e.g. comparison with other work)
- Entities (citations)
7. Modelling the Structure of Scientific
Publications
SALT
LaTeX-based authoring tool for generating
semantically annotated PDF documents
Three ontologies:
- SALT Document Ontology
- SALT Annotation Ontology
- SALT Rhetorical Ontology
11. Trade-off Candidates
arXMLiv format
- XHTML+MathML
- Marked up theorem-like elements, sections,
  equations
- Automatic conversion for LaTeX documents with
  styles of available bindings (LaTeXML)
- 60% of arXiv.org were converted into the format
Present work
- Follow the slides ⇒
14. Proposed Semantic Model
It is an ontology that captures the structural layout
of mathematical scholarly papers (as in the LaTeX
markup)
The segment represents the finest level of
granularity and has the properties:
- starting and ending positions
- the text or math contents
- functional role
Select most frequent segments from sample
collections of genuine papers
Consider synonyms as one concept (e.g. conjecture
and hypothesis)
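A minimal sketch of the segment as a data structure, assuming Python as the illustration language (the slides name a Java library). The field names and the `SYNONYMS` table are hypothetical; only the three properties and the conjecture/hypothesis synonym pair come from the slide.

```python
from dataclasses import dataclass

# Hypothetical synonym table: maps an alternative name to one canonical concept.
# Only the conjecture/hypothesis pair is taken from the slide.
SYNONYMS = {"hypothesis": "conjecture"}


def canonical_role(name: str) -> str:
    """Normalize a role name and merge synonyms into a single concept."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


@dataclass(frozen=True)
class Segment:
    start: int    # starting position in the document
    end: int      # ending position in the document
    content: str  # the text or math contents
    role: str     # functional role, e.g. "theorem" or "conjecture"


# Usage: a segment written as a "Hypothesis" is stored under "conjecture".
seg = Segment(start=120, end=310, content="...", role=canonical_role("Hypothesis"))
```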
15. Proposed Semantic Model (cont.)
Select basic semantic relations between segments
from the prior-art models
Integration with SALT Document Ontology classes:
- Publication
- Section
- Figure
- Table
18. Logical Structure Analysis
The ontology specifies a controlled vocabulary for
semantic analysis
Two analysis tasks:
- recognizing the types of document segments
- recognizing the semantic relations between them
22. Recognizing the Types of Document
Segments
We exploit the LaTeX markup extensively
1 Elicit a LaTeX environment
2 Associate it with a string that may be
either the environment name
or the environment title (if available)
3 Filter out standard formatting environments (e.g.
center, align, itemize)
4 Compute string similarity between a string and
canonical names of ontology concepts
5 Check if the found most similar concept is
appropriate using a predefined threshold
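The five steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the similarity is a Dice coefficient over padded character bigrams (one common q-gram measure; the slides only say that q-gram matching was used), the threshold value 0.6 is arbitrary, and the function names are hypothetical. The concept names are the segment types listed in the evaluation.

```python
from collections import Counter

# Canonical concept names (the segment types reported in the evaluation).
CONCEPTS = ["axiom", "claim", "conjecture", "corollary", "definition",
            "example", "lemma", "proof", "proposition", "remark", "theorem"]
# Standard formatting environments to filter out (examples from the slide).
FORMATTING = {"center", "align", "itemize"}


def qgrams(s, q=2):
    """Multiset of character q-grams of s, padded with '#' at both ends."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def similarity(a, b, q=2):
    """Dice coefficient over q-gram multisets (an assumed choice of measure)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    shared = sum((ga & gb).values())
    total = sum(ga.values()) + sum(gb.values())
    return 2.0 * shared / total if total else 0.0


def recognize(env_name, title=None, threshold=0.6):
    """Return the most similar concept, or None if filtered or below threshold."""
    if env_name.lower() in FORMATTING:
        return None
    # The string may be the environment name or its title; try both.
    candidates = [s for s in (env_name, title) if s]
    score, concept = max((similarity(s, c), c) for s in candidates for c in CONCEPTS)
    return concept if score >= threshold else None
```

For instance, an environment named `mainthm` with the title "Main Theorem" is matched through its title, while `center` is filtered out before any comparison.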
23. Recognizing Navigational Relations
The dependsOn and refersTo relations are navigational
Assumption
Navigational relations are induced by referential
sentences
Examples
“By applying Lemma 1, we obtain ...” (dependsOn)
“Theorem 2 provides an explicit algorithm ...”
(refersTo)
24. Recognizing Navigational Relations
Supervised method
1 Given a segment S, split its text into sentences,
tokenize and do POS tagging
2 Referential sentences are ones that contain \ref
command entries
3 For each sentence:
- find mentioned segments; each of them makes a pair
  with S (type feature)
- for each pair, compute relative positions of segments
  normalized by the document size (distance feature)
- build a boolean vector for its verbs (verb feature)
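A rough sketch of how one feature row could be assembled from these steps. Everything here is an assumption made for illustration: the verb vocabulary `VERBS`, the function names, the label `lem:1`, and the positions are hypothetical, and the verbs of the sentence are taken as given (in the described pipeline they would come from POS tagging).

```python
import re

# Hypothetical verb vocabulary; the real one would be collected from the corpus.
VERBS = ["add", "apply", "obtain", "provide"]


def mentioned_labels(sentence):
    """Labels of all \\ref commands in a sentence (empty list if not referential)."""
    return re.findall(r"\\ref\{([^}]*)\}", sentence)


def features(source_type, target_type, source_pos, target_pos, doc_size, verbs):
    """Build one row: type pair, normalized positions, boolean verb vector."""
    row = {
        "t1": source_type,                      # type of the segment S
        "t2": target_type,                      # type of the mentioned segment
        "d1": round(source_pos / doc_size, 2),  # relative position of S
        "d2": round(target_pos / doc_size, 2),  # relative position of the target
    }
    for verb in VERBS:
        row[verb] = int(verb in verbs)          # 1 if the verb occurs in the sentence
    return row


sentence = r"By applying Lemma~\ref{lem:1}, we obtain ..."
labels = mentioned_labels(sentence)
row = features("proof", "lemma", 90, 270, 1000, {"apply", "obtain"})
```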
25. Recognizing Navigational Relations
(cont.)
Supervised method
Example training instance
t1     t2     d1    d2    add  ...  apply  ...  relation
proof  lemma  0.09  0.27  0    ...  1      ...  dependsOn
Train a learning model using these features and a
labeled example set
Apply the model to classify new induced relations
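To give an intuition for the training step, the sketch below computes the gain ratio, the split criterion used by C4.5-style decision tree learners, for two features. It is not the authors' training code, and the four rows in `ROWS` are invented toy data that only mimic the shape of the training instance above.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, feature, target="relation"):
    """Information gain of splitting on `feature`, divided by the split entropy."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[feature] for r in rows])
    return gain / split_info if split_info else 0.0


# Invented toy instances, for illustration only.
ROWS = [
    {"t2": "lemma", "apply": 1, "relation": "dependsOn"},
    {"t2": "lemma", "apply": 1, "relation": "dependsOn"},
    {"t2": "theorem", "apply": 0, "relation": "refersTo"},
    {"t2": "lemma", "apply": 0, "relation": "refersTo"},
]

best = max(["t2", "apply"], key=lambda f: gain_ratio(ROWS, f))
```

On this toy set the verb feature `apply` separates the two relations perfectly, so a tree learner would split on it first.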
26. Recognizing Restricted Relations
The hasConsequence, exemplifies and proves relations
are restricted
Assumption
Restricted relations occur between consecutive
segments
27. Recognizing Restricted Relations (cont.)
Baseline method
According to the ontology, restricted relations involve
instances of three types, separately: Corollary, Example
and Proof
1 Seek a segment of one of these types
2 Find its predecessor segments
3 Filter out segments of inappropriate types
4 Return the closest predecessor
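The baseline above could look roughly like this. The table of admissible predecessor types and the direction of each relation are assumptions made for the sketch (the slides say only that the ontology constrains the types), and `restricted_relations` is a hypothetical name.

```python
# Hypothetical table: segment type -> (relation, admissible predecessor types).
RESTRICTED = {
    "corollary": ("hasConsequence", {"theorem", "lemma", "proposition"}),
    "example": ("exemplifies", {"definition", "theorem", "lemma", "proposition"}),
    "proof": ("proves", {"theorem", "lemma", "proposition", "corollary"}),
}


def restricted_relations(segment_types):
    """Given segment types in document order, return (subject, relation, object)
    index triples, linking each Corollary/Example/Proof to its closest
    predecessor of an admissible type."""
    triples = []
    for i, seg_type in enumerate(segment_types):
        if seg_type not in RESTRICTED:
            continue
        relation, allowed = RESTRICTED[seg_type]
        # Walk backwards so that the first hit is the closest predecessor.
        for j in range(i - 1, -1, -1):
            if segment_types[j] in allowed:
                if relation == "hasConsequence":
                    triples.append((j, relation, i))  # predecessor hasConsequence corollary
                else:
                    triples.append((i, relation, j))  # proof proves / example exemplifies
                break
    return triples


doc = ["definition", "theorem", "proof", "corollary", "example"]
links = restricted_relations(doc)
```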
29. Experimental Setup
Collections
1355 papers of the “Izvestiya Vysshikh Uchebnykh
Zavedenii. Matematika” journal
A sample of 1031 papers from arXiv.org
Implementation
An open source Java library built upon:
LaTeX-to-XML converters
GATE framework
Weka
Jena
See http://code.google.com/p/mocassin
30. Segment Recognition Evaluation
Evaluation on the arXiv sample only
Q-gram string matching algorithm was used
The threshold value was optimized w.r.t. F1-score
Type         # of true instances  F1-score
Axiom        5                    1.000
Claim        114                  0.987
Conjecture   152                  0.987
Corollary    1715                 0.995
Definition   1838                 1.000
Example      771                  0.999
Lemma        4061                 0.998
Proof        4943                 0.997
Proposition  3052                 0.999
Remark       2114                 1.000
Theorem      4670                 0.991
other        671                  0.892
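For reference, the F1-score reported here is the standard harmonic mean of precision and recall. The symbols TP, FP and FN (true positives, false positives, false negatives for a given segment type) are introduced only for this note:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```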
31. Ontology Coverage Evaluation
Evaluation on both entire collections (“Izvestiya”
and arXiv)
Equations are the most ubiquitous segments (52% and
69%, respectively)
The ontology covers the types of 91.9% and 91.6% of
segments (with the SALT Section class: 99.5% and
99.6%)
32. Distribution of Segment Types
[Figure: bar chart of the percentage of segment occurrences (axis from 0% to 30%) per segment type (Theorem, Proof, Lemma, Remark, Corollary, Definition, Proposition, Example, Claim, Conjecture, others) for the arXiv and Izvestiya collections]
33. Evaluation of Navigational Relation
Recognition
A paper contains 51.4 (Izvestiya) and 53.9 (arXiv)
referential sentences on average
243 referential sentences were randomly selected
and manually annotated
95% were true navigational relations
A decision tree learner (C4.5) was trained
The results were from 10-fold cross validation
Features                Accuracy  F1-score (refersTo)  F1-score (dependsOn)
type                    0.663     0.566                0.752
type + distance         0.658     0.663                0.704
type + verb             0.704     0.653                0.770
type + distance + verb  0.741     0.744                0.772
35. Evaluation of Restricted Relation
Recognition
Evaluation on the arXiv sample only
10% of the documents which contain certain
segments were randomly selected
For each such segment, the corresponding relations
were annotated manually
Known issues: imported corollaries and examples
for arbitrary text fragments
Relation        # of instances  F1-score
hasConsequence  178             0.687
exemplifies     62              0.613
proves          216             0.954
36. Conclusion on Evaluation
The ontology covers the largest part of the logical
structure and appears to be feasible for automatic
extraction methods
The task of segment type recognition has been
accomplished
The method for recognizing navigational relations
establishes ground truth; however, a large-scale
evaluation and learning model selection are required
The baseline method for recognizing restricted
relations must be improved by leveraging additional
information (discussed in the paper!)
38. Prototype
A prototype:
demonstrates our ongoing research on
semantic search of mathematical papers
incorporates the logical structure analysis
methods
is integrated with arXiv API
enables enhanced search for arXiv papers
and visualization of their logical structure
publishes the semantic index as Linked
Data via SPARQL endpoint
42. Preview a Search Result
http://cll.niimm.ksu.ru/mocassin
43. Summary
The proposed approach aims to analyze the
structure of mathematical scholarly papers
in an automatic way
Our ontology provides a controlled
vocabulary for analysis
The methods elicit document segments in
terms of the ontology
The extracted semantic graph can be used
for:
- discovering important document parts
- semantic search of theoretical results