[poster] Extracting Information From Classics Scholarly Texts

This document provides an overview of a PhD research project that aims to develop an automatic system for extracting structured information from a corpus of unstructured Classics scholarly texts, in order to improve information retrieval. The project involves building a corpus from open access Classics journal papers, applying natural language processing techniques to identify mentions of people, places, works, and other entities in the texts, and using structured data from existing databases to disambiguate those mentions and automatically generate new indices linking the texts. The expected results are multiple meaningful access points to information within the corpus and a demonstration that the approach scales.

EXTRACTING INFORMATION FROM CLASSICS SCHOLARLY TEXTS
Matteo Romanello, matteo.romanello@kcl.ac.uk

The project at a glance
● PhD research project in Digital Humanities (DH)
● discipline: DH, Classics (Greek and Latin literature)
● topic: extracting structured information from a corpus of unstructured texts

Gone digital. What changed?
We are moving from books to e-books, and from journals to e-journals, which we now use almost daily.
Has our way of accessing information actually changed with the use of digital tools?
Did just the format change, or are we provided with innovative ways of accessing information based on digital technologies?

Goal
Devising an automatic system to improve information retrieval over a discipline-specific corpus of unstructured texts.
CORPUS: Open Access collection of Classics journal papers
Why automatic? Because automatic also means scalable when you are dealing with a huge quantity of data.
Information retrieval: the task of retrieving information (most of the time accomplished by using search engines).
Corpus of unstructured texts: collection of plain texts, without any kind of mark-up (such as XML).
Information can be accessed using multiple access points that are meaningful for scholars in a specific field.

HIDDEN WORD PUZZLE
To solve the puzzle, find the words in the schema by using a word list as a clue. At the end you'll have added information to the initially chaotic picture.

Steps
1. Building the corpus (OCR, preprocessing)
2. Making the data sources interoperable (when the same entity E appears in DB1 and DB2, the information about E in DB1 has to be added to the information about E in DB2)
3. Finding in the corpus the mentions of REALIA (places, names, work passages, etc.)
4. Disambiguating the mentions of REALIA
5. Automatic creation of new indices to the texts
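
Steps 2, 3 and 5 can be illustrated with a minimal Python sketch. It is not the project's implementation: the two sources `DB1` and `DB2`, their field names, the sample documents and the function names are all hypothetical, and mentions are found by naive exact matching on known names. Step 4 (disambiguation) is left out, because each name here points to a single entity.

```python
import re
from collections import defaultdict

# Hypothetical records describing the same entity in two data sources.
DB1 = {"Athens": {"type": "place", "latitude": 37.98}}
DB2 = {"Athens": {"type": "place", "alt_names": ["Athenae"]}}


def merge_sources(*sources):
    """Step 2: combine what each source knows about an entity into one record."""
    merged = defaultdict(dict)
    for source in sources:
        for name, record in source.items():
            for key, value in record.items():
                # Keep the first value seen for a key; later sources only add new keys.
                merged[name].setdefault(key, value)
    return dict(merged)


def find_mentions(text, entities):
    """Step 3: locate mentions by exact, whole-word matching on known names."""
    mentions = []
    for name, record in entities.items():
        for surface in [name, *record.get("alt_names", [])]:
            for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
                mentions.append((match.start(), surface, name))
    return sorted(mentions)


def build_index(documents, entities):
    """Step 5: map each entity to the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for _, _, entity in find_mentions(text, entities):
            index[entity].add(doc_id)
    return {entity: sorted(doc_ids) for entity, doc_ids in index.items()}


entities = merge_sources(DB1, DB2)
documents = {
    "paper-1": "The assembly met in Athens every month.",
    "paper-2": "Cicero studied at Athenae in his youth.",
    "paper-3": "No place is named here.",
}
index = build_index(documents, entities)
```

Because the alternative name from `DB2` is merged into the record from `DB1`, the index entry for Athens covers both papers, including the one that only uses the Latin form.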

Access points to information in Classics
Print resources
● Table of Contents (TOC)
● Indexes (index of citations, index of Greek words, index of geographic places, index of names, etc.)
Electronic resources
● TOCs
● Access through search engines
● ?
* usually provided just for monographs, because expensive to produce

Method
1. Reuse existing data resources containing structured information (such as gazetteers, authority lists, etc.) stored using different data formats (relational databases, XML files, etc.)
2. Apply Computational Linguistics and Natural Language Processing algorithms for the information extraction
3. Use structured data as training data for the algorithms that "mine" the unstructured text corpus

Expected results
● Automatically providing multiple meaningful entry points to information
● Enriching the corpus with links to navigate through resources
● Exploiting extracted information to improve user access to the corpus
● Demonstrating the scalability of the approach
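
Point 3 of the Method, using structured data as training data, could look roughly like the Python sketch below. It is an illustration under assumptions, not the poster's method: the `AUTHORITY_LIST` contents, the entity types, the BIO labelling scheme and the function name `label_tokens` are all hypothetical. The idea is that names from an authority list are projected onto raw text to produce labelled tokens, which a sequence classifier could then be trained on.

```python
# Hypothetical authority list: surface form -> entity type.
AUTHORITY_LIST = {
    "Homer": "PERSON",
    "Iliad": "WORK",
    "Asia Minor": "PLACE",
}


def label_tokens(tokens, authority_list):
    """Assign BIO labels to tokens by longest match against the authority list."""
    labels = ["O"] * len(tokens)
    max_len = max(len(name.split()) for name in authority_list)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word names win.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if candidate in authority_list:
                entity_type = authority_list[candidate]
                labels[i] = "B-" + entity_type
                for j in range(i + 1, i + span):
                    labels[j] = "I-" + entity_type
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return labels


tokens = "Homer sets the Iliad in Asia Minor".split()
training_example = list(zip(tokens, label_tokens(tokens, AUTHORITY_LIST)))
```

Labels produced this way are noisy (an ambiguous name is always tagged the same way), which is one reason the Steps list disambiguation separately.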

Centre for Computing in the Humanities (CCH), King's College London

1) The document describes a project to develop an automatic system to extract semantic information from unstructured scholarly texts in classics, focusing on named entities and references. 2) The goal is to build knowledge bases integrating information from multiple sources to improve information retrieval over a classics corpus. 3) The project involves building corpora from online archives, processing texts to extract entities and references, and developing techniques to recognize canonical and bibliographic references.

The Distributed Ontology Language (DOL): Use Cases, Syntax, and Extensibility

Christoph Lange

The document discusses the Distributed Ontology Language (DOL) which aims to support semantic integration and interoperability across heterogeneous ontologies. DOL allows for logically heterogeneous ontologies, modular ontologies, and formal and informal links between ontologies. It has a formal semantics and can be serialized in XML, RDF, and text. Examples of applications that could benefit from DOL include an ontology repository engine and a multilingual map user interface driven by aligned ontologies.

Data Integration at the Ontology Engineering Group

Oscar Corcho

A Methodological Framework for Ontology and Multilingual Termontological Data...

Christophe Debruyne

A Methodological Framework for Ontology and Multilingual Termontological Database Co-evolution C. Debruyne, C. Vasquez, K. Kerremans, and A.D. Burgos LNCS 7567, p. 220 ff. Ontologies and Multilingual Termontology Bases (MTB) are two knowledge artifacts with different characteristics and different purposes. Ontologies are used to formally capture a shared view of the world to solve particular interoperability and reasoning tasks. MTBs are general, contain fewer types of relations and their purposes are to relate several term labels within and across different languages to cat- egories. For regions in which the multilingual aspect is vital, not only does one need an ontology for interoperability, the concepts in that ontology need to be comprehensible for everyone whose native tongue is one of the principal languages of that region. Multilinguality provides also a powerful mechanism to perform ontology mapping, content annotation, multilingual querying, etc. We intend to meet these challenges by linking both methods for constructing ontologies and MTBs, creating a virtuous cycle. In this paper, we present our method and tool for ontology and MTB co-evolution.

Linked Open (Geo)Data and the Distributed Ontology Language – a perfect match

Christoph Lange

The Distributed Ontology Language is a meta-language for integrating ontologies written in different languages. Our notion of “distributed” comprises logical heterogeneity within ontologies, modularity and reuse, and links across ontologies in different places of the Web. Not only can ontologies be distributed across the Web, but DOL's supply of supported ontology languages can also be extended in a decentral way. For this functionality, DOL builds on the Linked Open Data (LOD) principles. But DOL also contributes to LOD use cases. Many current LOD applications are limited by the weak expressivity of the RDF and RDFS languages commonly used to express data and vocabularies. Completely switching to a more expressive language would impair scalability to big datasets. DOL addresses the scalability and expressivity requirements by allowing to represent each aspect of a dataset in the most suitable language and keeping these different representations connected. This is particularly useful in geographic information systems, where big datasets (e.g. Linked Geo Data, the LOD version of OpenStreetMap) need to be integrated with formalisations of complex spatial notions (e.g. in the first-order language Common Logic).

A Mathematical Approach to Ontology Authoring and Documentation

Christoph Lange

This document proposes using OMDoc, a framework for representing formal knowledge, to improve ontology authoring and documentation. It describes how OMDoc can: 1) Provide better support for modularity, documentation at different granularities, and linking documentation to formal representations compared to languages like OWL. 2) Model existing ontologies and translate between OMDoc and OWL/RDF formats to leverage existing tools. 3) Allow comprehensive, integrated documentation of ontologies through features like literate programming. The approach is evaluated by reimplementing the FOAF ontology in OMDoc.

DB-IR-ranking

FELIX75

DB and IR Integration

Marco A Torres

The document discusses the convergence of database and information retrieval systems. It notes that both fields have traditionally focused on either structured or unstructured data but are now combining aspects of both. This is driven by new application needs that require flexible querying of both text and structured data. The document outlines the history and developments in this area, including early XML IR systems and more recent graph-based approaches that integrate ranking and probabilistic models from IR into structured querying.

IASSIT Kansa Presentation

ekansa

A presentation given at the "Data Stewardship: Increasing the Integrity and Effectiveness of Science and Scholarship" session on Friday, June 8, 2012 at the IASSIST 2012 conference in Washington DC. This presentation introduced data publishing, using a social science (archaeology) case study to explore editorial processes and dissemination outcomes that increasingly demand “Linked Data” capabilities.

Stuctured Vs Unstructured: Extracting Information from Classics Scholarly Texts

Matteo Romanello

This document outlines a project to develop tools to extract information from classics scholarly texts. It aims to improve information retrieval for classics researchers by automatically identifying mentions of realia (people, places, sources) and extracting canonical references to primary sources from unstructured texts. The methodology involves building corpora of classics articles, creating a knowledge base from existing structured classics data sources, and developing natural language processing tools trained on the knowledge base to extract entities and references from the text corpora. The expected results are improved access points to information for researchers through enriched full-text search and links to relevant primary sources.

AI Beyond Deep Learning

Andre Freitas

This document summarizes Andre Freitas' talk on AI beyond deep learning. It discusses representing meaning from text at scale using knowledge graphs and embeddings. It also covers using neuro-symbolic models like graph networks on top of knowledge graphs to enable few-shot learning, explainability, and transportability. The document advocates that AI engineers should focus on representation design and evaluating multi-component NLP systems.

Archiving and managing a million or more data files on BiG Grid

pkdoorn

This document summarizes a presentation given by Peter Doorn on archiving and managing over a million data files on the Big Grid infrastructure. It discusses two projects undertaken by DANS to analyze and visualize large humanities datasets and archive over a million files. It provides examples of other projects in countries like Germany, the UK, and Italy that use grid technologies for social science and humanities research dealing with large datasets.

Linked Data: Een extra ontstluitingslaag op archieven

Richard Zijdeman

Post 1What is text analytics How does it differ from text mini.docx

stilliegeorgiana

Post 1: What is text analytics? How does it differ from text mining? Text analytics is the application of statistical and machine learning techniques to predict, prescribe, or infer information from text-mined data. Text mining is a tool that helps in getting the data cleaned up. Differences between text mining and text analytics: • Text mining and text analytics solve the same problems, but use different techniques and are complementary ways to automatically extract meaning from text. • Text analytics is developed within the field of computational linguistics. It can encode human understanding into a series of linguistic rules. Rules generated by humans are high in precision, but they do not automatically adapt and are usually fragile when tried in new situations. • Text mining is a newer discipline arising out of the fields of statistics, data mining, and machine learning. Its strength is the ability to inductively create models from collections of historical data. Because statistical models are learned from training data, they are adaptive and can identify “unknown unknowns”, leading to better recall. Still, they can be prone to missing something that would seem obvious to a human. • Text analytics and text mining approaches have essentially equivalent performance. Text analytics requires an expert linguist to produce complex rule sets, whereas text mining requires the analyst to hand-label cases with outcomes or classes to create training data. • Due to their different perspectives and strengths, combining text analytics with text mining often leads to better performance than either approach alone.
2. What technologies were used in building Watson (both hardware and software)? Watson is a computer system (a novel combination of advanced hardware and software) designed to answer questions posed in natural human language. It was developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci, and was named after IBM's first CEO, the industrialist Thomas J. Watson. The system was specifically developed to answer questions on the quiz show Jeopardy! In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings and received the first prize of $1 million. The goal was to advance computer science by exploring new ways for computer technology to affect science, business, and society. IBM undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The extent of the challenge in ...

Post 1What is text analytics How does it differ from text mini

anhcrowley

Linked Open data: CNR

DatiGovIT

The document describes the Semantic Scout, a framework developed by CNR Semantic Technology Lab for searching, presenting, and analyzing entities from CNR data sources using semantic web, linked open data, natural language processing, and information retrieval techniques. It summarizes the goals and architecture of the Semantic Scout, including how it converts CNR data into ontologies and triples, publishes and links the data, and allows users to search and explore the data through a SPARQL endpoint and other interfaces. The document also provides an example of how the Semantic Scout can be used to identify experts on a topic by searching the integrated CNR data cloud.

OAI7 Research Objects

seanb

This document discusses research objects as a framework for facilitating the exchange and reuse of digital knowledge. Research objects are defined as semantically rich aggregations of resources that support a research objective. They allow for workflows, data, documents and other resources to be bundled together and shared. The document outlines several motivating projects, challenges in developing research object models and vocabularies, and a vision for how research objects could allow research to be more efficient, effective and ethical through increased reuse of digital knowledge.

USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY

cseij

This document summarizes research on using ontologies to overcome drawbacks of databases and vice versa. It discusses how ontologies can be used to store and manage large numbers of database instances to improve performance. It also explains how databases can help address issues with ontologies, such as a lack of semantics, by providing structured storage. The document reviews drawbacks of both databases and ontologies and how each can help address limitations of the other through integration. This mutual benefit is an active area of research at the intersection of databases and ontologies.

Text mining introduction-1

Sumit Sony

The document discusses various topics related to unstructured data analytics including text mining, web mining, and big data. It provides details on text mining tasks like information extraction, topic tracking, summarization, classification, clustering, and association. The key aspects of text mining discussed are preprocessing text data through tokenization, part-of-speech tagging, and semantic analysis. Text mining aims to extract useful information and discover patterns from large collections of unstructured text documents.

Topic Extraction on Domain Ontology

Keerti Bhogaraju

This document discusses topic extraction for domain ontology. It describes domain ontology as a collection of vocabularies and conceptualization of a given domain. The purpose of topic extraction is to identify relevant concepts in documents, obtain domain-specific terms, classify documents, and identify key concepts and relationships for an ontology. The project stages include obtaining domain knowledge, preprocessing documents, and applying either K-Means clustering or Latent Dirichlet Allocation to extract topics. K-Means partitions data into clusters while LDA represents documents as mixtures over topics characterized by word distributions.

Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation

ekansa

This presentation discusses how a model of “data sharing as publishing” can contribute to developing Linked Open Data resources in archaeology and the study of the ancient world. The paper gives examples from Open Context’s developing approach to data editing, documentation and quality improvement processes. The goal of these efforts is to better align the professional interests of individual researchers with the needs of the larger community to access and use high-quality data in Linked Data scenarios.

Introduction to the Semantic Web

Nuxeo

Information Quality in the Web Era

Università degli Studi di Milano-Bicocca

Invited talk @ DCC09 workshop

Paolo Missier

The document discusses scientific workflow management systems and provenance. It notes that momentum is growing around data sharing, as evidenced by a special issue of Nature on the topic. Effective data sharing requires standards for packaging data with metadata into self-descriptive research objects, as well as representation of process provenance using workflow descriptions. Provenance captures causal relationships in scientific data and is important for understanding, reusing, and validating others' work. The Open Provenance Model aims to standardize provenance representation.

Inquiry Optimization Technique for a Topic Map Database

tmra

This document proposes an inquiry optimization technique for topic map databases. It discusses using an object-oriented data model for topic map databases to improve query performance compared to a relational model. The document defines cost estimation formulas to help the database system select the optimal retrieval route, either following associations or searching by topic, when answering queries. An experiment is needed to evaluate the effectiveness of using these cost estimations to optimize queries of a topic map database.

03 Object Dbms Technology

Laguna State Polytechnic University

1. Object databases store data as objects rather than in tables and rows like relational databases. They are recommended for complex data and high performance processing. 2. Object databases are designed to work well with object-oriented programming languages by supporting features like classes, inheritance, and late binding. 3. Early object database systems from the 1970s-1990s included Gemstone, O2, and Objectivity/DB. Commercial products were integrated with languages like Smalltalk, C++, and later Java.

A spatio-temporal visual analysis tool for historical dictionaries.

Technological Ecosystems for Enhancing Multiculturality

Adding structure to unstructured content for enhanced findability hakan tylen

Dynamic People B.V.

This document discusses how enterprise search can be improved through better metadata and content processing. Poor metadata like missing, inconsistent or incorrect data impairs search and users' trust in search results. Property extraction can generate metadata while indexing content to enhance findability. Case studies show how organizations like General Mills and Mississippi DOT improved search and decision making by processing content from multiple sources and exposing enriched metadata in search results through refiners. Key ingredients for successful enterprise search are addressing content growth, unifying siloed search interfaces, and automated metadata enrichment to reduce costs and help users find information faster.

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...

Matteo Romanello

Scaling up the Extraction of Canonical Citations in Classics

Matteo Romanello

The document discusses extracting canonical citations from classical texts at scale. It begins by explaining the importance of references in classics scholarship and trends toward enhanced reading. An approach is presented that uses named entity recognition, relation extraction, and disambiguation to extract citation components and assign identifiers. The extraction pipeline is evaluated on data from L'Année philologique, achieving a high F1 score. Overall, the approach aims to scale the extraction of citations to enable applications like search and network analysis over large corpora.

More from Matteo Romanello

Towards the Automatic Retrieval of Cited Parallel Passages from Secondary Lit...

Matteo Romanello

Scaling up the Extraction of Canonical Citations in Classics

Matteo Romanello

Transforming Indexes Locorum into Citation Networks

Matteo Romanello

1) The document describes how indexes locorum can be transformed into citation networks extracted from classical texts. 2) Data such as L'Année philologique are processed to extract named entities and citation relations. 3) The extracted citations are used to build citation networks at the macro, meso, and micro levels, which provide different perspectives on intertextuality.

Enhancing and Extending the Digital Study of Intertextuality (pt. 2): Reveali...

Matteo Romanello

Introduction to the Text Reuse panel at DH 2014

Matteo Romanello

1) The document discusses text reuse in the context of the digital humanities, defining it as the meaningful reiteration of text beyond the repetition of common language. 2) The panel organizers aim to share knowledge, discuss approaches, and foster future collaborative research on the topic. 3) Topics to be covered include the definition and types of text reuse, infrastructure for text reuse, and user engagement.

Exploring Citation Networks to Study Intertextuality in Classics

Matteo Romanello

Referring is such an essential part of scholarly activity across disciplines that it has been regarded by John Unsworth (2000) as one of the scholarly primitives. There is, however, a kind of citation whose potential has not been fully exploited to date, despite the attention it has recently received within Digital Classics research (Romanello, Boschetti, and Crane 2009; Smith 2010; Romanello 2011). These are called “canonical citations” and are the references commonly used to refer to passages of ancient texts. Given their importance to classicists, Crane et al. (2009) have argued, services for extracting and exploiting them should be part of the Cyberinfrastructure for Classics. In this paper I discuss the various aspects of making such citations, together with the network of links they create, computable. Firstly, I will present the characteristics of such citations by showing how their semantics can be modeled by means of a formal ontology. Once such an ontology is created and populated, it can be used by a machine as a surrogate for domain knowledge in order to make inferences about texts and citations. Secondly, I will illustrate how an expert system that captures canonical citations and their meaning from modern journal papers can be implemented by using Natural Language Processing techniques that are well known in Computer Science. I will then present two resources that were developed for this task and made available under Open Source licenses: 1) a manually corrected, multilingual corpus of approximately 30,000 tokens drawn from L’Année Philologique with annotated Named Entities; 2) a machine learning-based classifier that can be trained with this corpus to extract canonical citations and mentions of ancient authors and works from texts. Finally, I will show some examples of how the citation network so extracted, consisting of journal papers and the ancient texts they refer to, can be exploited to offer scholars new ways and tools to study intertextuality.

References
Crane, Gregory, Brent Seales, and Melissa Terras. 2009. “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly 3.
Romanello, Matteo. 2011. “New Value-Added Services for Electronic Journals in Classics.” JLIS.it 2. doi:10.4403/jlis.it-4603.
Romanello, Matteo, Federico Boschetti, and Gregory Crane. 2009. “Citations in the digital library of classics: extracting canonical references by using conditional random fields.” In , 80–87. Morristown, NJ, USA: Association for Computational Linguistics.
Smith, Neel. 2010. “Digital Infrastructure and the Homer Multitext Project.” In Digital Research in the Study of Classical Antiquity, ed. Gabriel Bodard and Simon Mahony, 121–137. Burlington, VT: Ashgate Publishing.
Unsworth, John. 2000. “Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this?” http://www3.isrl.illinois.edu/~unsworth/Kings.5-00/primitives.html.

DARIAH Geo-browser: Exploring Data through Time and Space

Matteo Romanello

This document discusses the DARIAH Geo-browser tool, which allows users to explore datasets with both temporal and spatial information by visualizing the data on a map. The tool is suitable for exploratory research to help users visualize patterns within their data. It can import data in KML format that contains time and place information. As an example, the document demonstrates how the tool can be used to explore publications related to places along the Roman Limes by mapping the publications to the locations and dates. The key benefits highlighted are the ability to conduct exploratory research through data visualization, and the interoperability of the tool through its use of APIs, common identifiers, and ability to import/export different data formats.

Greedy Enough for the Grid?

Matteo Romanello

[poster] Extracting Information From Classics Scholarly Texts

EXTRACTING INFORMATION FROM CLASSICS SCHOLARLY TEXTS
Matteo Romanello, matteo.romanello@kcl.ac.uk

The project at a glance
● PhD research project in Digital Humanities (DH)
● discipline: DH, Classics (Greek and Latin literature)
● topic: extracting structured information from a corpus of unstructured texts

Gone digital. What changed?
We are moving from books to e-books, and from journals to e-journals, as we use them almost daily.
Has our way of accessing information actually changed with the use of digital tools?
Did just the format change, or are we provided with innovative ways of accessing information based on digital technologies?

Goal
Devising an automatic system to improve information retrieval over a discipline-specific corpus of unstructured texts.
CORPUS: Open Access collection of Classics journal papers
Why automatic? Because automatic also means scalable when you are dealing with a huge quantity of data.
Information retrieval: the task of retrieving information (most of the time accomplished by using search engines)
Corpus of unstructured texts: collection of plain texts, without any kind of mark-up (such as XML).
Information can be accessed using multiple access points that are meaningful for scholars in a specific field.

HIDDEN WORD PUZZLE
To solve the puzzle, find the words in the schema by using a word list as a clue. At the end you'll have added information to the initially chaotic picture.

Steps
1. Building the corpus (OCR, preprocessing)
2. Making the data sources interoperable (when the same entity E appears in DB1 and DB2, the information about E in DB1 has to be added to the information about E in DB2)
3. Finding in the corpus the mentions of REALIA (places, names, work passages, etc.)
4. Disambiguating the mentions of REALIA
5. Automatic creation of new indices to the texts

Access points to information in Classics
Print resources
● Table of Contents (TOC)
● Indexes (index of citations, index of Greek words, index of geographic places, index of names, etc.)
Electronic resources
● TOCs
● Access through search engines
● ?
* Usually provided just for monographs, because they are expensive to produce.

Method
1. Reuse existing data resources containing structured information (such as gazetteers, authority lists, etc.) stored in different data formats (relational databases, XML files, etc.)
2. Apply Computational Linguistics and Natural Language Processing algorithms for the information extraction
3. Use structured data as training data for the algorithms that “mine” the unstructured text corpus

Expected results
● Automatically providing multiple meaningful entry points to information
● Enriching the corpus with links to navigate through resources
● Exploiting extracted information to improve user access to the corpus
● Demonstrating the scalability of the approach

Centre for Computing in the Humanities (CCH), King's College London
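
As a rough illustration of steps 2, 3 and 5 listed on the poster, the following Python sketch merges records for the same entity from two sources, looks for mentions of known names in plain texts, and builds an index from entities to texts. Everything in it is an assumption made for illustration: the function names, the record layout, and the sample data are hypothetical, and plain dictionary lookup stands in for the trained NLP algorithms the poster refers to. Disambiguation (step 4) is left out.

```python
import re


def merge_sources(*sources):
    """Step 2 (sketch): combine records that share an entity key across sources."""
    merged = {}
    for source in sources:
        for entity_id, record in source.items():
            target = merged.setdefault(entity_id, {})
            for field, value in record.items():
                if isinstance(value, list):
                    # List-valued fields (e.g. name variants) are unioned in order.
                    existing = target.setdefault(field, [])
                    for item in value:
                        if item not in existing:
                            existing.append(item)
                else:
                    # Scalar fields keep the first value seen; later sources fill gaps.
                    target.setdefault(field, value)
    return merged


def find_mentions(text, gazetteer):
    """Step 3 (sketch): whole-word, case-insensitive lookup of known names."""
    mentions = []
    for entity_id, record in gazetteer.items():
        for name in record.get("names", []):
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                mentions.append((match.start(), match.group(0), entity_id))
    return sorted(mentions)


def build_index(corpus, gazetteer):
    """Step 5 (sketch): map each entity to the texts that mention it."""
    index = {}
    for doc_id, text in corpus.items():
        for _, _, entity_id in find_mentions(text, gazetteer):
            docs = index.setdefault(entity_id, [])
            if doc_id not in docs:
                docs.append(doc_id)
    return index


# Hypothetical sample data: two small "databases" and a two-text corpus.
db1 = {"homer": {"names": ["Homer"], "type": "person"}}
db2 = {
    "homer": {"names": ["Homerus", "Homer"], "period": "archaic"},
    "athens": {"names": ["Athens"], "type": "place"},
}
corpus = {
    "paper1": "Homer is cited often in Athens.",
    "paper2": "The scholia to Homerus are late.",
}

gazetteer = merge_sources(db1, db2)
index = build_index(corpus, gazetteer)
```

Keying the merge on a shared entity identifier is the simplest possible reading of "the same entity E appears in DB1 and DB2"; real sources would first need their identifiers aligned.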