The identification and cataloguing of documentary evidence is an important part of empirical research in the humanities.
An increasing number of recent initiatives in the digital humanities have as a primary objective the curation of collections of digital artefacts augmented with fine-grained metadata, for example, recording the entities mentioned and their relations, often adopting the "Linked Data" paradigm. This talk focuses on exploring the potential of Linked Data to support humanities scholars in identifying, collecting, and curating documentary evidence. First, I will introduce the basic notions around Linked Data and place its emergence in the tradition of Knowledge Representation, an area of Artificial Intelligence (AI). Second, I will show how Linked Data and AI techniques have been successfully applied in the Listening Experience Database project to support the retrieval and curation of documentary evidence. Finally, I will conclude the presentation by discussing the potential (and challenges) of adopting a "knowledge extraction" paradigm to automate the identification and cataloguing of metadata about documentary evidence in texts.
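To make the Linked Data idea concrete, here is a minimal sketch of the triple model using only plain Python; the namespace, property names, and listening-experience record are made-up examples, not a real vocabulary (a real system would use an RDF library and published ontologies):

```python
# A minimal sketch of the Linked Data triple model, using plain
# Python tuples instead of an RDF library. All URIs and property
# names below are hypothetical examples.

LED = "http://example.org/led/"  # hypothetical namespace

triples = {
    (LED + "experience1", "rdf:type", LED + "ListeningExperience"),
    (LED + "experience1", LED + "listener", LED + "person/JaneDoe"),
    (LED + "experience1", LED + "medium", "live performance"),
    (LED + "person/JaneDoe", "rdfs:label", "Jane Doe"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a basic graph pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who is the listener of experience1?
listeners = [o for _, _, o in match(LED + "experience1", LED + "listener")]
print(listeners)  # the hypothetical listener URI
```

The point of the sketch is that, once facts are uniformly broken down into subject-predicate-object triples with shared identifiers, evidence from different sources can be queried and linked with the same pattern-matching machinery.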
This presentation was provided by Twyla Gibson and Ann Campion Riley, both of the University of Missouri, during the NISO Virtual Conference, The Computer Campus: Integrating Information Systems and Services, held on August 15, 2018.
What is Digital Humanities?
What do we do under DH?
1. Digital Archives
Let us have an introduction to a few projects
2. Computational Humanities
a. Using digital technology for analysis of literary text - research concerns
b. Using DT in teaching & learning - pedagogical concerns
c. Generative Literature
3. Multimodal Critique
The fundamentals of Humanities - Critical Inquiry
Developments in Access to Art Information: Trove (Rose Holley)
Presentation at the ARLIS conference, Darwin, September 2010, by Rose Holley. Demonstrates how Trove aggregates information for art resources and is a useful tool for researchers, artists and librarians.
This file contains the introductory statements of participants in a discussion on scholarly publishing, accompanying articles published in NM&S, May 2013. The complete podcast of the discussion is available on the NM&S website: http://www.newmediaandsociety.com/
Doing Digital Scholarship: Discovering and using digital tools in academic work. Course syllabus, Internet Practice Part 2, April-June 2012, Univ. of Ljubljana, Faculty of Social Sciences. Instructor: Nick Jankowski
Making scholarly publications accessible online (Jonathan Bowen)
Developing and monitoring communities has become increasingly easy on the web as the number of interactive facilities and the amount of data available about communities increase. It is possible to view connections on social and professional networks in the form of mathematical graphs. It is also possible to visualise connections between authors of academic papers. For example, Google Scholar, Microsoft Academic Search, and Academia.edu now have large corpora of freely available information on publications, together with author and citation details, that can be accessed and presented in a number of ways. In mathematical circles, the concept of the Erdős number has been introduced in honour of the Hungarian mathematician Paul Erdős, measuring the "collaborative distance" of a person from Erdős through co-author links. Similar metrics have been proposed in other fields. The possibility of exploring and improving the presentation of such links online in the sciences and other fields will be presented as a means of improving the outreach and impact of publications by academics across different disciplines. Some practical guidance on what is worthwhile in presenting publication information online is given.
Roger Malina on A Historical Perspective on the Art-Sci-Tech field (Roger Malina)
Presentation given by Roger Malina on July 26, 2014 at Kettle's Yard, Cambridge, UK, at "White Heat: art, science and social responsibility in 1960s Britain". The talk title is "The Leonardo Journal at 50: networking the arts, sciences and technology now". The talk takes the person of Frank Malina, founder of Leonardo Journal, as the springboard for a historical perspective.
Capturing Themed Evidence, a Hybrid Approach (Enrico Daga)
The task of identifying pieces of evidence in texts is of fundamental importance in supporting qualitative studies in various domains, especially in the humanities. In this paper, we coin the expression themed evidence to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest, and study the problem of identifying them in texts. We devise a generic framework aimed at capturing themed evidence in texts based on a hybrid approach, combining statistical natural language processing, background knowledge, and Semantic Web technologies. The effectiveness of the method is demonstrated on a case study of a digital humanities database aimed at collecting and curating a repository of evidence of experiences of listening to music. Extensive experiments demonstrate that our hybrid approach outperforms alternative solutions. We also demonstrate its generality by testing it on a different use case in the digital humanities.
Capturing the semantics of documentary evidence for humanities research (Enrico Daga)
Identifying and curating documentary evidence from textual corpora is an essential part of empirical research in the humanities.
Initially, we discuss "themed" evidence - (direct or indirect) traces of a fact or situation relevant to a theme of interest - and focus on the problem of identifying them in texts. To that end, we combine statistical NLP, background knowledge, and Semantic Web technologies in a hybrid approach. We illustrate the method's effectiveness in a case study of a database of evidence of experiences of listening to music. We also demonstrate its generality by testing it on a different use case in the digital humanities.
Finally, we ponder the applicability of knowledge extraction techniques to automatically populate a database of documentary evidence and discuss the challenges from the point of view of scientific knowledge acquisition.
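As a toy illustration of the "themed evidence" idea, the sketch below scores passages against a theme described by seed keywords and keeps candidate evidence above a threshold. The keywords, passages, and threshold are invented for the demo; the actual hybrid approach also brings in background knowledge and Semantic Web technologies rather than keyword overlap alone:

```python
# A toy sketch of capturing "themed evidence": score passages
# against a theme given as seed keywords, keeping those above a
# threshold as candidate evidence. All data here is made up.

theme = {"listening", "music", "concert", "heard", "song"}

passages = [
    "That evening we heard a song drifting from the concert hall.",
    "The harvest was poor and prices rose sharply that year.",
]

def theme_score(passage, theme):
    """Fraction of theme keywords appearing in the passage."""
    tokens = {t.strip(".,;!?").lower() for t in passage.split()}
    return len(tokens & theme) / len(theme)

evidence = [p for p in passages if theme_score(p, theme) > 0.2]
print(evidence)  # only the listening-related passage survives
```

A statistical variant would learn the theme vocabulary from labelled examples instead of hand-picking it, and background knowledge would then help resolve who the listener and the music actually are.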
A whirlwind introduction to digital humanities for the CDP Digital Humanities: Collections & Heritage - current challenges and futures workshop. February 22, 2018, Imperial War Museum.
American Art Collaborative Linked Open Data presentation to "The Networked Curator" (American Art Collaborative)
An August 2017 presentation by Eleanor Fink to "The Networked Curator: Association of Art Museum Curators Foundation Digital Literacy Workshop for Art Curators"
Musical Meetups Knowledge Graph (MMKG): a collection of evidence for historical collaborations (Alba Morales)
Knowledge Graphs (KGs) have emerged as a valuable tool for supporting humanities scholars and cultural heritage organisations. In this resource paper, we present the Musical Meetups Knowledge Graph (MMKG), a collection of evidence of historical collaborations between personalities relevant to the music history domain. We illustrate how we built the KG with a hybrid methodology that combines knowledge engineering with natural language processing, including the use of Large Language Models (LLMs), machine learning, and other techniques, to identify the constituent elements of a historical meetup. MMKG is a network of historical meetups extracted from ∼33k biographies collected from Wikipedia, focused on European musical culture between 1800 and 1945. We discuss how, by providing a structured representation of social interactions, MMKG supports digital humanities applications and music historians’ research, teaching, and learning.
Rebecca Grant - DH research data: identification and challenges (DH2016) (dri_ireland)
Presentation made by Rebecca Grant as part of the panel session “Digital data sharing: the opportunities and challenges of opening research” at the Digital Humanities conference, Krakow, 15 July 2016. This paper “DH research data: identification and challenges” provided an introduction to concepts of research data in the digital humanities, including accepted definitions of what constitutes research data in a DH context.
STEAM to STEM: Redesigning Science Itself by Roger Malina
Presented at the Balance-Unbalance Conference, Plymouth, 2017. STEAM to STEM: How the arts, design and humanities can work with STEM to redesign science itself. The scientific method needs redesigning for the problems we are working on today. Scientific culture needs redesigning to couple better to the needed social re-design (design 4.0) for a sustainable global civilization.
How they might connect in a digital context. Invited keynote presentation in DARIAH workshop Practices and Context in Contemporary Annotation Activities. University of Hamburg, 29 October, 2015.
Describing Everything - Open Web standards and classification (Dan Brickley)
Original title: "Open Web standards and classification: Foundations for a hybrid approach"
Keynote address, UDC Seminar: Classification at a Crossroads, 30 October 2009, Koninklijke Bibliotheek, The Hague
Dan Brickley, Vrije Universiteit Amsterdam
Patterns in scholarly publications online: Erdős and beyond (Jonathan Bowen)
Developing and monitoring communities has become increasingly easy on the web as the number of interactive facilities and the amount of data available about communities increase. It is possible to view connections and patterns on social and professional networks in the form of mathematical graphs. It is also possible to visualise connections between authors of academic papers. For example, Google Scholar, Microsoft Academic Search, Academia.edu, and similar services now have large corpora of freely available information on publications, together with author and citation details, that can be accessed and presented in a number of ways. In mathematical circles, the concept of the Erdős number has been introduced in honour of the Hungarian mathematician Paul Erdős, measuring the "collaborative distance" of a person from Erdős through co-author links. Similar metrics have been proposed in other fields. The possibility of exploring and improving the presentation of such links online in computer science and other fields will be presented as a means of improving the outreach and impact of academic publications. Some practical guidance on what is worthwhile in presenting publication information online will be given.
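The "collaborative distance" behind the Erdős number is simply shortest-path length in a co-authorship graph, which a breadth-first search computes directly. The author names and edges below are made up for illustration:

```python
# A sketch of "collaborative distance" (the idea behind the Erdős
# number) as shortest-path length in a co-authorship graph.
from collections import deque

coauthors = {
    "Erdős": {"Alice", "Bob"},
    "Alice": {"Erdős", "Carol"},
    "Bob":   {"Erdős"},
    "Carol": {"Alice", "Dave"},
    "Dave":  {"Carol"},
}

def collaborative_distance(graph, source, target):
    """Breadth-first search for the shortest chain of co-authors."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None  # no chain of co-authors connects the two

print(collaborative_distance(coauthors, "Dave", "Erdős"))  # Dave-Carol-Alice-Erdős: 3
```

In practice the graph would be built from publication metadata harvested from services such as those named above, and the same routine generalises to "distance from" any chosen author in any field.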
Linkage in Haze: challenges and take-home messages of crowd-sourcing vagueness (Alessandro Adamou)
With the transition of the Web of today from an information repository to a suite of services, the demand for machine-readable data to support the latter is now greater than ever. The social and, more generally, community element is proving to be a valuable medium to convey such a bulk of knowledge. Linked Data is a leading body of standards for publishing and using open knowledge bases on the Web; however, it very much relies upon the notion of identity: every object of the world being described should be uniquely identified in order to be effectively manipulated. Music is an especially provocative domain of interest for such Web knowledge bases, being a topic most people feel confident they can contribute to, yet with varying degrees of factual knowledge, personal inclination or scholarly rigour. Curating a dataset that covers an aspect new to this landscape, as is the evidence of listening experiences, means dealing with partial, inexplicit or underspecified information. A likely implication is that several elements of a listening experience, such as the listeners, the time in history or the music being heard, can be described to an extent but not identified, in stark contrast with a founding principle of Linked Data. This talk will illustrate the nature of the main elements of fuzzy knowledge that emerged from the contributions to the Listening Experience Database, elaborate on the countermeasures adopted and lessons learnt from the life-cycle of LED data, and assess the state of maturity of Linked Data technologies for accommodating such use-cases.
For many libraries, an institutional repository is an online archive to collect, preserve, and make accessible the intellectual output of an institution. For a growing number, the goal is to go further, beyond knowledge preservation to knowledge creation. These libraries are using their repositories to provide faculty with a proven publishing option by facilitating the production and distribution of original content often too niche for traditional publishers.
How do metadata librarians sift the incoming metadata with these different goals in mind? How do they optimize content for discovery in a wide range of resources such as online catalogs, external research databases, and major search engines? For a library that is also providing publishing services, what additional steps are necessary?
As the provider of Digital Commons, a repository and publishing platform for over 350 institutions, bepress has first-hand experience with these topics, and our consultants advise regularly on best practices for collecting, publishing, distributing, and archiving content. This presentation is intended for library professionals, whether their goal is to collect previously published works or to go further into library-led publishing. After an overview of common sources and destinations for metadata, attendees will come away with a set of considerations for streamlining workflows and optimizing content for discovery and distribution in major venues.
Eli Windchy is the VP, Consulting Services at bepress which provides software and services to the scholarly community. She received a Master's in Archaeology from University of Virginia, taught organic gardening, and for the last ten years has also been getting dirty with the metadata of Digital Commons repositories. She co-directs courses in institutional repository management and publishing, and she enjoys addressing the challenges of interoperability and scholarly communication.
Citizen Experiences in Cultural Heritage Archives: a Data Journey (Enrico Daga)
Digital archives of memory institutions are typically concerned with the cataloguing of artefacts of artistic, historical, and cultural value. Recently, new forms of citizen participation in cultural heritage have emerged, producing a wealth of material spanning from visitors’ experiential feedback on exhibitions and cultural artefacts to digitally mediated interactions like the ones happening on social media platforms. In this talk, I will touch upon the problems of integrating citizen experiences in cultural heritage archives. I argue for good reasons for institutions to archive people’s responses to cultural objects, and then look at the impact that this has on the data infrastructures. I argue that a knowledge organisation system for “data journeys” can help in disentangling problems that include issues of distribution, authoritativeness, interdependence, privacy, and rights management.
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything project (Enrico Daga)
Slides of the presentation at #ENDORSE2023
The SPARQL Anything project: http://sparql-anything.cc
Endorse Conference 2023, see
https://twitter.com/EULawDataPubs/status/1635663471349223425
--
Abstract:
What should a data integration framework for knowledge graph experts look like?
Existing approaches transform non-RDF data sources by applying ad-hoc transformations to existing ontologies (Any23), using a mapping language (RML), or expanding on existing standards with custom operators (SPARQL Generate). These solutions either result in code that is difficult to maintain and reuse or require KG experts to learn a variety of languages and custom tools. Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied in SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in standard SPARQL 1.1.
The SPARQL Anything project supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCx, Bibtex). Features include querying Web APIs with high flexibility, parametrized queries, and chaining multiple transformations into complex pipelines.
We describe the design rationale of the SPARQL Anything system and its application in two EU-funded projects and in the industry. We provide references to an extensive set of reusable showcases. We report on the value-to-users of the founding assumptions of SPARQL Anything, compared to alternative solutions to knowledge graph construction.
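To give a flavour of the façade idea, the sketch below is a plain SPARQL 1.1 query in the style SPARQL Anything supports: the SERVICE clause points at a non-RDF resource, which the engine exposes as if it were RDF triples. The file name and the `name` key are made-up examples; the `xyz:` prefix is the Facade-X data namespace used by the project.

```sparql
# Query a (hypothetical) JSON file as if it were RDF.
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT ?name
WHERE {
  SERVICE <x-sparql-anything:location=people.json> {
    ?person xyz:name ?name
  }
}
```

The appeal of this design is that no new mapping language is introduced: a KG expert who already knows SPARQL can read, write, and chain such queries without learning a separate transformation tool.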
Data integration with a façade. The case of knowledge graph construction (Enrico Daga)
"Data integration with a façade. The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying of many formats (CSV, JSON, XML, HTML, Markdown, Excel, ...) in plain SPARQL.
Presentation of SPARQL Anything at the MEI Linked Data IG Meeting in July 2021. We try SPARQL Anything with MEI XML files and experiment with simple and difficult tasks.
Challenging knowledge extraction to support the curation of documentary evidence (Enrico Daga)
The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities. In this position paper, we ponder the applicability of knowledge extraction techniques to support the data acquisition process. Initially, we characterise the task by analysing the end-to-end process occurring in the data curation activity. After that, we examine general knowledge extraction tasks and discuss their relation to the problem at hand. Considering the case of the Listening Experience Database (LED), we perform an empirical analysis focusing on two roles: the listener and the place. The results show, among other things, how the entities are often mentioned many paragraphs away from the evidence text or are not in the source at all. We discuss the challenges that emerged from the point of view of scientific knowledge acquisition.
Sciknow - Workshop on Capturing Scientific Knowledge
19 November 2019
Marina del Rey, California, United States
Paper at http://oro.open.ac.uk/67961/
Propagating Data Policies - A User Study (Enrico Daga)
When publishing data, data licences are used to specify the actions that are permitted or prohibited, and the duties that target data consumers must comply with. However, in complex environments such as a smart city data portal, multiple data sources are constantly being combined, processed and redistributed. In such a scenario, deciding which policies apply to the output of a process based on the licences attached to its input data is a difficult, knowledge-intensive task. In this paper, we evaluate how automatic reasoning upon semantic representations of policies and of data flows could support decision making on policy propagation. We report on the results of a user study designed to assess both the accuracy and the utility of such a policy-propagation tool, in comparison to a manual approach.
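The core intuition of policy propagation can be sketched as a toy model: the output of a process inherits the union of the policies attached to its inputs, minus those the process is known to void. The policy names and the "voids" rule below are invented for illustration; the actual work reasons over semantic representations of licences and processes rather than flat string labels:

```python
# A toy sketch of policy propagation through a data flow.
# Policy names and process semantics are made-up examples.

def propagate(input_policies, process_voids):
    """Output policies = union of the policies attached to the
    inputs, minus those the process is modelled as voiding."""
    out = set()
    for policies in input_policies:
        out |= policies
    return out - process_voids

sensor_data = {"attribution-required", "no-redistribution"}
open_data = {"attribution-required"}

# A hypothetical anonymising aggregation step, modelled as voiding
# the redistribution prohibition on its output.
result = propagate([sensor_data, open_data],
                   process_voids={"no-redistribution"})
print(sorted(result))  # ['attribution-required']
```

Even this toy version shows why the task is knowledge-intensive: deciding what a process voids requires a semantic description of what the process does, which is exactly what the reasoning-based tool supplies.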
Propagation of Policies in Rich Data Flows (Enrico Daga)
Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
A bottom-up approach for licences classification and selection (Enrico Daga)
Presented at the LeDa-SwAn Workshop at ESWC2015
http://cs.unibo.it/ledaswan2015
#ledaswan2015
Licences are a crucial aspect of the information publishing process in the web of (linked) data. Recent work on modelling of policies with semantic web languages (RDF, ODRL) gives the opportunity to formally describe licences and reason upon them. However, choosing the right licence is still challenging. In particular, understanding the number of features - permissions, prohibitions and obligations - constitutes a steep learning process for the data provider, who has to check them individually and compare the licences in order to pick the one that better fits her needs. The objective of the work presented in this paper is to reduce the effort required for licence selection. We argue that an ontology of licences, organised by their relevant features, can help provide support to the user. Developing an ontology with a bottom-up approach based on Formal Concept Analysis, we show how the process of licence selection can be simplified significantly and reduced to answering an average of three to five key questions.
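Formal Concept Analysis groups objects and attributes into "concepts": pairs of a licence set and a feature set that exactly determine one another. A brute-force enumeration over a tiny, invented licence/feature context (not the paper's data) illustrates the idea:

```python
# A minimal sketch of Formal Concept Analysis over a licence/feature
# context. Licences and features are made-up examples; real FCA
# tools use incremental algorithms rather than brute force.
from itertools import combinations

context = {
    "CC-BY":    {"permits-reuse", "requires-attribution"},
    "CC-BY-NC": {"permits-reuse", "requires-attribution",
                 "prohibits-commercial"},
    "CC0":      {"permits-reuse"},
}

def common_features(licences):
    sets = [context[l] for l in licences]
    return set.intersection(*sets) if sets else set()

def licences_with(features):
    return {l for l, fs in context.items() if features <= fs}

# Enumerate formal concepts: closed (extent, intent) pairs.
concepts = set()
for r in range(len(context) + 1):
    for group in combinations(sorted(context), r):
        intent = (common_features(group) if group
                  else set.union(*context.values()))
        extent = licences_with(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
```

The resulting concept lattice is what drives the question-answering flow: each question about a feature moves the user down the lattice until a single licence (or a small set) remains.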
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints (Enrico Daga)
Presented at #SALAD2015
The heterogeneity of methods and technologies to publish open data is still an issue in developing distributed systems on the Web. On the one hand, Web APIs, the most popular approach to offer data services, implement REST principles, which focus on addressing loose coupling and interoperability issues. On the other hand, Linked Data, available through SPARQL endpoints, focuses on data integration between distributed data sources. We propose BASIL, an approach to build Web APIs on top of SPARQL endpoints, in order to benefit from the advantages of both the Web API and Linked Data approaches. Compared to similar solutions, BASIL aims at minimising the learning curve for users to promote its adoption. The main feature of BASIL is a simple API that does not introduce new specifications, formalisms or technologies for users from either the Web API or the Linked Data community.
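The core mechanics of exposing a SPARQL query as a Web API can be sketched in a few lines: variables following a naming convention become API parameters that are bound before the query is sent to the endpoint. The `?_name` prefix used here is an assumed convention for the sketch, and the FOAF query is a made-up example:

```python
# A sketch of turning a SPARQL query into a parameterised API call:
# placeholders of the (assumed) form ?_param are bound to values
# supplied by the API consumer before the query is executed.
import re

def bind_parameters(query, params):
    """Replace each ?_name placeholder with an RDF literal value."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise ValueError("missing API parameter: " + name)
        return '"%s"' % params[name]
    return re.sub(r"\?_(\w+)", substitute, query)

template = """
SELECT ?person WHERE {
  ?person <http://xmlns.com/foaf/0.1/name> ?_name .
}
"""

bound = bind_parameters(template, {"name": "Ada Lovelace"})
print(bound)  # ?_name replaced by the literal "Ada Lovelace"
```

A real deployment would route an HTTP request such as `GET /api/person?name=Ada+Lovelace` (a hypothetical URL) through this substitution and forward the bound query to the SPARQL endpoint, returning the results as JSON.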
Early Analysis and Debugging of Linked Open Data Cubes (Enrico Daga)
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. There are, however, several aspects of RDF, the Data Cube vocabulary and linked data that can help with these issues, including that they make the data "self-descriptive". Here, we attempt to answer the question: "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Founder
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
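The retrieval step at the heart of a RAG pipeline can be sketched without any framework: rank documents against a question and prepend the best matches to the LLM prompt. Real systems use learned embeddings and a vector database; here, plain bag-of-words cosine similarity stands in for both, and the documents are invented:

```python
# A minimal sketch of the retrieval step in a Retrieval-Augmented
# Generation (RAG) pipeline, using bag-of-words cosine similarity
# in place of learned embeddings and a vector database.
import math
import re
from collections import Counter

docs = [
    "Orders table schema: order_id, customer_id, total, created_at",
    "The sales dashboard refreshes every night at 02:00 UTC",
    "Customers table schema: customer_id, name, country",
]

def vectorize(text):
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)),
                    reverse=True)
    return ranked[:k]

question = "what columns are in the orders table?"
context = retrieve(question)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(context[0])
```

Swapping the similarity function for model embeddings and the list for a vector store changes the quality and the scale, but not the shape of the pipeline: retrieve, assemble the prompt, generate.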
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives in sequential mode (e.g., sumAt, multiply).
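The per-iteration work being parallelised is the standard PageRank power iteration, sketched here sequentially in Python on a made-up three-vertex graph (the report's actual implementation is OpenMP/C++):

```python
# A sequential sketch of the PageRank power iteration whose
# primitives the report parallelises with OpenMP.

def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """graph maps each vertex to the list of vertices it links to."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(max_iter):
        # Rank mass from dangling vertices is spread uniformly.
        dangling = sum(ranks[v] for v, out in graph.items() if not out)
        next_ranks = {v: (1 - damping) / n + damping * dangling / n
                      for v in graph}
        for v, out in graph.items():
            if out:
                share = damping * ranks[v] / len(out)
                for u in out:
                    next_ranks[u] += share
        delta = sum(abs(next_ranks[v] - ranks[v]) for v in graph)
        ranks = next_ranks
        if delta < tol:
            break
    return ranks

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)  # ranks sum to 1; "c" collects the most rank
```

In the OpenMP versions, the inner loops over vertices and edges are exactly the primitives (sums, scaled copies, teleport adjustment) that are run either all in parallel (uniform) or partly sequentially (hybrid).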
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
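The progression from a basic query to an advanced one can be shown end to end with Python's built-in sqlite3; the table and data below are invented for the demo:

```python
# From a basic filtered SELECT to grouping, HAVING and ordering,
# using an in-memory SQLite database. Table and values are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('North', 120), ('North', 80), ('South', 200), ('South', 50);
""")

# Basic: retrieval with row-level filtering.
north = conn.execute(
    "SELECT amount FROM sales WHERE region = 'North'").fetchall()

# Advanced: aggregation, group-level filtering, ordering.
totals = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 150
    ORDER BY total DESC
""").fetchall()

print(north)   # [(120.0,), (80.0,)]
print(totals)  # [('South', 250.0), ('North', 200.0)]
```

The same pattern scales directly: WHERE filters rows before grouping, HAVING filters the groups themselves, and ORDER BY shapes the final result for reporting.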
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
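One of the techniques listed above, skipping computation on vertices that have already converged, can be sketched as follows. Note this is a heuristic sketch on a toy graph (a skipped vertex is frozen even if its in-neighbours still move), not the STICD implementation:

```python
# A sketch of skipping work for converged vertices in PageRank.
# Once a vertex's rank changes by less than eps, it is frozen and
# excluded from further per-iteration work (a heuristic).

def pagerank_skip(graph, damping=0.85, tol=1e-12, eps=1e-8,
                  max_iter=100):
    n = len(graph)
    incoming = {v: [u for u, out in graph.items() if v in out]
                for v in graph}
    ranks = {v: 1.0 / n for v in graph}
    converged = set()
    for _ in range(max_iter):
        moved = 0.0
        for v in graph:
            if v in converged:
                continue  # skip settled vertices entirely
            new = (1 - damping) / n + damping * sum(
                ranks[u] / len(graph[u]) for u in incoming[v])
            if abs(new - ranks[v]) < eps:
                converged.add(v)
            moved += abs(new - ranks[v])
            ranks[v] = new
        if moved < tol:
            break
    return ranks

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(pagerank_skip(cycle))  # a symmetric cycle: all ranks ≈ 1/3
```

The other optimisations in the list follow the same spirit of avoiding redundant work: in-identical vertices share one computation, chains are solved in closed form, and SCCs processed in topological order never need revisiting.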
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Linked data for knowledge curation in humanities research
1. Linked data for knowledge curation in
humanities research
Enrico Daga
Research Fellow, Knowledge Media Institute, The Open University
14th January 2020, Lancaster University / History Dept.
enrico.daga@open.ac.uk - @enridaga
4. Invented the web in 1989
(yeah!)
Invented the semantic web
in 1994 (duh?)
5. “To a computer, then, the web is a flat, boring
world devoid of meaning”
Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
6. “This is a pity, as in fact documents on the web
describe real objects and imaginary concepts,
and give particular relationships between them”
Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
7. “Adding semantics to the web involves two things:
allowing documents which have information in
machine-readable forms, and allowing links to be
created with relationship values.”
Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
8. “The Semantic Web is not a separate Web but an
extension of the current one, in which information is
given well-defined meaning, better enabling
computers and people to work in cooperation.”
Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
11. This did not come out of the blue
The world’s academic communities have been dealing for years with knowledge
representation
Artificial intelligence, natural language processing, model management,
and many other research fields largely contributed
Some ancestors paved the way …
12. EXAMPLE
• Instances are associated with one or several classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
Ale rdfs:subClassOf TopFermentedBeer .
White rdfs:subClassOf TopFermentedBeer .
TopFermentedBeer rdfs:subClassOf Beer .
Bock rdfs:subClassOf BottomFermentedBeer .
rdfs:subClassOf rdf:type owl:TransitiveProperty .
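Because rdfs:subClassOf is transitive, the triples above license extra rdf:type statements. A minimal sketch of that entailment in plain Python (a toy reasoner over the slide's class names, not a real RDF library):

```python
# Toy RDFS reasoner: derive every class of an instance by following
# rdfs:subClassOf links transitively, as the example above allows.

SUBCLASS_OF = {           # rdfs:subClassOf edges from the slide
    "Ale": "TopFermentedBeer",
    "White": "TopFermentedBeer",
    "TopFermentedBeer": "Beer",
    "Bock": "BottomFermentedBeer",
}

TYPE_OF = {               # rdf:type assertions from the slide
    "Boddingtons": "Ale",
    "Grafentrunk": "Bock",
    "Hoegaarden": "White",
    "Jever": "Pilsner",
}

def inferred_types(instance):
    """Return every class the instance belongs to, direct or inferred."""
    types = []
    cls = TYPE_OF.get(instance)
    while cls is not None:
        types.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return types

print(inferred_types("Boddingtons"))  # ['Ale', 'TopFermentedBeer', 'Beer']
```

So a query for all instances of Beer would also return Boddingtons and Hoegaarden, even though neither is directly typed as Beer.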
15. Ontologies, different types of
Domain independent: SKOS, OWL, Prov, Time, …
Foundational, general purpose:
• DOLCE, SUMO (“Upper Ontologies”)
• CIDOC-CRM: broad scope, targets “cultural heritage” in general
Pragmatic, community-oriented:
• Dublin Core Metadata Initiative
• Google’s schema.org
• https://linked.art/
• Humanities forums: LinkedPasts series, WHiSe Workshops
https://lov.linkeddata.es/dataset/lov
16. Linked Data in a nutshell
https://en.wikipedia.org/wiki/Linked_data
Linked Data is a way of publishing structured information that allows data
to be connected and enriched by means of links among their entities.
• LD uses the World Wide Web as publishing platform
• LD is based on basic Web standards (URIs, HTTP, RDF)
• open to everyone
• LD enables the adoption of shared schemas (Ontologies)
• LD makes the data self-explanatory and self-documented
• LD enables your data to refer to other data
• … and other data to refer to yours!
17. Linked Open Data Cloud in 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
18. Linked Open Data Cloud in 2010
2010 - The OU launches the data.open.ac.uk
Linked Open Data portal, the first of its kind in the UK
19. The OU Open Knowledge Graph
http://data.open.ac.uk
20. Linked Open Data Cloud in 2014
Crawlable
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
24. How can this wealth of data support the
retrieval of documentary evidence?
The identification and cataloguing of documentary evidence from textual
corpora is an important part of empirical research based on
historiographical methodology.
25. The Listening Experience Database
• An open and freely searchable database
that brings together a mass of data
about people’s experiences of listening
to music of all kinds, in any historical
period and any culture.
• Sophisticated data model, natively in RDF
• Linked Open Data:
http://data.open.ac.uk/context/led
• Since 2012, the LED project has collected
over 10,000 unique listening experiences
from a variety of textual sources
https://led.kmi.open.ac.uk/
26. Problem: humanists coin new concepts!
• Traditional AI research is focused on common sense notions
• keyword & topic based information retrieval (documents related to
“Science” or “Music”)
• events as declared statements (e.g. U.S. base attacked by Iranian missiles)
• Problem: humanities databases are built on novel concepts, e.g.
• Listening experience (LED Project)
• Reading experience (EU funded READ-IT project)
• Sitting Experience (DH/Arts History PhD at the OU)
27. Manual workflow
Problems: the activity (a) requires effort / time, (b) is not systematic, (c) is
prone to errors, and (d) the methodology is (often) not documented
How to help scholars find a piece of evidence in a text?
28. How to detect concepts beyond keywords?
We coin the expression themed evidence to refer to (direct or indirect) traces of a
fact or situation relevant to a theme of interest, and study the problem of
identifying them in texts.
The task of identifying themed evidence is at the intersection of topical text
classification (finding texts relevant to a certain theme) and event retrieval (finding
events mentioned in texts).
Not all topical texts are themed evidence, and the nature of the event itself is often
assumed, implicit, and left to the reader.
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
29. Finding Listening Experiences (theme: music)
• RECMUS-619, positive: Introduced to the Anacreontic Society, consisting of
amateurs who perform admirably the best orchestral works. The usual supper
followed. After propitiating me with a trio from ’Cosi Fan Tutte’, they drew me to
the piano.
• MASONB-31, positive: In the evening we went to Rev. Baptist Noel’s chapel,
where one is always sure of edification from the sermon if not from the psalms.
• MASONB-88, negative: Flags and pendants were suspended from the windows,
[. . . ] the colors of the German States were waving harmoniously together, and
the banners of the Fine Arts, with appropriate inscriptions, particularly those of
music, poetry and painting, were especially honored, and floated triumphant
amidst the standards of electorates, dukedoms, and kingdoms.
30. A Hybrid Approach
• Themed evidence is a subset of topical texts (e.g. about “music”) - distributional semantics
• Common knowledge graphs include large amounts of interlinked entities, including topical
entities (in the category “music”) - entity linking to structured knowledge
• Background knowledge can be used for learning features and tuning elements of the method -
corpus-based analysis
• LE Database includes text excerpts that can be analysed as positive examples.
• Project Gutenberg >58k books in the public domain (48790 en)
• DBpedia is a large knowledge graph published as Linked Data. Includes SPARQL endpoint and a
NER tool: DBpedia Spotlight
• We formalise the task as a binary classification problem; approach in three steps:
1. Statistical relatedness analysis -> From a Key Term (e.g. “Music”)
2. Themed-entity detection -> About a key subject (e.g. dbpedia:Music)
3. Hybridisation phase
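Step 1 can be sketched with a pointwise-mutual-information (PMI) style score between each term of an excerpt and the key term; this is one plausible instantiation, not necessarily the exact metric of the paper, and the co-occurrence counts below are illustrative only (the real system would derive them from a large corpus such as Project Gutenberg):

```python
import math

# Toy corpus statistics: document frequencies and co-occurrence counts.
# These numbers are made up for illustration.
TOTAL_DOCS = 1000
DOC_FREQ = {"music": 100, "piano": 40, "orchestral": 20, "supper": 60}
CO_FREQ = {("music", "piano"): 30, ("music", "orchestral"): 18,
           ("music", "supper"): 6}

def pmi(key, term):
    """log P(key, term) / (P(key) * P(term)) -- higher = more related."""
    p_joint = CO_FREQ.get((key, term), 0) / TOTAL_DOCS
    if p_joint == 0:
        return float("-inf")
    p_key = DOC_FREQ[key] / TOTAL_DOCS
    p_term = DOC_FREQ[term] / TOTAL_DOCS
    return math.log(p_joint / (p_key * p_term))

scores = {t: pmi("music", t) for t in ["piano", "orchestral", "supper"]}
# "piano" and "orchestral" score well above the theme-neutral "supper",
# mirroring the per-term scores shown on the next slides.
```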
32. Statistical relatedness // Example
RECMUS-619, positive: Introduced to the Anacreontic Society,
consisting of amateurs who perform admirably the best
orchestral works. The usual supper followed. After propitiating me
with a trio from 'Cosi Fan Tutte', they drew me to the piano.
• Anacreontic[n]: 4.13048797627
• amateur[n]: 4.60138704262
• admirably[r]: 3.65226351076
• orchestral[j]: 7.09262661606
• trio[n]: 5.60459207257
• piano[n]: 6.36957273307
Correct
33. Statistical relatedness // Example
MASONB-31, positive: In the evening we went to Rev. Baptist
Noel's chapel, where one is always sure of edification from the
sermon if not from the psalms.
psalm[n]: 4.05596201177
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
Wrong
34. Statistical relatedness // Example
MASONB-88, negative: Flags and pendants were suspended from the windows,
[...] the colours of the German States were waving harmoniously together, and
the banners of the Fine Arts, with appropriate inscriptions, particularly those of
music, poetry and painting, were especially honored, and floated triumphant
amidst the standards of electorates, dukedoms, and kingdoms.
harmoniously[r]:4.96754289705
music[n]:1.0
poetry[n]:5.93071678171
painting[n]:4.39244380382
triumphant[j]:3.80869437369
amidst[i]:3.6638322575
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
Wrong
35. 2> Themed entity detection
• DBPedia Spotlight to identify %entities%
• SPARQL query to filter the ones related to
dbcat:Music
• Where %entities% are the resources identified by
the NER engine, and %d% is a parameter, set to 5
(>5 too much noise).
SELECT distinct ?sub WHERE {
VALUES ?sub { %entities% }
?sub dc:subject ?subject .
?subject skos:broader{0,%d%} cat:Music
}
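Instantiating the query template is straightforward string filling; a minimal sketch (the helper name is an assumption, and the comma form of the path quantifier `{0,d}` follows the SPARQL 1.1 draft syntax):

```python
# Fill the %entities% and %d% placeholders of the slide's query template.
QUERY_TEMPLATE = """SELECT distinct ?sub WHERE {{
  VALUES ?sub {{ {entities} }}
  ?sub dc:subject ?subject .
  ?subject skos:broader{{0,{d}}} cat:Music
}}"""

def themed_entity_query(entity_uris, d=5):
    """Build the SPARQL query that keeps only entities whose dc:subject
    lies at most d skos:broader hops below the category cat:Music."""
    entities = " ".join(f"<{uri}>" for uri in entity_uris)
    return QUERY_TEMPLATE.format(entities=entities, d=d)

q = themed_entity_query(["http://dbpedia.org/resource/Piano"])
```

The resulting query would then be sent to the DBpedia SPARQL endpoint, with d kept at 5 since larger values admit too much noise.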
36. 3> Hybridisation
Entity boost: promote terms mapped to entities.
PoS filter: demote terms other than verbs and nouns, to privilege factual statements.
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
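The two hybridisation heuristics amount to a re-scoring pass over the term scores from step 1; a minimal sketch, where the function name, boost factor, and demotion factor are illustrative assumptions rather than values from the paper:

```python
# Hybridisation sketch: boost terms linked to a themed entity,
# demote terms whose part of speech is neither noun nor verb.

ENTITY_BOOST = 2.0     # illustrative factor, not from the paper
POS_PENALTY = 0.5      # illustrative demotion factor
KEPT_POS = {"n", "v"}  # nouns and verbs, per the PoS filter

def hybrid_score(term_scores, term_pos, linked_terms):
    """term_scores: {term: relatedness}; term_pos: {term: PoS tag};
    linked_terms: terms mapped to a themed entity by entity linking."""
    out = {}
    for term, score in term_scores.items():
        if term_pos.get(term) not in KEPT_POS:
            score *= POS_PENALTY   # PoS filter: demote non-noun/verb terms
        if term in linked_terms:
            score *= ENTITY_BOOST  # entity boost
        out[term] = score
    return out

scores = hybrid_score(
    {"piano": 6.4, "admirably": 3.7, "supper": 1.2},
    {"piano": "n", "admirably": "r", "supper": "n"},
    {"piano"},
)
# "piano" (noun, entity-linked) is boosted; "admirably" (adverb) is demoted.
```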
37. Hybrid Approach // Example
RECMUS-619, positive: Introduced to the Anacreontic Society,
consisting of amateurs who perform admirably the best
orchestral works. The usual supper followed. After propitiating me
with a trio from 'Cosi Fan Tutte', they drew me to the piano.
http://dbpedia.org/resource/Anacreontic_Society
http://dbpedia.org/resource/Orchestra
http://dbpedia.org/resource/Trio_(music)
http://dbpedia.org/resource/Così_fan_tutte
http://dbpedia.org/resource/Piano
Correct
38. Hybrid Approach // Example
MASONB-31, positive: In the evening we went to Rev. Baptist
Noel's chapel, where one is always sure of edification from the
sermon if not from the psalms.
http://dbpedia.org/resource/Evening_Prayer_(Anglican)
http://dbpedia.org/resource/Psalms
Daga, Enrico, and Enrico Motta. "Capturing themed evidence, a hybrid approach." In Proceedings of the 10th
International Conference on Knowledge Capture, pp. 93-100. 2019.
Correct
39. Hybrid Approach // Example
MASONB-88, negative: Flags and pendants were suspended from the windows, [...]
the colours of the German States were waving harmoniously together, and the
banners of the Fine Arts, with appropriate inscriptions, particularly those of music,
poetry and painting, were especially honored, and floated triumphant amidst the
standards of electorates, dukedoms, and kingdoms.
http://dbpedia.org/resource/Music
Correct
41. What about supporting curation?
How to support users in cataloguing the documentary evidence?
How to detect the entities and their relationships in the sources?
How to automatically populate the database with metadata?
42.
43. Knowledge Extraction (KE)
• Bet: metadata curation could be supported with KE methods
• KE: automatic or semi-automatic derivation of formal symbolic knowledge from
unstructured or semi-structured sources
• Approaches in the literature vary in task / scope:
• (Named) Entity Recognition and Classification (Person, Work, Time, Place,
…)
• Entity Linking (DBpedia, Gazetteers)
• Relation Extraction (listener of, in place)
• Event extraction (Performance)
• Machine reading
44. Example #1
"I then went to Amsterdam to conduct Oedipus at the
Concertgebouw, which was celebrating its fortieth
anniversary by a series of sumptuous musical
productions. The fine Concertgebouw orchestra, always
at the same high level, the magnificent male choruses
from the Royal Apollo Society, soloists of the first rank -
among them Mme Hélène Sadoven as Jocasta, Louis van
Tulder as Oedipus, and Paul Huf, an excellent reader -
and the way in which my work was received by the public,
have left a particularly precious memory that I recall with
much enjoyment."
listener: Igor Strawinsky
time: in the beginning of 1928
place: Amsterdam
opera: Oedipus Rex
/by: Igor Strawinsky
performer: Concertgebouw orch.
environment: Public
Igor Stravinsky
An Autobiography (1936), p. 139.
https://led.kmi.open.ac.uk/entity/lexp/1435674909834
45. Example #2
"Music is certainly a pleasure that may be
reckoned intellectual, and we shall never
again have it in the perfection it is this
year, because Mr. Handel will not
compose any more! Oratorios begin next
week, to my great joy, for they are the
highest entertainment to me."
listener: Mrs Delany
time: March, 1737
place: London
opera: Operas and Oratorios
/by: G. F. Handel
environment: Public
From: Mary Granville, and Augusta Hall (ed.),
Autobiography and Correspondence of Mary Granville, Mrs
Delany: with interesting Reminiscences of King George the
Third and Queen Charlotte, volume 1 (London, 1861), p.
594.
https://led.kmi.open.ac.uk/entity/lexp/1444424772006
Feedback: @enridaga | www.enridaga.net
46. Analysis: detect the Listener & Place of a LE
• Q1 - in the excerpt? The place is mentioned in the
excerpt in 25.9% cases. The listener only in 13.4%.
• Q2 - near the excerpt? Only 10% of the times the
place mention is less than 5 paragraphs from the
excerpt. The agent, in 4% of the cases.
• Q3 - in the source? 83.2% of the times the place is
mentioned at least once in the source. In 11.4%
the place hasn’t been found.
• Q4 - in the meta? 64.8% of the listeners are also
the authors of the text - 5874 cases in LED.
[Chart: distance of entity, in number of paragraphs]
47. Open problems
• Implicit information, based on inference requiring expertise (e.g. Mr
Handel is G. F. Handel, Oedipus is “Oedipus Rex”)
• The role of contextual knowledge is fundamental (1) in identifying
the agent from the metadata of the source; (2) in common-sense
inference (“in the beginning of 1928”)
• Entities can exist in distributed, heterogeneous resources
(encyclopaedic KBs, domain-specific taxonomies, gazetteers, …)
• Cultural studies typically coin novel concepts (ListeningExperience)
with original schemas. Portability of the methods is even more at risk!
Daga, E and Motta, E. "Challenging knowledge extraction to support the curation of documentary evidence in the humanities." (2019).
48. Summary
• Linked Data transforms the way information is shared on the Web
• but also enables opportunities to apply AI techniques to more
application domains
• supporting users in finding and curating documentary evidence is an
important and difficult task
• finding complex concepts in texts is (more) possible than before,
although most of these techniques have not been applied at scale yet
• traditional AI research is challenged by the richness and diversity of use
cases in the humanities, especially in knowledge extraction
49. WHiSe 3
Call for papers!
3rd Workshop on Humanities in
the Semantic Web (WHiSe)
Co-located with the 15th Extended
Semantic Web Conference (ESWC 2020)
Heraklion, Crete, Greece
31/05 or 31/06, 2020 (TBD)
Submission deadline:
28th February
http://whise.cc/
https://commons.wikimedia.org/wiki/File:Edward_Burne-Jones_-_Tile_Design_-
_Theseus_and_the_Minotaur_in_the_Labyrinth_-_Google_Art_Project.jpg
50. PhD position open soon
Title: “Distributed Linked Data for Cultural Heritage”
The aim of this project is to research and develop distributed, Linked Data systems
that enable cultural content to be shared between museums and the public. This may
include innovative ways of publishing digital artworks and related resources by
memory institutions as well as enabling the public to share their own experiences of
visiting and engaging with cultural heritage. The PhD will benefit from being closely
connected with the EU funded SPICE project [1] which is developing methods and
tools to allow citizen groups to actively participate with museums internationally.
[1] http://kmi.open.ac.uk/projects/name/spice