With the transition of the Web of today from an information repository to a suite of services, the demand for machine-readable data to support those services is greater than ever. The social and, more generally, community element is proving to be a valuable medium for conveying such a bulk of knowledge. Linked Data is a leading body of standards for publishing and using open knowledge bases on the Web; however, it relies heavily on the notion of identity: every object in the world being described should be uniquely identified in order to be manipulated effectively. Music is a particularly provocative domain of interest for such Web knowledge bases. Most people feel confident they can contribute to it, yet they do so with varying degrees of factual knowledge, personal inclination or scholarly rigour. Curating a dataset that covers an aspect new to this landscape, such as the evidence of listening experiences, means dealing with partial, inexplicit or underspecified information. A likely implication is that several elements of a listening experience, such as the listeners, the time in history or the music being heard, can be described to an extent but not identified. This stands in stark contrast with a founding principle of Linked Data. This talk will illustrate the nature of the main elements of fuzzy knowledge that emerged from the contributions to the Listening Experience Database (LED). It will also elaborate on the countermeasures adopted and the lessons learnt from the life-cycle of LED data, and assess the state of maturity of Linked Data technologies for accommodating such use-cases.
Linkage in Haze: challenges and take-home messages of crowd-sourcing vagueness in musical data
1. Linkage in Haze: Challenges and take-home messages of crowd-sourcing vagueness in musical data
Alessandro Adamou
Listening to music: people, practices and experiences
Sunday, October 25, 2015
4. A well-formed listening experience
On July 19, 2014
Leonie Holmes (a professor of Music in New Zealand)*
was listening to Richard Strauss’ “Don Juan”
and “Also sprach Zarathustra”
and Sarah Ballard’s “Synergos”
played by the NZSO National Youth Orchestra
and Alexander Shelley
using harp and double bass (+others?)
in the Aotea Centre.
(*) plus a generic public, which does not pose a problem in the representation.
5. A worst-case (but more likely) listening experience
One evening between May and September
in the late 1950s
a group of war veterans, a reporter and an unknown female
were listening to Chopin and an anthem
played by a string orchestra
in a concert hall in South London.
6. Goal: to capture both fact sets as structured data (interconnected or prepared for refinement)
7. Other issues
• Source
– unpublished manuscripts
• Unaligned semantic layers
– instrument category | instrument name | brand and model
(e.g. Chords | Electric Guitar | Gibson Les Paul Custom Sunburst)
– generic occupations | gender-dependent | personal titles
(e.g. Monarch | King (Queen), Emperor (Empress)… | King of England, fourth Sultan of Zanzibar…)
8. Factors contributing to fuzziness
• Domain knowledge of the evidence author(s)
• Deterioration of evidence
• Crowd-sourcing community issues:
– Misaligned semantics
– Varying scholarly rigour
– Popularity of the domain of interest
9. Data representation in LED
• Linked [Open] Data http://linkeddata.org
– Formalism for machine-readable and human-readable data
– Object identifiers are URIs
– Standard representation and query languages (RDF, SPARQL…)
– The meaning of links between objects is globally understood.
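The toy Python sketch below makes the idea of URI-identified objects and a standard query language more concrete. Triples are plain tuples. A conjunction of triple patterns with `?variables` is matched much like a SPARQL basic graph pattern. It is not an RDF library. The `ex:` terms are hypothetical placeholders, and `dbpedia:London` abbreviates the DBpedia URI listed on slide 10.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

def match_triple(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Unify one triple pattern with one triple, extending the current bindings."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable term
            if p in result and result[p] != t:
                return None  # already bound to a different term
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result

def query(graph: List[Triple], patterns: List[Triple]) -> Iterator[Bindings]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    def solve(i: int, bindings: Bindings) -> Iterator[Bindings]:
        if i == len(patterns):
            yield bindings
            return
        for triple in graph:
            extended = match_triple(patterns[i], triple, bindings)
            if extended is not None:
                yield from solve(i + 1, extended)
    return solve(0, {})

# Hypothetical data: every object is named by a (prefixed) URI.
graph = [
    ("ex:lexp1", "ex:location", "dbpedia:London"),
    ("ex:lexp1", "ex:listener", "ex:reporter1"),
    ("ex:lexp2", "ex:location", "dbpedia:Berlin"),
    ("ex:lexp2", "ex:listener", "ex:veteran1"),
]

# "Who listened to something in London?"
results = list(query(graph, [
    ("?exp", "ex:location", "dbpedia:London"),
    ("?exp", "ex:listener", "?who"),
]))
print(results)
```

Because objects are identified by URIs rather than by strings that merely describe them, the join on `?exp` is unambiguous.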
10. Identity in Linked Data
• http://musicbrainz.org/artist/f5aca88c-e3c1-4bc2-af33-68a9a9f7b56a#_
– (the band Killing Joke, as in MusicBrainz)
• http://bnb.data.bl.uk/id/agent/DailyMirror
– (The Daily Mirror on the British National Bibliography)
• http://dbpedia.org/resource/London
– (London, as in Wikipedia/DBpedia)
• http://reference.data.gov.uk/doc/day/2015-10-23
– (Friday 23 October 2015, as in the UK Government Calendar data)
• http://led.kmi.open.ac.uk/term/Medium.Live
– (the concept of live music, as in LED)
Easy: these are all named entities…
12. Goal: to capture both fact sets as linked data
There are no right or wrong ways to do it, only linkable or unlinkable.
13. Linked Data encourage reuse…
No two things are either distinct or equal until some LD node asserts or implies otherwise.
– e.g. Bono on MusicBrainz and Bono on DBpedia
– Groups too, if it can be demonstrated that they are an exact match
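The Python sketch below illustrates this open-world stance. It assumes a simple union-find index over explicit `sameAs`-style links, which is a stand-in chosen for illustration and not LED's machinery. Two URIs count as the same only after a link says so. A negative answer means "not known to be the same", never "distinct". The DBpedia URI for Bono is real, while the MusicBrainz identifier is a hypothetical placeholder.

```python
from typing import Dict

class SameAsIndex:
    """Equivalence classes of URIs built only from explicit sameAs assertions.

    No unique name assumption: if nothing links two URIs, known_same() is
    False, which means "not known to be equal", not "known to be distinct".
    """

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def assert_same(self, a: str, b: str) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[ra] = rb  # merge the two equivalence classes

    def known_same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

bono_dbpedia = "http://dbpedia.org/resource/Bono"
bono_mb = "mb:artist/BONO-MBID"  # hypothetical placeholder, not a real MBID
bono_local = "ex:person/bono"  # hypothetical local URI

idx = SameAsIndex()
print(idx.known_same(bono_dbpedia, bono_mb))  # False: nothing asserted yet
idx.assert_same(bono_mb, bono_dbpedia)
idx.assert_same(bono_local, bono_dbpedia)
print(idx.known_same(bono_mb, bono_local))  # True, by transitivity
```

Because equality is transitive, a single link to a widely used hub such as DBpedia lets other datasets connect to each other. This is the reuse the slide encourages.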
14. …but fuzzy concepts have caveats
Group entity “Mourners of Felix Mendelssohn” (attending the arrival of his body in Berlin)
See http://led.kmi.open.ac.uk/entity/lexp/1434029100189
• The identifier of this group,
http://data.open.ac.uk/led/agent/Mourners+of+Felix+Mendelssohn/1434029100190,
should not be reused when modelling an entry about Mendelssohn’s funeral service.
See http://led.kmi.open.ac.uk/entity/lexp/1434029387526
– There, the identifier of the group is
http://data.open.ac.uk/led/person/Mourners+at+the+Funeral+Service+of+Felix+Mendelssohn/1434029247629
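The Python sketch below shows one way to honour this rule. It mints a fresh URI every time a vague group is recorded, built from the URL-encoded label plus a unique numeric suffix. The base `http://data.open.ac.uk/led/agent/` and the `+`-encoded label mirror the identifiers above. Treating the suffix as a millisecond timestamp is our assumption, since the numbers above fall in that range. It is not a documented LED convention.

```python
import time
from urllib.parse import quote_plus

BASE = "http://data.open.ac.uk/led/agent/"  # base as seen in the identifiers above

_last_stamp = 0

def mint_group_uri(label: str, base: str = BASE) -> str:
    """Mint a new URI for a vaguely described group on every call.

    Assumption: the suffix is a millisecond timestamp, bumped if necessary
    so that it strictly increases. The same label minted twice gives two
    distinct URIs, so one experience's group is never silently reused.
    """
    global _last_stamp
    stamp = max(int(time.time() * 1000), _last_stamp + 1)
    _last_stamp = stamp
    return f"{base}{quote_plus(label)}/{stamp}"

arrival = mint_group_uri("Mourners of Felix Mendelssohn")
funeral = mint_group_uri("Mourners of Felix Mendelssohn")
print(arrival != funeral)  # True: same description, different entities
```

If later evidence shows that two such groups coincide, they can still be linked explicitly. Going the other way and splitting a wrongly reused URI is much harder.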
15. Blank nodes
• Fallback mechanism for providing data about objects
without having a naming convention for them.
• Reference something not by name, but by description.
• Example:
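@prefix : <http://example.org/led/> .  # hypothetical namespace
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
<http://example.org/led/performance/Messiah/12345> mo:listener [
  a foaf:Group ;
  dc:description "Foreign ambassadors" ;
  :occupation dbpedia:Ambassador
] .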
• Generally not an advisable solution:
– Cannot perform matching on blank nodes
– Querying or detecting changes in the data is much harder
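The diffing problem can be illustrated with plain Python sets of triples. The terms are hypothetical abbreviations of the example above. Blank-node labels are local to each serialisation, so re-reading the same data may yield new labels. A naive set difference then reports spurious changes. Giving the group a stable URI makes the same comparison trivial.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# The same description, labelled differently by two parses (hypothetical terms).
run_1: Set[Triple] = {
    ("ex:performance/Messiah/12345", "mo:listener", "_:b0"),
    ("_:b0", "rdf:type", "foaf:Group"),
    ("_:b0", "dc:description", "Foreign ambassadors"),
}
run_2: Set[Triple] = {
    ("ex:performance/Messiah/12345", "mo:listener", "_:genid7"),
    ("_:genid7", "rdf:type", "foaf:Group"),
    ("_:genid7", "dc:description", "Foreign ambassadors"),
}

def naive_diff(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, removed) by plain set difference."""
    return new - old, old - new

def name_node(graph: Set[Triple], bnode: str, uri: str) -> Set[Triple]:
    """Replace one blank node with a stable URI throughout the graph."""
    return {tuple(uri if term == bnode else term for term in t) for t in graph}

added, removed = naive_diff(run_1, run_2)
print(len(added), len(removed))  # 3 3: every triple looks changed

group = "ex:group/foreign-ambassadors-12345"  # hypothetical stable URI
print(naive_diff(name_node(run_1, "_:b0", group), name_node(run_2, "_:genid7", group)))
# (set(), set()): no change detected once the node has a name
```

Comparing graphs that keep their blank nodes requires graph isomorphism checking or RDF canonicalisation. Both are considerably more expensive than a set difference.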
16. Ontological classes
• Model vague objects as formally-specified categories rather than named entities
• e.g. “the class of all people whose occupation is Ambassador and who were at the Royal Albert Hall on May 12, 1876”
• Pros:
– Allows separation of “known” and “generic” entities
– Semantically cleaner and easier to store and manage
• Cons:
– Still need to make URIs for each class
– They have to be instantiated before they can be used in a listening experience
– Harder to apply changes to the data without fixed classes
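The Python sketch below shows this trade-off. It assumes a hypothetical data model in which a class is an intensional definition, consisting of a URI plus a membership condition, rather than a list of named members. The URIs and people are illustrative placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List

@dataclass(frozen=True)
class Person:
    uri: str
    occupation: str
    venue: str
    day: date

@dataclass(frozen=True)
class DefinedClass:
    """A class described by a membership condition, not by named members."""
    uri: str  # the class itself still needs a URI
    condition: Callable[[Person], bool]

    def instances(self, people: List[Person]) -> List[Person]:
        # Members only exist once concrete individuals have been recorded.
        return [p for p in people if self.condition(p)]

ambassadors_at_rah = DefinedClass(
    uri="ex:class/AmbassadorsAtRoyalAlbertHall-1876-05-12",  # hypothetical URI
    condition=lambda p: (
        p.occupation == "Ambassador"
        and p.venue == "Royal Albert Hall"
        and p.day == date(1876, 5, 12)
    ),
)

people = [  # hypothetical individuals
    Person("ex:person/1", "Ambassador", "Royal Albert Hall", date(1876, 5, 12)),
    Person("ex:person/2", "Reporter", "Royal Albert Hall", date(1876, 5, 12)),
    Person("ex:person/3", "Ambassador", "St James's Hall", date(1876, 5, 12)),
]
print([p.uri for p in ambassadors_at_rah.instances(people)])
```

The definition stays clean and separate from "known" individuals. However, as the cons note, it still needs its own URI, and it says nothing about a listening experience until some instances exist.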
17. Countermeasures in LED
• No blank nodes
• For unaligned semantic layers (cf. example on instruments and occupations):
– Use lax model properties
– Enforce reuse of external taxonomies
• ‘Rich’ real-time recommendations
18. Countermeasures in LED
Data reconciliation
Currently with restricted access, but plans to open to crowd-sourcing
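The sketch below illustrates why reconciliation needs context. It ranks candidate labels by string similarity using Python's `difflib.SequenceMatcher`. This is a swapped-in technique for illustration, not the reconciliation method used in LED. The 0.85 threshold and the candidate list are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def normalise(label: str) -> str:
    """Lower-case, turn full stops into spaces, collapse whitespace."""
    return " ".join(label.lower().replace(".", " ").split())

def reconcile(label: str, candidates: List[str], threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return candidates whose similarity to `label` reaches the threshold, best first."""
    scored = [
        (c, SequenceMatcher(None, normalise(label), normalise(c)).ratio())
        for c in candidates
    ]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)

matches = reconcile("F. Mendelssohn", ["Felix Mendelssohn", "Fanny Mendelssohn", "Moses Mendelssohn"])
for name, score in matches:
    print(name, round(score, 3))
```

Here "F. Mendelssohn" scores the same against Felix and Fanny Mendelssohn. String similarity alone therefore cannot decide between them. Dates, places and related works are needed, which is the "large context" that data cleansing may require.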
19. Countermeasures in LED
• Ad-hoc formal models for underspecified data.
• Example: Extended Date/Time Format (standard draft, Library of Congress, 2012)
– Allows formalisation of underspecified points in time and intervals, e.g. “187u-05-uu”
– We extended it to support subjective fuzzy intervals (e.g. early/mid/late) and ranges (from-to)
– Made available in RDF through data.open.ac.uk
• Example 2: GeoSPARQL
– Used to support geospatial queries in Linked Data
– Named entity recognition for locations on arbitrary text (recently)
– We compute location URIs by hashing their descriptions, and relate each one to all the locations extracted from its description via geosparql:sfIntersects
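The Python sketch below shows how an EDTF-style value with unspecified digits can be turned into earliest and latest bounds. It handles only `YYYY[-MM[-DD]]` with the 2012 draft's `u` marker, and only whole unspecified months or days. The early/mid/late split of a decade is a hypothetical choice for illustration, since the slides do not give LED's boundaries.

```python
import calendar
from datetime import date
from typing import Tuple

def _field(text: str, lo: int, hi: int) -> Tuple[int, int]:
    """Bounds of a month or day field; 'uu' means wholly unspecified."""
    if text == "uu":
        return lo, hi
    if "u" in text:
        raise ValueError(f"partially unspecified field not handled: {text!r}")
    return int(text), int(text)

def edtf_bounds(value: str) -> Tuple[date, date]:
    """Envelope of e.g. '187u-05-uu' (some day in May of some year 1870-1879)."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported value: {value!r}")
    year, month, day = (parts + ["uu", "uu"])[:3]
    y_lo, y_hi = int(year.replace("u", "0")), int(year.replace("u", "9"))
    m_lo, m_hi = _field(month, 1, 12)
    last_day = calendar.monthrange(y_hi, m_hi)[1]  # length of the latest month
    d_lo, d_hi = _field(day, 1, last_day)
    return date(y_lo, m_lo, d_lo), date(y_hi, m_hi, d_hi)

# Hypothetical split of a decade into thirds (not LED's documented definition).
_DECADE_PARTS = {"early": (0, 3), "mid": (4, 6), "late": (7, 9)}

def fuzzy_decade(qualifier: str, decade: int) -> Tuple[date, date]:
    lo, hi = _DECADE_PARTS[qualifier]
    return date(decade + lo, 1, 1), date(decade + hi, 12, 31)

print(edtf_bounds("187u-05-uu"))  # (datetime.date(1870, 5, 1), datetime.date(1879, 5, 31))
print(fuzzy_decade("late", 1950))  # (datetime.date(1957, 1, 1), datetime.date(1959, 12, 31))
```

Under this assumed split, the “late 1950s” of slide 5 becomes the interval 1957-01-01 to 1959-12-31. Such an interval can still be compared and queried even though no single day is known.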
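The Python sketch below follows the location-URI scheme described on this slide. A URI is derived deterministically from a normalised description, so the same description always yields the same URI. Each location extracted from that description is then linked with `geosparql:sfIntersects`. The base namespace, the normalisation and the choice of SHA-1 are assumptions, because the slides do not specify LED's hashing details.

```python
import hashlib
from typing import List, Tuple

LOCATION_BASE = "http://example.org/led/location/"  # hypothetical namespace
SF_INTERSECTS = "http://www.opengis.net/ont/geosparql#sfIntersects"

def location_uri(description: str) -> str:
    """Deterministic URI: equal (normalised) descriptions give equal URIs."""
    normalised = " ".join(description.lower().split())
    return LOCATION_BASE + hashlib.sha1(normalised.encode("utf-8")).hexdigest()

def link_extracted(description: str, extracted: List[str]) -> List[Tuple[str, str, str]]:
    """Relate the described location to each location recognised in its text."""
    subject = location_uri(description)
    return [(subject, SF_INTERSECTS, uri) for uri in extracted]

desc = "a concert hall in South London"
# Hypothetical named-entity recognition output for the description above.
for triple in link_extracted(desc, ["http://dbpedia.org/resource/London"]):
    print(triple)
```

Hashing gives every vague location a stable name without a human having to choose one. The `sfIntersects` links keep it queryable through the named places it is known to touch.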
20. How thick is the mist in LED?
Entity type | Named | Vague | Total
Participants | 802 | 260 | 1062
Locations | 136 | 15 (cannot pinpoint) | 151*
Times | 826 | 843 (ranges, not qualified) | 1669
Musical works | 1550 | 1263 | 2813
(*) since database opened to arbitrary experience locations
Figures for LED public dataset
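As a quick reading of the slide 20 figures, the Python snippet below computes the share of vague entities for each type. The counts are those reported for the LED public dataset, and nothing else is assumed.

```python
from typing import Dict, Tuple

# (named, vague) counts from the LED public dataset figures on slide 20.
counts: Dict[str, Tuple[int, int]] = {
    "Participants": (802, 260),
    "Locations": (136, 15),
    "Times": (826, 843),
    "Musical works": (1550, 1263),
}

def vague_share(data: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Return (total, fraction of vague entities) for each entity type."""
    return {kind: (n + v, v / (n + v)) for kind, (n, v) in data.items()}

for kind, (total, share) in vague_share(counts).items():
    print(f"{kind:<14} total={total:>5}  vague={share:.1%}")
```

On these figures, roughly half of the times and about 45% of the musical works are vague. By comparison, about a quarter of participants and a tenth of locations are vague.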
21. Lessons learnt
• Advantages
– Open-world semantics: minimises the risk of ambiguities generated by name clashes and allows for coherent management
– Monotonic: data are refined by the addition of facts
– Can be reasoned upon by machine-learning agents working on the native data structure
– Incorporates reuse for the benefit of the whole data cloud.
• Disadvantages
– Lack of reuse entails heavy replication
– Data cleansing may require a large context for detecting entities that can be reconciled
22. Lessons learnt
• Most, if not all, representational issues with vagueness can be addressed in LD without resorting to blank nodes and without ambiguity.
– Way more powerful than traditional database systems.
• Data providers have yet to reach even a tacit agreement on:
– representational paradigms for entities commonly at risk of underspecification, such as spatio-temporal ones;
– how to name their objects.
• Most providers take the easy route when it comes to LD
• The way to go is de facto standards
23. Where to go next
• Model ontological classes as their instances (equivalence classes?)
• Increase context for fact-based data alignment (opening reconciliation facilities to the public, with voting?)
• Argumentation on every statement in LED
• Dissemination of controlled vocabularies and naming conventions for managed vague entities.
24. Are Linked Data mature for representing vagueness?
• The technology is.
• The data out there aren’t.
– (but that is the part that can be improved)
25. Further reading
• Eero Hyvönen, Publishing and Using Cultural Heritage Linked Data on the Semantic Web (Morgan & Claypool, 2012)
• Daniel J. Lewis and Trevor P. Martin, Managing Vagueness with Fuzzy in Hierarchical Big Data. In 2015 INNS Conference on Big Data, Procedia Computer Science, Vol. 53 (Elsevier, 2015), pp. 19-28
• Elie Sanchez (ed.), Fuzzy Logic and the Semantic Web (Elsevier, 2006)