Beyond the Record: OCLC & the Future of MARC (tfons)
A review of OCLC's contribution to RDA, OCLC's infrastructure related to the MARC standard and thoughts on moving beyond the emphasis on cataloging records to the description, discovery and exposure of work-level views of bibliographic metadata.
Streaming Day: an overview of Stream Reasoning
Logical reasoning in real time on multiple, heterogeneous, gigantic and inevitably noisy data streams in order to support the decision process of extremely large numbers of concurrent users.
-- S. Ceri, E. Della Valle, F. van Harmelen and H. Stuckenschmidt, 2010
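The definition above can be made concrete with a toy sketch: reasoning restricted to a sliding window over a timestamped triple stream. Everything here (the tiny schema, the prefixes, the window size) is invented for illustration and is not from the talk.

```python
from collections import deque

# Assumed toy schema: a subclass map standing in for an RDFS ontology.
SUBCLASS = {"Bus": "Vehicle", "Tram": "Vehicle"}

def window_types(stream, window=10):
    """Yield (time, entity, inferred_type) for rdf:type triples, keeping
    only stream triples whose timestamp falls in the last `window` ticks."""
    buf = deque()
    for t, s, p, o in stream:
        buf.append((t, s, p, o))
        while buf and buf[0][0] <= t - window:
            buf.popleft()                     # expire triples outside the window
        if p == "rdf:type" and o in SUBCLASS:
            yield (t, s, SUBCLASS[o])         # inferred supertype, in real time

stream = [(1, ":b12", "rdf:type", "Bus"), (3, ":t4", "rdf:type", "Tram")]
print(list(window_types(stream)))
# [(1, ':b12', 'Vehicle'), (3, ':t4', 'Vehicle')]
```

Real stream reasoners (e.g. C-SPARQL engines) combine such windows with full query evaluation; the point here is only the window-plus-inference shape of the problem.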
A brief history of the RDF4J Project and an overview of tools and code examples that demonstrate how to work with it in your applications.
Slides accompanying the Lotico Webinar event on May 14, 2020 - see http://www.lotico.com/index.php/Eclipse_RDF4J_-_Working_with_RDF_in_Java
An exploration of a possible pipeline for RDF datasets from Timbuctoo instances to the digital archive EASY.
- Get, verify, ingest, archive and disseminate (linked) data and metadata.
- What are the implications for an archive: serving linked data over (longer periods of) time
- Practical stuff.
WI2015 - Clustering of Linked Open Data - the LODeX tool (Laura Po)
Presentation of the tool LODeX (http://www.dbgroup.unimore.it/lodex2/testCluster) at the 2015 IEEE/WIC/ACM International Conference on Web Intelligence, Singapore, December 6-8, 2015
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format for representing RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
The nature.com ontologies portal: nature.com/ontologies (Tony Hammond)
Presentation by Tony Hammond and Michele Pasin to Linked Science workshop, co-located with International Semantic Web Conference (ISWC) 2015, on October 12, 2015
Environment Canada's Data Management Service (Safe Software)
A brief history of time-series data at Environment Canada, and an enterprise view of how FME can be integrated into departmental data management activities.
Large-scale Reasoning with a Complex Cultural Heritage Ontology (CIDOC CRM) ... (Vladimir Alexiev, PhD, PMP)
Vladimir Alexiev, Dimitar Manov, Jana Parvanova and Svetoslav Petrov. In proceedings of the workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013) at TPDL 2013, 26 Sep 2013, Valletta, Malta
Quickly re-publish CSV/TSV files from existing repositories as FAIR Data with just a few mouse clicks!
You select the columns to "project" as Linked Data, and the associated ontology terms. The FAIR Projector Builder will create a FAIR Projector for you: a Triple Pattern Fragment server to provide the Linked Data; a published DCAT Distribution containing metadata about those triples and their source; and an RML model (a syntactic and semantic description of the triples), to aid in third-party discovery of this novel projection.
(current status - first prototype, not ready for public consumption)
-------
Thanks to the NBDC/DBCLS for sponsoring the hackathon series.
MDW also funded by Ministerio de Economía y Competitividad grant number TIN2014-55993-RM
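The Triple Pattern Fragment interface the projector exposes is simple enough to sketch: a client sends an (s, p, o) pattern with wildcards and gets back one page of matching triples plus a count. This is an illustrative stand-in, not the FAIR Projector code; the data and names are invented.

```python
def match_fragment(triples, s=None, p=None, o=None, page=0, page_size=2):
    """Return one page of triples matching an (s, p, o) pattern
    (None = wildcard), in the spirit of a Triple Pattern Fragments server."""
    hits = [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
    start = page * page_size
    return {"triples": hits[start:start + page_size], "totalItems": len(hits)}

# Hypothetical triples projected from two CSV rows:
data = [(":row1", ":gene", "BRCA2"), (":row1", ":species", "human"),
        (":row2", ":gene", "TP53")]
print(match_fragment(data, p=":gene"))
# {'triples': [(':row1', ':gene', 'BRCA2'), (':row2', ':gene', 'TP53')], 'totalItems': 2}
```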
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes (Ontotext)
This presentation provides a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches which work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning with virtual KG, which is often infeasible.
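A minimal sketch of why reasoning cost matters: even the simplest useful inference, materializing rdfs:subClassOf over rdf:type, is a fixpoint computation whose naive form rescans the whole fact set each round. This is an illustrative toy, not any vendor's implementation.

```python
def materialize_subclass(triples):
    """Naive forward chaining of rdfs:subClassOf over rdf:type triples.
    Repeats until fixpoint; fine for toy graphs, ruinous at KG scale."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in facts:
            if p == "rdf:type":
                for s2, p2, o2 in facts:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        inf = (s, "rdf:type", o2)
                        if inf not in facts:
                            new.add(inf)       # derived a new type assertion
        if new:
            facts |= new
            changed = True
    return facts

# Hypothetical mini-graph: Painter < Person < Agent
g = {(":rembrandt", "rdf:type", ":Painter"),
     (":Painter", "rdfs:subClassOf", ":Person"),
     (":Person", "rdfs:subClassOf", ":Agent")}
out = materialize_subclass(g)
assert (":rembrandt", "rdf:type", ":Agent") in out
```

Production reasoners replace this with indexed semi-naive evaluation precisely because the quadratic inner scan above does not survive contact with millions of entities.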
KnowledgeWiki: An OpenSource Tool for Creating Community-Curated Vocabulary, ... (Nishita Jaykumar)
Resource Description Framework (RDF) datasets can be created by transforming structured databases, extracting triples from semi-structured and unstructured sources, crowd-sourcing, or integrating existing datasets. The reliability and quality of these datasets can be improved by the participation of domain experts via a special-purpose tool or a crowd-sourced application. Wikidata and Semantic MediaWiki are platforms which facilitate this kind of crowd-sourced data curation.
We present our system, KnowledgeWiki, which is built upon the existing Semantic MediaWiki. We develop a novel extension by adopting the singleton property data model in our KnowledgeWiki. This extension allows various kinds of metadata about the RDF triples to be created in the wiki. We combine this extension with other extensions such as semantic forms to provide a user-friendly, wiki-like interface for domain experts with no prior technical expertise to easily curate data. We also present our new enhancement to Semantic MediaWiki, which facilitates importing existing RDF datasets into the wiki-based curating platform based on the singleton property approach, preserving the provenance of individual triples. We also describe how it is being used by the materials science community to create and curate consolidated vocabularies.
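The singleton property model the abstract relies on can be sketched in a few lines: instead of asserting (s, p, o) directly, a unique property instance is minted, linked back to the generic property, and the per-triple metadata is attached to that instance. The names and data below are hypothetical illustrations, not KnowledgeWiki's actual schema.

```python
from itertools import count

_ids = count(1)

def assert_with_provenance(graph, s, p, o, **meta):
    """Singleton property pattern: mint a unique property instance so that
    metadata (source, curator, ...) can be attached to this one triple."""
    sp = f"{p}#{next(_ids)}"                       # e.g. ":hasMeltingPoint#1"
    graph.append((sp, "rdf:singletonPropertyOf", p))
    graph.append((s, sp, o))
    for k, v in meta.items():
        graph.append((sp, f":{k}", v))             # provenance about this triple only
    return sp

g = []
assert_with_provenance(g, ":Alumina", ":hasMeltingPoint", "2072C",
                       source=":Handbook2015", curator=":expert42")
for t in g:
    print(t)
```

The price, as the paper's approach implies, is one extra node and triple per assertion, which is exactly the trade that makes per-triple provenance expressible in plain RDF.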
ESWC 2012 presentation: Supporting Linked Data Production for Cultural Heritage... (Victor de Boer)
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata.
We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first `small' cultural heritage institution with a node in the Linked Data cloud.
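The richness-preserving conversion idea can be illustrated with a minimal sketch: map each original XML field to its own predicate rather than abstracting to a common schema. This is not the XMLRDF tool itself; the record, URIs and field names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical museum record in its original, institution-specific XML:
XML = """<record id="am-1234">
  <title>View of Amsterdam</title>
  <creator>Unknown</creator>
</record>"""

def record_to_triples(xml_text, base="http://example.org/am/"):
    """One predicate per original element name, so no detail is lost
    (the aggregator-style alternative would flatten both to dc:* terms)."""
    root = ET.fromstring(xml_text)
    subj = base + root.get("id")
    return [(subj, base + child.tag, child.text) for child in root]

for t in record_to_triples(XML):
    print(t)
```

A second, separate mapping pass can then link these institution-specific predicates to shared vocabularies, which is the interactive step the paper's methodology makes transparent.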
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open... (Thomas Gottron)
The intensive growth of the Linked Open Data (LOD) Cloud has spawned a web of data where a multitude of data sources provides huge amounts of valuable information across different domains. Nowadays, when accessing and using Linked Data, the challenging question is increasingly not so much whether relevant data is available, but rather where it can be found, how it is structured and how to make the best use of it.
In this lecture I will start by giving a brief introduction to the concepts underlying LOD. Then I will focus on three aspects of current research:
(1) Managing Linked Data. Index structures play an important role for making use of the information in LOD cloud. I will give an overview of indexing approaches, present algorithms and discuss the ideas behind the index structures.
(2) Analysing Linked Data. I will present methods for analysing various aspects of LOD, from an information-theoretic analysis for measuring structural redundancy, through formal concept analysis for identifying alternative declarative descriptions, to a dynamics analysis for capturing the evolution of Linked Data sources.
(3) Making Use of Linked Data. Finally I will give a brief overview and outlook on where the presented techniques and approaches are of practical relevance in applications.
(Talk at the IRSS summerschool 2014 in Athens)
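The index structures of point (1) can be illustrated with the simplest useful case: a POS (predicate, object, subject) index that answers "which subjects have this value for this property?" without scanning the dataset. Purely a toy sketch; real LOD indexes add compression and persistence.

```python
from collections import defaultdict

def build_pos_index(triples):
    """POS index: predicate -> object -> set of subjects."""
    idx = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        idx[p][o].add(s)
    return idx

triples = [(":a", "rdf:type", ":City"), (":b", "rdf:type", ":City"),
           (":a", ":name", "Athens")]
idx = build_pos_index(triples)
print(sorted(idx["rdf:type"][":City"]))   # [':a', ':b']
```

Triple stores typically keep several such permutations (SPO, POS, OSP) so that any triple pattern can be answered from a sorted index.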
Similar to: Integration of collection data - A case study from the Oxford Museums and Libraries (OXLOD)
Presentation to the RDA and Rare Materials workshop in Edinburgh on the 6th of November 2015. Jointly presented by Prof. Nicholas Pickwoad and Dr Athanasios Velios.
Extra value from documentation: a proposal for Icon (Athanasios Velios)
A short presentation to the Icon Book and Paper Group during their 2015 AGM, introducing the Icon documentation network and a few ideas about open sharing of conservation data.
Presentation at the JISC data spring Sandpit in Birmingham. This was a proposal for a new project to examine the potential of the semantic desktop to capture contextual research data with case studies from the arts and humanities.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open-source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
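The "late and unordered data" problem mentioned above can be sketched with a tiny staging buffer: late rows are slotted into a sorted staging area, and only rows older than a lag threshold are flushed as an ordered batch. This is an illustrative toy; QuestDB's actual out-of-order mechanism differs in detail.

```python
import bisect

class O3Buffer:
    """Toy out-of-order ingestion buffer: keep a small sorted staging area
    and flush rows older than `lag` ticks behind the newest timestamp."""
    def __init__(self, lag):
        self.lag = lag
        self.staged = []                  # kept sorted by timestamp

    def ingest(self, ts, value):
        bisect.insort(self.staged, (ts, value))
        cutoff = self.staged[-1][0] - self.lag
        flush = [r for r in self.staged if r[0] <= cutoff]
        self.staged = [r for r in self.staged if r[0] > cutoff]
        return flush                      # ordered rows now safe to commit

buf = O3Buffer(lag=5)
buf.ingest(10, "a")
buf.ingest(8, "b")                        # late arrival, slotted before 10
print(buf.ingest(20, "c"))                # flushes [(8, 'b'), (10, 'a')]
```

The lag is the usual trade-off: a larger value absorbs later stragglers at the cost of delaying commits.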
Learn SQL: from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
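The basic-to-advanced progression described above can be shown in a few lines using Python's built-in sqlite3: a plain GROUP BY aggregate, then a window function over the same (hypothetical) sales table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10), ("north", 30), ("south", 20)])

# Foundations: total per region (retrieval + aggregation)
print(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())                             # [('north', 40), ('south', 20)]

# Advanced: each row alongside its region's running total (window function)
print(con.execute(
    "SELECT region, amount, SUM(amount) OVER "
    "(PARTITION BY region ORDER BY amount) FROM sales"
).fetchall())
```

Window functions (available in SQLite 3.25+) are a good marker of the jump from basic aggregation to row-level analytical queries.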
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
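The primitives named above can be sketched sequentially in plain Python to show what one PageRank iteration does: a multiply-style step scales each rank by its out-degree, and a sumAt-style step gathers contributions from in-neighbours. (Illustrative only; the report's versions parallelise these loops with OpenMP.)

```python
def pagerank_step(ranks, out_deg, in_edges, damping=0.85):
    """One power-iteration step on a graph given as in-edge lists."""
    n = len(ranks)
    # multiply: each vertex's contribution per out-link
    contrib = [r / d if d else 0.0 for r, d in zip(ranks, out_deg)]
    # sumAt: gather contributions from in-neighbours of each vertex
    return [(1 - damping) / n + damping * sum(contrib[u] for u in in_edges[v])
            for v in range(n)]

# Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
out_deg = [2, 1, 1]
in_edges = [[2], [0], [0, 1]]
r = [1 / 3] * 3
for _ in range(50):
    r = pagerank_step(r, out_deg, in_edges)
print([round(x, 3) for x in r])           # [0.388, 0.215, 0.397]
```

In the uniform OpenMP variant both loops become `parallel for` regions; the hybrid variant keeps loops like these sequential when the threading overhead outweighs the work.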
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working with unstructured data. Speakers will present related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
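The core idea of a compliance-enforcing view can be shown with built-in sqlite3: queries are routed to a view that masks a sensitive column according to a per-user consent table. All table, column and value names here are hypothetical, not LinkedIn's.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members (id INT, email TEXT);
CREATE TABLE consent (id INT, allow_email INT);
INSERT INTO members VALUES (1, 'a@x.com'), (2, 'b@x.com');
INSERT INTO consent VALUES (1, 1), (2, 0);

-- The auto-generated, consent-respecting view the catalog would route to:
CREATE VIEW members_compliant AS
  SELECT m.id,
         CASE WHEN c.allow_email = 1 THEN m.email ELSE '<redacted>' END AS email
  FROM members m JOIN consent c ON m.id = c.id;
""")
print(con.execute("SELECT * FROM members_compliant ORDER BY id").fetchall())
# [(1, 'a@x.com'), (2, '<redacted>')]
```

Because the policy lives in the view definition, every engine that can resolve the view enforces it, which is the portability property the slides emphasise.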
10. OXLOD targets
● Test workflows and technologies
● Assess quality of records
– Estimate amount of future work
● Bring GLAM community together
– Knowledge Transfer
36. Implied relationships
● lining part of spine
● spine adhesive has type animal
● endband has technique Greek
● secondary thread has material silk
37. CRM relationships/properties
● lining part of spine
● spine adhesive has type animal
● endband has technique Greek
● secondary thread has material silk
● lining crm:P46_forms_part_of spine
● spine adhesive crm:P2_has_type animal
● making of endband crm:P32_used_general_technique Greek
● secondary thread crm:P45_consists_of silk
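The four mappings above can be written out as subject-predicate-object tuples, which is all an RDF serialisation of this slide amounts to (prefixes abbreviated; a real dataset would use full CIDOC CRM URIs and minted subject URIs):

```python
# The slide's four implied relationships, expressed as CRM-property triples.
triples = [
    (":lining",            "crm:P46_forms_part_of",            ":spine"),
    (":spine_adhesive",    "crm:P2_has_type",                  ":animal"),
    (":making_of_endband", "crm:P32_used_general_technique",   ":greek"),
    (":secondary_thread",  "crm:P45_consists_of",              ":silk"),
]
for s, p, o in triples:
    print(s, p, o)
```

Note the modelling shift on the third line, taken from the slide itself: the technique attaches to the *making of* the endband (an event), not to the endband, which is characteristic of CIDOC CRM's event-centric style.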
40. Linked Data
● World Wide Web Consortium
– Tim Berners-Lee five star open data
– http://5stardata.info/en/
● Resource Description Framework (RDF)
41. Implied relationships
● lining part of spine
● spine adhesive has type animal
● endband has technique Greek
● secondary thread has material silk
42. Resource Description Framework
[Diagram: a triple connects a Subject (what do we describe?) via a Predicate (which property of the subject?) to an Object (what value of the property?)]
43. Resource Description Framework
[Diagram: the same triple instantiated: MS. Ashmole 40 (subject), has material (predicate), parchment (object)]
45. Resource Description Framework
[Diagram: the subject xxxxxxx is located at https://medieval.bodleian.ox.ac.uk/catalog/manuscript_344; the object zzzzzzzz is located at http://vocab.getty.edu/aat/300011851; the predicate yyyyyy is not yet given a location]
46. Resource Description Framework
[Diagram: as in slide 45, with the predicate yyyyyy located at http://www.cidoc-crm.org/ P45_consists_of]
47. Linked Data
● World Wide Web Consortium
– Tim Berners-Lee five star open data
– http://5stardata.info/en/
● Resource Description Framework (RDF)
● Uniform Resource Identifiers (URI)
48. Resource Description Framework
[Diagram: the subject xxxxxxx located at https://medieval.bodleian.ox.ac.uk/catalog/manuscript_344, the predicate yyyyyy located at http://www.cidoc-crm.org/ P45_consists_of, and the object zzzzzzzz located at http://vocab.getty.edu/aat/300011851]
55. provenance
[Diagram: provenance event x, crm:P11 had participant, Giovan Battista Recanti, who is rdf:type crm:E21 Person; the birth of Giovan Recanti (crm:P89 was born, ...) has crm:P4 has time-span ...; a second Giovan Battista Recanti node is linked to Venice via wdt:P19]
56. provenance
[Diagram: as in slide 55, with the two Giovan Battista Recanti nodes reconciled via owl:sameAs]
57. provenance
[Diagram: as in slide 56, with further wdt: ... links added after reconciliation]
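The reconciliation step in the diagrams can be sketched as a merge over owl:sameAs links: once the local Recanti node and its Wikidata counterpart are declared the same, facts about either can be read through a single merged identity. The identifiers below (including `wd:Q123`) are hypothetical illustrations.

```python
def merge_same_as(triples):
    """Union-find over owl:sameAs links; rewrite all other triples
    so that equivalent nodes collapse to one representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)      # union the two identities
    return {(find(s), p, find(o))
            for s, p, o in triples if p != "owl:sameAs"}

g = [(":event_x", "crm:P11_had_participant", ":recanti_local"),
     (":recanti_local", "owl:sameAs", "wd:Q123"),   # reconciliation link
     ("wd:Q123", "wdt:P19", ":venice")]
m = merge_same_as(g)
assert (":event_x", "crm:P11_had_participant", "wd:Q123") in m
```

After the merge, the CRM provenance event and the Wikidata birthplace statement share one subject, which is exactly the payoff slide 57 shows.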
66. OXLOD: enriched data, e.g. links to China Biographical Database (Harvard)
76. Findings
● Policy for URI design
– Versioning
– Stability during iterative mappings
● Prefer local storage
● Reconciliation never ends
– Start with local authority lists
– Maintain institutional authority lists
– Use Wikidata(?) to save effort for external reconciliation
– Has to be done by domain experts
77. Findings
● Linked Data backend
– Solid research and querying tools
– Mature triple stores
– Licensing enforced prior to authentication
– Permission management at workflow level
● Linked Data frontend
– Exciting tools
– Easy dataset delivery
– Serendipitous links through searching/browsing
79. OXLOD in numbers
● 13,280,376 records (triples)
● 40,869 words of documentation text
● 325 minutes of workshop videos
● 63 workshop participants
● 26 datasets
● 10 workshops