I used these slides for a cultural heritage presentation, so the examples are relevant to that community. For example, CIDOC CRM is the obvious choice of ontology in that community.
Introduction to linked open data, RDF: the Resource Description Framework, Tools to convert data to RDF, Tools for linking/reconciliation/resolution, Storing and maintaining the data, BBC and Linked Data
Linked Data and Images: Building Blocks for Cultural Heritage (Robert Sanderson)
Presentation given at UC Berkeley on 18th of April, 2014. Describes the benefits of Linked Data for Cultural Heritage, along with the details of IIIF and Open Annotation frameworks.
Archive Assisted Archival Fixity Verification Framework (Sawood Alam)
The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure an archived resource has remained unaltered since the time it was captured. Some web archives do not allow users to access fixity information and, more importantly, even if fixity information is available, it is provided by the same archive from which the archived resources are requested. In this research, we propose two approaches, namely Atomic and Block, to establish and check fixity of archived resources.
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing (Sawood Alam)
Topic: Doctoral Dissertation Defense
Title: MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
Student: Sawood Alam
University: Old Dominion University
Date: Friday, December 4, 2020
From the Feb 19, 2014 NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations
Kevin Ford, Semantic Web Applications in Libraries: The Road to BIBFRAME
Slides used for a presentation at the CNI 2013 Fall meeting. Discusses the problem domain of the Hiberlink project, a collaboration between the Los Alamos National Laboratory and the University of Edinburgh, funded by the Andrew W. Mellon Foundation. Hiberlink investigates reference rot in web-based scholarly communication.
MementoMap Framework for Flexible and Adaptive Web Archive Profiling (Sawood Alam)
In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize holdings of a web archive. We described a simple, yet extensible, file format suitable for MementoMap. We used the complete index of the arquivo.pt comprising 5B mementos (archived web pages/files) to understand the nature and shape of its holdings. We generated MementoMaps with varying amount of detail from its HTML pages that have an HTTP status code of 200 OK. Additionally, we designed a single-pass, memory-efficient, and parallelization-friendly algorithm to compact a large MementoMap into a small one and an in-file binary search method for efficient lookup. We analyzed more than three years of MemGator (a Memento aggregator) logs to understand the response behavior of 14 public web archives. We evaluated MementoMaps by measuring their Accuracy using 3.3M unique URIs from MemGator logs. We found that a MementoMap of less than 1.5% Relative Cost (as compared to the comprehensive listing of all the unique original URIs) can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive while maintaining 100% Recall (i.e., zero false negatives).
Web open standards for linked data and knowledge graphs as enablers of EU dig... (Fabien Gandon)
Web open standards for linked data and knowledge graphs as enablers of EU digital sovereignty
ENDORSE Keynote by Fabien GANDON, 19/03/2021
https://op.europa.eu/en/web/endorse
How to Build Linked Data Sites with Drupal 7 and RDFa (scorlosquet)
Slides of the tutorial Stéphane Corlosquet, Lin Clark and Alexandre Passant presented at SemTech 2010 in San Francisco http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&proposalid=2889
Talk about Exploring the Semantic Web, and particularly Linked Data, and the Rhizomer approach. Presented August 14th 2012 at the SRI AIC Seminar Series, Menlo Park, CA
To the Rescue of the Orphans of Scholarly Communication (Martin Klein)
To the Rescue of the Orphans of Scholarly Communication
presentation at CNI Spring 2017 meeting
Herbert Van de Sompel
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
http://orcid.org/0000-0003-3749-8116
Martin Klein
http://orcid.org/0000-0003-0130-2097
Readying Web Archives to Consume and Leverage Web Bundles (Sawood Alam)
Potential utilization of the emerging Web technology, Web Bundles, in Web archiving, presented at the IIPC WAC 2021 in Session 8 by Sawood Alam.
Recording: https://youtu.be/lQX9v9V0FRQ
A talk about the gap between theory and practice with W3C Semantic Web and Dublin Core standards, and how the DC Tools Community can help collectively reduce the cost of that gap.
Given as part of the DC Tools Community workshop at LIDA2009 in Zadar, Croatia.
Cultural Mapping & Digital Storytelling in a Social Context (Stefan Kolgen)
This presentation took place on October 23, 2014 during the conference 'Cultural Mapping: Debating Spaces & Places' in Valletta (Malta). The academic paper can be downloaded at http://bit.ly/1Go2AZ8
Presentation delivered by Rebecca Cann, Cultural Planning Supervisor, City of St. Catharines, at the November 27, 2008 "Economies in Transition" forum in Chatham, Ontario.
GIS as tool for cultural heritage management (yllferizi)
Digital tools for Disaster Management: Lecture & Workshop
- The usage of GIS, crowd mapping, social media and similar, in processing data
- Data management and protection
MUGNA is the outcome of a regional and national planning process involving the various NCCA sub-commissions, with the end view of expanding the contributions of culture to national growth and development. Towards a Sustained Cultural Development of Negros Island.
National Commission for Culture and the Arts (NCCA), Negros Cultural Foundation for the Negros Island, YATTA. For 2016, six cities and towns of Negros Island have been selected for its pilot run, namely Dumaguete, Bais, Amlan, Calatrava, Kabankalan and Murcia.
Prepared for Geographic Representation Now
Harvard Graduate School of Design
11.11.11
Abstract: Critical geographic information systems is an area of research positioned at the intersection of critical geography and geographic information science, drawing together technical capabilities for geographic representation and analysis with the critical capacities of social theory, more-than-human geographies, and the digital humanities. Critical GIS scholarship is particularly influenced by the work of participatory action researchers, the histories of cartography and geographic information technologies, and the inclusion of alternative (radical, local, everyday) knowledges. It inherits a focused attention to the social implications of geospatial technologies from the GIS and Society tradition while being cognizant of the technical debates and intricacies of GIScience. In this presentation, I sketch the present history of critical GIS. That is, I reflect upon specific engagements in Geography that currently situate critical GIS, and outline the more pressing aspects of its research agenda. I then introduce critical mapping efforts at the University of Kentucky, including work around public engagement, the mapping of user-generated content, and open data advocacy in local government.
This presentation provides an accessible introduction to Linked Open Data (LOD) and how LOD is modelled and made available online. The presenters will discuss several LOD projects created by libraries and archives in order to illustrate the benefits of applying LOD principles and practices. They will also demonstrate easy ways to leverage the power of LOD for archival organizations and their digital collections, with concrete examples involving WikiData, Omeka S, and the SNAC (Social Networks and Archival Context) Project.
Society of Georgia Archivists 2018 Annual Meeting
Speakers:
Josh Hogan, Atlanta University Center Robert W. Woodruff Library
Cliff Landis, Atlanta University Center Robert W. Woodruff Library
American Art Collaborative Planning Grant Educational Briefings
Linked Data and Tools
Pedro Szekely - USC/Information Sciences Institute
September 30, 2014
From LookBackMaps to Linked Open Data in Libraries, Archives & Museums
Ignite talk for “Visualizing Environmental Change in the Bay Area: Past, Present, and Future”
Bill Lane Center for the American West
Stanford University
May 20, 2011
A lot of talk about the future of the internet sounds almost hippie-spiritual or faux-philosophical. The Internet is not the same as the World Wide Web. But the Internet of Things and the Semantic Web, both parts of Web 3.0, are becoming very important to our learning environments. Here is a summary of key features, ranging from access and creativity to information architecture.
At Utah State University, a pilot project is under development to evaluate the benefits of tracking data sets and faculty publications using the online catalog and the Library’s institutional repository.
With federal mandates to make publications and data open, universities look for solutions to track compliance. At Utah State University, the Sponsored Programs Office follows up with researchers to determine where data has been or will be deposited, per the terms of their grant.
Interested in making this publicly discoverable, the Library, Sponsored Programs, and Research Office are working together to pilot a project that enables the creation of publicly accessible MARC and Dublin Core records for data deposited by USU faculty. This project aims to make data sets, as well as publications, visible in research portals such as WorldCat, as well as through Google searches.
This presentation will describe the project and anticipated benefits, as well as outline the roles of the cataloging staff and data librarian, and the involvement of the Research Office.
QR Codes and Augmented Reality Help Libraries Extend Services (Rachel Vacek)
Emerging technologies like QR Codes and Augmented Reality can help libraries extend services, widen access to resources, and promote events to users in exciting and innovative ways. Using simple and free technologies, QR codes can be created easily and embedded almost anywhere. These oddly shaped barcode-like icons are processed by camera phones to direct the user to online websites, videos, or they can simply provide more information.
Augmented reality takes existing visual or video information and adds additional layers of computer-generated graphics, pattern recognition, and other visual effects. This session will highlight how the University of Houston Libraries and other types of libraries are using these technologies to promote, market, outreach, teach, and engage with users in new and exciting ways.
An Introduction to Linked Data for Librarians (2018-06-28) (Cliff Landis)
Presented to the Special Libraries Association Georgia Chapter for their Spring Luncheon. This presentation gives advice for librarians on how to get started exploring and implementing linked data.
RDAP 15: You’re in good company: Unifying campus research data services (ASIS&T)
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23
Cynthia Hudson-Vitale, Digital Data Outreach Librarian, Washington University
Brianna Marshall, Digital Curation Coordinator, University of Wisconsin-Madison
Amy Nurnberger, Research Data Manager, Columbia University
These slides accompanied the first part of the workshop that Vinayak Das Gupta and I gave at the Data Visualization for the Arts and Humanities event, which was held at Queen's University, Belfast on 5-6 March 2015. The workshop, entitled 'Data-mining the Semantic Web and spatially visualising the results', introduced the participants to the concepts and technologies of Open Data, the Semantic Web, RDF, SPARQL, GeoJSON and Leaflet.js. These slides cover the data-mining of online cultural heritage resources.
Connecting the Smithsonian American Art Museum to the Linked Data Cloud (Pedro Szekely)
Slides for our "Connecting the Smithsonian American Art Museum to the Linked Data Cloud." paper presented at the 10th Extended Semantic Web Conference (ESWC), in Montpellier, May 2013. http://eswc-conferences.org/sites/default/files/papers2013/szekely.pdf
Karma: Tools for Publishing Cultural Heritage Data in the Linked Open Data Cloud (Pedro Szekely)
Tools to convert data from databases to Linked Data. This presentation shows how Karma (isi.edu/integration/karma) is used to publish data from cultural heritage databases to the Linked Data cloud. Karma supports conversion of data to RDF according to user-selected ontologies and linking to other datasets such as dbpedia.org.
Linked Data, Cultural Heritage & the Karma Mapping Software
1. Linked Data & Cultural Heritage
Pedro Szekely and Craig Knoblock
USC/Information Sciences Institute
pszekely@isi.edu, knoblock@isi.edu
http://isi.edu/integration/karma
February 2015
2. Outline
• Problem
• Linked Data
• Karma
• Reconciliation
• Next steps
USC Information Sciences Institute, CC-By 2.0
4. Humans Browsing the Web
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
7. WEB PAGES ARE UNUSABLE FOR CREATING INNOVATIVE APPLICATIONS USING THE DATA
8. SOLUTION:
Linked Open Data
“web pages for computers”
using W3C standards for publishing data
9. Tim Berners-Lee on Linked Open Data
http://youtu.be/OM6XIICm_qo
10. Humans Browsing the Web
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
12. Publish Your Raw Data
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
13. Examples of Raw Data Now
https://github.com/cooperhewitt/collection
https://github.com/IMAmuseum/ima-collection
14. Convert Data to CRM (2 star)
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
15. Linked Museum Data (3 star)
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
17. Represent Resources Using URIs
http://szekelys.com/family#pedro
“Pedro”
http://xmlns.com/foaf/0.1/firstName
18. Represent Information as Triples
• Subject: http://szekelys.com/family#pedro (the resource being described)
• Predicate: http://xmlns.com/foaf/0.1/firstName (a property of the resource)
• Object: “Pedro” (the value of the property)
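As a rough illustration of the subject, predicate, object structure, the Pedro example from the slide can be written out as one N-Triples statement. This is only a minimal sketch: the `Triple` class and its `to_ntriples` method are hypothetical names introduced here, and the object is assumed to be a plain string literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical helper: one RDF statement whose object is a plain literal."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of a property of the resource
    obj: str        # the value of the property

    def to_ntriples(self) -> str:
        # Escape backslashes and double quotes so the literal stays well-formed.
        literal = self.obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{self.subject}> <{self.predicate}> "{literal}" .'


# The example from the slide: the resource #pedro has the first name "Pedro".
pedro = Triple(
    subject="http://szekelys.com/family#pedro",
    predicate="http://xmlns.com/foaf/0.1/firstName",
    obj="Pedro",
)
print(pedro.to_ntriples())
```

In practice an RDF library would normally handle serialization; the hand-written version is used here only to keep the three parts of the statement visible.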
22. Steps to Create Linked Open Data
• Publish the raw data
… get the data out of the proprietary database
• Select ontologies
… that define classes and properties for our data
• Define URI scheme
… identifiers of your resources
• Convert data to RDF
… from data sources to the ontologies
• Identify links to other Linked Data datasets
… aka reconciliation, entity resolution, …
23. CIDOC CRM
• Select ontologies
… that define classes and properties for our data
http://www.cidoc-crm.org/
24. Define URI scheme
… identifiers of your resources
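One possible shape for a URI scheme, as a minimal sketch: the base URI http://data.example.org/, the "kind/identifier" layout, and the function name mint_uri are all assumptions chosen for illustration, not a scheme recommended in these slides.

```python
from urllib.parse import quote

# Hypothetical base URI; a real museum would use a domain it controls.
BASE = "http://data.example.org/"


def mint_uri(kind: str, local_id: str) -> str:
    """Build a stable URI from a resource kind and the record's local identifier."""
    # Percent-encode the identifier so spaces or slashes cannot break the path.
    return f"{BASE}{kind}/{quote(str(local_id).strip(), safe='')}"


print(mint_uri("object", "1985.66.295"))
print(mint_uri("person", "a b/c"))
```

Deriving the URI from an identifier the collection database already maintains keeps the URIs stable when the data is re-exported.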
26. Convert data to RDF
… from data sources to the ontologies
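A minimal sketch of this step as "custom code", assuming a small CSV export with hypothetical columns id, title and creator and a hypothetical base URI. Dublin Core terms are used here as a simple stand-in vocabulary; a real CIDOC CRM mapping would be considerably more involved.

```python
import csv
import io

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://data.example.org/object/"  # hypothetical URI scheme

# Invented sample rows standing in for a raw data export.
RAW = """id,title,creator
42,Untitled,Jane Doe
43,Landscape,
"""


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_ntriples(text: str) -> list:
    """Map each non-empty cell to one triple about the row's resource."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        for column in ("title", "creator"):
            value = (row[column] or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                lines.append(f"{subject} <{DCTERMS}{column}> {literal(value)} .")
    return lines


triples = rows_to_ntriples(RAW)
print("\n".join(triples))
```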
27. RDF Mapping Tools
TOOL | SHORTCOMINGS | BENEFITS
custom code | labor intensive; error prone | flexible
R2RML | difficult to learn; only SQL databases | W3C standard; good documentation; multiple vendors
OpenRefine | no guidance; only tabular data | graphical user interface; support for reconciliation; open source
Karma | university product | easy to use; flexible; multiple data formats; multiple deployment databases; scalable; R2RML compatible; open source
29. KARMA DEMO
http://youtu.be/h3_yiBhAJIc
30. Easy To Use
easy to use • flexible • multiple data formats • multiple deployment databases • scalable • R2RML compatible • open source
CLEAR DEPICTION OF MAPPING
31. LEARNS TO MAP YOUR DATA
32. SUGGEST CORRECT ADJUSTMENTS
33. EMBEDDED PYTHON SCRIPTING
34. IMPORT POPULAR DATA FORMATS
35. OUTPUT RDF IN MULTIPLE FORMATS
ntriples
JSON
AVRO
SPARQL
ElasticSearch, GitHub, …
Hadoop, BigData
36. 40 million documents
1 billion triples
larger than all AAC museums combined
37. periodic update
every hour, every day
continuous update
as new records come in
38. Karma compatible with R2RML tools
39. Karma Is Open Source
44. URI Reconciliation In Karma
Pedro
Szekely
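To give a feel for what reconciliation involves, here is a minimal sketch that matches a local name against an authority list by string similarity. This is not Karma's method: it is a plain difflib comparison swapped in for illustration, and the authority URIs, the AUTHORITY table, and the 0.9 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical authority file: URI -> preferred label.
AUTHORITY = {
    "http://authority.example.org/person/1": "Szekely, Pedro",
    "http://authority.example.org/person/2": "Berners-Lee, Tim",
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so name order does not matter."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def reconcile(name: str, threshold: float = 0.9):
    """Return the best-matching authority URI, or None when nothing is close enough."""
    best_uri, best_score = None, 0.0
    for uri, label in AUTHORITY.items():
        score = SequenceMatcher(None, normalize(name), normalize(label)).ratio()
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri if best_score >= threshold else None


print(reconcile("Pedro Szekely"))
print(reconcile("Pablo Picasso"))
```

A high threshold favors correct links over complete coverage, so names that are not matched automatically are left for manual review.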
45. Results of Automatic Linking
Pedro
Szekely
99% are correct
6% are missing
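One way to read these two figures, stated as an assumption rather than something the slide says: "99% are correct" is the precision of the proposed links and "6% are missing" means recall is 94%. Under that reading the two can be combined into a single F1 score:

```python
# Assumed interpretation of the slide's figures.
precision = 0.99        # "99% are correct"
recall = 1.0 - 0.06     # "6% are missing"

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```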
46. Steps to Create Linked Open Data
• Publish the raw data
… get the data out of the proprietary database
• Select ontologies
… that define classes and properties for our data
• Define URI scheme
… identifiers of your resources
• Convert data to RDF
… from data sources to the ontologies
• Identify links to other Linked Data datasets
… aka reconciliation, entity resolution, …
47. TMS to CRM
easy?
48. TMS to CRM
easy?
NO
49. COMMUNITY EFFORT
• Publish the raw data
… get the data out of the proprietary database
• Select ontologies
… that define classes and properties for our data
• Define URI scheme
… identifiers of your resources
• Convert data to RDF
… from data sources to the ontologies
• Identify links to other Linked Data datasets
… aka reconciliation, entity resolution, …
50. Radical Ideas
• ULAN in Wikipedia or Wikidata
• ULAN in GitHub
• Collection data in GitHub
• Community created CRM mappings in GitHub
• CRM in JSON-LD in GitHub
• Tools to export from TMS to GitHub
52. Deployment Options
Technology | Shortcomings | Benefits
SPARQL endpoint | low reliability, esoteric, slow | sophisticated query language
RDF dump | no query capability, esoteric | flexibility: clients can download and use in applications, easy to publish
JSON-LD + ElasticSearch | restricted query language | very high performance, mainstream technology, easy to publish
Karma supports the three options
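For the JSON-LD option, a minimal sketch of what one published record could look like, reusing the "Pedro" triple from earlier in the deck. The short key firstName is an arbitrary choice made in the @context; how Karma actually shapes its JSON-LD output is not shown here.

```python
import json

# A JSON-LD document is ordinary JSON plus a @context that maps
# short keys to full property URIs, and an @id naming the resource.
doc = {
    "@context": {"firstName": "http://xmlns.com/foaf/0.1/firstName"},
    "@id": "http://szekelys.com/family#pedro",
    "firstName": "Pedro",
}

serialized = json.dumps(doc, indent=2)
print(serialized)
```

Because the result is plain JSON, it can be indexed by a mainstream document store without any RDF-specific tooling.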
53. federation
everyone publishes their data with their own URIs
aggregation
an aggregator republishes everyone’s data with new URIs
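A minimal sketch of the aggregation case, assuming the aggregator mints its own URI for each object and links it back to the museum's original URI with owl:sameAs. The aggregator base URI, the museum key, and the function name republish are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
AGGREGATOR_BASE = "http://aggregator.example.org/object/"  # hypothetical


def republish(museum_key: str, original_uri: str):
    """Mint an aggregator URI and a triple linking it to the original URI."""
    # Reuse the last path segment of the original URI as the local identifier.
    local_id = original_uri.rstrip("/").rsplit("/", 1)[-1]
    new_uri = f"{AGGREGATOR_BASE}{museum_key}/{local_id}"
    link = f"<{new_uri}> <{OWL_SAME_AS}> <{original_uri}> ."
    return new_uri, link


new_uri, link = republish("museum-a", "http://museum-a.example.org/object/42")
print(new_uri)
print(link)
```

In the federation case this step disappears: each museum's own URIs are the published identifiers.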
54. thanks for your attention!
https://github.com/usc-isi-i2/Web-Karma
Open Source, Apache 2 License