It is hard to compute fixity on archived web pages / maturban
This document discusses the challenges of computing fixity on archived web pages. It shows that archived pages can change over time due to redirects, unavailable mementos, dynamic content, transformations by archives, and changes to timemaps. Computing cryptographic hashes on archived pages to check for changes is difficult because the pages may not be consistently replayable at different times.
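The core difficulty can be illustrated with a minimal sketch (the page contents below are invented for illustration): hashing the raw bytes of a replayed memento is trivial, but archives routinely inject banners and rewrite links at replay time, so byte-level digests of the same capture can disagree.

```python
import hashlib

def fixity_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of a memento's replayed content."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical replays of the same archived page at two different times:
# the archive injects a banner comment into the second replay.
replay_1 = b"<html><body>Original page content</body></html>"
replay_2 = b"<html><!-- archive banner --><body>Original page content</body></html>"

# The underlying capture is identical, but the digests disagree, so a
# naive hash comparison falsely reports that the page has changed.
assert fixity_digest(replay_1) != fixity_digest(replay_2)
```

This is why fixity checking needs to account for replay-time transformations rather than hashing the served bytes directly.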
A Framework for Aggregating Private and Public Web Archives / jcdl2018
Mat Kelly, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group {mkelly, mln, mweigle}@cs.odu.edu @machawk1 • @WebSciDL
#jcdl2018
This document discusses linked open data and the Ontos LD Information Workbench. It provides examples of organizations that publish open government and cultural data as linked open data. It then describes the key components of the Ontos tool for authoring, storing, linking, exploring and managing linked data over its life cycle. These include tools for extraction, storage, linking datasets, semantic search and browsing linked data.
The Power of Semantic Technologies to Explore Linked Open Data / Ontotext
Atanas Kiryakov's (Ontotext CEO) presentation at the first edition of Graphorum (http://graphorum2017.dataversity.net/), a new forum that taps into the growing interest in graph databases and technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext's own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
1) The document compares different methods for representing statement-level metadata in RDF, including RDF reification, singleton properties, and RDF*.
2) It benchmarks the storage size and query execution time of representing biomedical data using each method in the Stardog triplestore.
3) The results show that RDF* requires fewer triples but the database size is larger, and it outperforms the other methods for complex queries.
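The triple-count trade-off the benchmark measures can be sketched with plain tuples (the entity names and metadata below are illustrative, not the benchmark's actual biomedical data):

```python
# One base statement we want to annotate with provenance metadata.

# Standard RDF reification: the statement becomes four bookkeeping
# triples plus one triple per metadata property.
def reify(s, p, o, meta):
    stmt = "_:stmt1"  # blank node standing for the statement
    triples = [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]
    triples += [(stmt, mp, mv) for mp, mv in meta.items()]
    return triples

# Singleton property: a unique predicate instance carries the metadata.
def singleton_property(s, p, o, meta):
    p1 = p + "#1"
    triples = [(s, p1, o), (p1, "rdf:singletonPropertyOf", p)]
    triples += [(p1, mp, mv) for mp, mv in meta.items()]
    return triples

meta = {"ex:source": "ex:trial42"}
r = reify("ex:drugA", "ex:interactsWith", "ex:drugB", meta)
s = singleton_property("ex:drugA", "ex:interactsWith", "ex:drugB", meta)

# RDF* embeds the statement directly (<< ex:drugA ex:interactsWith
# ex:drugB >> ex:source ex:trial42): one asserted triple plus one
# annotation, which is why it needs the fewest triples of the three.
assert len(r) == 5 and len(s) == 3
```

The counts grow linearly with the number of annotated statements, which is what makes the representation choice matter at biomedical-dataset scale.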
This document discusses creating a knowledge graph for Irish history as part of the Beyond 2022 project. It will include digitized records from core partners documenting seven centuries of Irish history. Entities like people, places, and organizations will be extracted from source documents and related in a knowledge graph using semantic web technologies. An ontology was created to provide historical context and meaning to the relationships between entities in Irish history. Tools will be developed to explore and search the knowledge graph to advance historical research.
ROI in Linking Content to CRM by Applying the Linked Data Stack / Martin Voigt
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally as well as externally, in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
Data 2 Documents: Modular and Distributive Content Management in RDF / Niels Ockeloen
This document describes a system called Data 2 Documents (D2D) that aims to enable modular and distributive content management on the web using Linked Data and RDF. It discusses how D2D addresses issues with sharing content across different content management systems and websites by modeling the knowledge involved in content selection, composition and rendering. An evaluation involved experts and students performing tasks in D2D, and found that participants could complete the tasks and would consider using D2D for future website development. Future work is needed to develop graphical user interfaces and JavaScript implementations for D2D.
The slideset used to conduct an introduction/tutorial
on DBpedia use cases, concepts and implementation
aspects held during the DBpedia community meeting
in Dublin on the 9th of February 2015.
(slide creators: M. Ackermann, M. Freudenberg
additional presenter: Ali Ismayilov)
Publishing the British National Bibliography as Linked Open Data / Corine Del... / CIGScotland
Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013
Linked data experience at Macmillan: Building discovery services for scientif... / Michele Pasin
Macmillan is developing a linked data platform and semantic data model to power discovery services for scientific content. They have created an RDF-based data model and ontology to organize over 270 million triples of metadata. They are focusing on internal use cases and have implemented a hybrid architecture using MarkLogic and a triplestore to optimize query performance and deliver content in under 200ms. Going forward, they aim to expand the ontology, enable more advanced querying, and establish the semantic data model as a core enterprise asset.
Richard Wallis, an OCLC Technology Evangelist, discusses how libraries can make their data more visible and connected on the web by publishing it as linked open data using common web vocabularies like Schema.org. Currently, library linked data exists in silos using different local vocabularies, making the data hard to discover and integrate. Adopting Schema.org could help library data reach the billions of web pages and domains that already use this general purpose vocabulary to describe things on the web.
Maximising (Re)Usability of Library metadata using Linked Data / Asuncion Gomez-Perez
This document discusses maximizing the reusability of library metadata using linked data. It motivates the use of linked data by describing the current heterogeneous data landscape with issues around language, format, and lack of interoperability. It then discusses how linked data allows for uniform access through agreed upon vocabularies and standards. Specific issues around language, provenance, license and the linked data process are covered. Uses of linked library metadata are also discussed.
This document summarizes Richard Wallis and his work. Richard Wallis is an independent consultant and founder of Data Liberate. He currently works with OCLC and Google to develop schema standards. He chairs several W3C community groups focused on developing schemas for bibliographic data and archives data using Schema.org.
DBpedia: A Public Data Infrastructure for the Web of Data / Sebastian Hellmann
The document discusses the DBpedia project, which extracts structured data from Wikipedia to build a multilingual knowledge graph. It describes DBpedia's goals of making this data openly available and supporting its community. The DBpedia Association is being formed as a non-profit to oversee the infrastructure and support contributors. Funding will come from donations and sponsorships. Upcoming events include the DBpedia Community Meeting coinciding with the SEMANTiCS conference in September.
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016 / Sergio Fernández
Sergio Fernández gave a presentation on geospatial querying in Apache Marmotta. He explained that Marmotta is an open platform for linked data that allows publishing and building applications on linked data. It includes features like a read-write linked data server and SPARQL querying. He discussed how GeoSPARQL allows representing and querying geospatial data on the semantic web by defining a vocabulary and SPARQL extension. Marmotta implements GeoSPARQL by materializing geospatial data and supports topological relations and functions through PostGIS. He demonstrated example GeoSPARQL queries on municipalities in Madrid, rivers bordering Austria, and mountain bike routes crossing cities.
[Databeers] 06/05/2014 - Boris Villazon: "Data Integration - A Linked Data ap..." / Data Beers
This document discusses using linked data approaches for data integration. It introduces linked data as a way to publish and connect disparate data sources using common identifiers and semantic web standards like URIs and RDF. This allows data to be queried and exploited as a single global database. Examples are given of applying linked data for integrating enterprise data sources and for publishing geospatial data from Ecuador using semantic representations. The benefits of linked data for data integration are that it enables querying across data silos and consuming data without complex transformations by using the graph-based RDF data model.
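The "single global database" idea reduces to a simple mechanism: when two silos describe an entity with the same identifier, merging their graphs is just a union of triples. A toy sketch with invented identifiers (not from the talk):

```python
# Two "silos" publish facts about the same entity using a shared URI.
crm = {("ex:acme", "ex:hasContact", "ex:alice")}
erp = {("ex:acme", "ex:basedIn", "ex:madrid")}

# Because both silos use the same identifier for the company,
# integration is just set union -- no schema mapping or ETL
# transformation step is required.
merged = crm | erp

def query(graph, subject):
    """Return all (predicate, object) pairs about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}

# One query now spans both original silos.
assert query(merged, "ex:acme") == {
    ("ex:hasContact", "ex:alice"),
    ("ex:basedIn", "ex:madrid"),
}
```

Real deployments do the same thing with RDF stores and SPARQL, but the graph-union principle is identical.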
Contextual Computing - Knowledge Graphs & Web of Entities / Richard Wallis
Richard Wallis gave a presentation on contextual computing and knowledge graphs at the SmartData 2017 conference. He discussed how knowledge graphs powered by structured data on the web are providing global context that enables new applications of cognitive and contextual computing. Schema.org plays a key role by defining a common vocabulary and enabling a web of related entities laid out as a global graph. This graph of entities delivers context on a global scale and lays the foundation for the next revolution in computing.
The RDF Report Card: Beyond the Triple Count / Leigh Dodds
My talk from the Semtech Biz conference in London.
I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of quality and utility.
The RDF Report Card is offered as one simple, high-level visualization.
Smart Data Applications powered by the Wikidata Knowledge Graph / Peter Haase
This document discusses Wikidata and how it can power smart data applications. Wikidata is a large, structured, collaborative knowledge graph containing over 15 million entities. It collects data in a structured form from Wikipedia pages and can be queried like a database using the Wikidata Query Service. The document promotes metaphacts, an enterprise knowledge graph platform that can be used to build applications using Wikidata, enrich Wikidata with private data, and enable companies to build and leverage their own knowledge graphs for various domains such as cultural heritage and pharma.
A Platform for Object-Action Semantic Web Interaction / Roberto García
Usability tests of Semantic Web applications show that their usability is seriously compromised. This motivates the exploration of alternative interaction paradigms, different from the "traditional" Web or desktop application ones. The Rhizomer platform is based on the object-action interaction paradigm, which is better suited for heterogeneous resource spaces such as those common in the Semantic Web. Resources, described by means of RDF metadata, correspond to the objects from the interaction point of view, and Rhizomer provides browsing mechanisms for them. Semantic web services, dynamically associated with these objects, correspond to the actions. Rhizomer has been applied in the context of a media house to build an audiovisual content management system. End-users of this system, journalists and archivists, are able to navigate the content repository through semantic metadata describing content pieces and the domain knowledge these pieces refer to. Those resources constitute the objects; when the user selects one of them, semantic web services dynamically associate specialized visualization and interaction views, the actions.
This document summarizes the origins and development of Schema.org, tracing its lineage from Tim Berners-Lee's 1989 conception of the World Wide Web, through the semantic web in 2001 and linked open data in 2009. Schema.org was introduced in 2011 as a joint effort between Google, Bing, Yahoo, and Yandex to create a common set of schemas for structured data on web pages. It has since grown significantly, with over 12 million websites now using Schema.org markup and over 500 types and 800 properties defined. Various communities, such as libraries, have also influenced Schema.org through extensions and standards like LRMI.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 / Ontotext
These are slides from a live webinar that took place in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from it.
The document discusses linking XML data to the web of linked data. It provides examples of converting XML content like tables and files into linked data formats like Turtle and JSON-LD. It also demonstrates querying linked data from XML files using SPARQL and XSLT transformations and serving linked data from XML using Apache Jena Fuseki. The document aims to help integrate linked data processing into existing XML tooling and workflows.
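The XML-to-linked-data conversion described above can be sketched with the standard library alone (the element names, base URI, and vocabulary terms below are invented for illustration, not from the talk):

```python
import json
import xml.etree.ElementTree as ET

# A small XML fragment of the kind an existing toolchain might produce.
xml_src = """<books>
  <book id="b1"><title>Linked Data</title><year>2011</year></book>
</books>"""

root = ET.fromstring(xml_src)
graph = []
for book in root.findall("book"):
    # Map each XML element to a JSON-LD node: the @id comes from the
    # XML id attribute; property URIs here are an illustrative vocabulary.
    graph.append({
        "@id": "http://example.org/book/" + book.get("id"),
        "http://example.org/vocab/title": book.findtext("title"),
        "http://example.org/vocab/year": book.findtext("year"),
    })

doc = {"@graph": graph}
# Round-trip through the JSON serializer to confirm valid JSON-LD syntax.
assert json.loads(json.dumps(doc))["@graph"][0]["@id"].endswith("b1")
```

In practice the same mapping is often expressed as an XSLT transformation, which keeps the conversion inside existing XML tooling; the structural idea is the same.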
A presentation by Gordon Dunsire.
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference which took place Fri 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
A Framework for Aggregating Public and Private Web Archives / Mat Kelly
This document proposes a framework for aggregating private and public web archives. It discusses the current state of memento aggregation and outlines ways to make timemaps and aggregation more expressive. This includes adding attributes to timemaps to provide more information about mementos without requiring full dereferencing, such as status codes, content digests, and indicators of private versus public captures. The framework aims to provide a more comprehensive view of the archived web by incorporating both personal and non-aggregated archives.
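A sketch of what such an "expressive" TimeMap entry could look like in link format, with the proposed extra attributes appended (the attribute names `status`, `digest`, and `access` are illustrative stand-ins for the paper's proposal, not a finalized syntax):

```python
def memento_link(urim, datetime_str, extras):
    """Serialize one TimeMap entry in link format (RFC 5988 style),
    appending extra attributes so clients can filter mementos without
    dereferencing each one. Extra attribute names are illustrative."""
    attrs = ['rel="memento"', f'datetime="{datetime_str}"']
    attrs += [f'{k}="{v}"' for k, v in extras.items()]
    return f"<{urim}>; " + "; ".join(attrs)

line = memento_link(
    "https://archive.example/web/20180101/http://example.com/",
    "Mon, 01 Jan 2018 00:00:00 GMT",
    {"status": "200", "digest": "sha256:abc123", "access": "private"},
)
# An aggregator can now tell, from the TimeMap alone, that this capture
# is a private 200 response with a known content digest.
assert 'status="200"' in line and 'access="private"' in line
```

The point of the extra attributes is exactly what the abstract describes: conveying status codes, content digests, and public/private provenance without forcing a dereference of every memento.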
Warcbase: Building a Scalable Platform on HBase and Hadoop - Part Two, Histor... / Ian Milligan
This was the second part of a joint presentation I did with Jimmy Lin (Maryland) at the "Web Archiving Collaboration: New Tools and Models" conference at Columbia University, New York NY on 4 June 2015.
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS... / Amazon Web Services
AWS hosts a variety of public data sets that anyone can access for free. Previously, large data sets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without downloading or storing it themselves. In this session, the AWS Open Data Team shares tips and tricks, patterns and anti-patterns, and tools to help you effectively stage your data for analysis in the cloud.
The web has changed! Users spend more time on mobile than on desktops, and they expect an amazing user experience on both platforms. APIs are the heart of the new web as the central point of access to data, encapsulating logic and providing the same data and the same features for desktops and mobiles.
In this talk, I will show you how, in only 45 minutes, we can create a full REST API, with documentation and an admin application built with React.
The document discusses adapting the open source Nutch search engine to enable full-text search of web archive collections. Key points include:
1. Nutch was selected as the search platform and modified to index content from web archive collections rather than live web crawling.
2. The modified Nutch supports two modes - basic search similar to Google, and a Wayback Machine-like interface to return all versions of a page.
3. Indexing statistics are provided for a small test collection, taking around 40 hours to index 1.07 million documents from 37GB of archive data.
This document outlines an approach to making web links more robust and interoperable for machines called "Robust Links". It discusses problems with current links like link rot and content drift. The proposed solution involves taking snapshots of linked resources in web archives, and decorating links with metadata about the archived snapshot, original URI, and timestamp. This allows both humans and machines to access the archived versions of linked resources even if the original link breaks. The presentation advocates adopting HTTP link headers and relation types to better connect related scholarly resources on the web in a machine-readable way.
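The link decoration described above can be sketched in a few lines. This is an illustrative helper, not code from the presentation; the attribute names follow the Robust Links convention (data-originalurl, data-versionurl, data-versiondate), and the URLs below are placeholders.

```python
# Build an HTML anchor decorated with Robust Links attributes so that
# both humans and machines can reach an archived snapshot if the
# original link rots.

def robust_link(href, version_url, version_date, text):
    """Return an <a> element carrying original URI, snapshot URI, and date."""
    return ('<a href="{}" data-originalurl="{}" '
            'data-versionurl="{}" data-versiondate="{}">{}</a>').format(
                href, href, version_url, version_date, text)

link = robust_link(
    "http://example.com/page",
    "https://web.archive.org/web/20180606000000/http://example.com/page",
    "2018-06-06",
    "an example page")
```

A consuming client can fall back to `data-versionurl` when dereferencing `href` fails.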
Telling the World and Our Users What We HaveRichard Wallis
This document summarizes a presentation by Richard Wallis on discovery and discoverability. It introduces Schema.org as a vocabulary for structured data on the web and its use by major organizations like Google, OCLC, and the Library of Congress. It discusses motivations for sharing bibliographic data on the web using Schema.org, including connecting library data and reaching users. Key initiatives are summarized, such as the Schema Bib Extend community group, BiblioGraph.net extension vocabulary, and the bib.schema.org hosted extension.
Log ingestion kafka -- impala using apexApache Apex
This document discusses using Apache Apex to ingest log data from Kafka into Impala for fraud analysis. It describes using Kafka to stream JSON records from web servers into Parquet files on HDFS. These files are then queried using Impala for reporting. The document outlines the use case specifications, the initial DAG design using various operators, and an improved design to separate the active and passive directories for writing and querying. It compares batch and streaming approaches and how Apex provides built-in checkpointing and failure recovery.
This document summarizes a presentation on the Elastic Stack. It discusses the main components - Elasticsearch for storing and searching data, Logstash for ingesting data, Kibana for visualizing data. It provides examples of using Elasticsearch for search, analytics, and aggregations. It also briefly mentions new features across the Elastic Stack like update by query, ingest nodes, pipeline improvements, and APIs for management and metrics.
This document discusses various strategies and resources for archiving internet content for research purposes. It describes several existing large-scale web archives like the Internet Archive and Common Crawl, as well as national and institutional archives. It also outlines how researchers can collect targeted web archives using open-source tools or subscription-based services.
This document discusses a hackathon focused on using open agricultural data and APIs to help researchers and trainers. It describes challenges around discovering relevant resources and preparing training materials. Various data sources and APIs are presented, including those that provide search over aggregated metadata from multiple sources and harvest metadata via OAI-PMH. Services are proposed to index web resources with an agricultural thesaurus, crawl the web to discover related resources, and interlink bibliographic records with web content. The goal is to better connect users with relevant information through these data and technologies.
The document discusses facilitating the discovery of public datasets. It describes Schema.org, a collaborative project to add metadata to content using microdata, RDFa or JSON-LD formats. It also discusses challenges in identifying and relating datasets, as well as properties for describing datasets, such as name, description, URL, version, and spatial/temporal coverage. An example is given of markup for a seismic hazard zones dataset using these properties.
- The document discusses analysis of web archive data stored at the Internet Archive using tools like Apache Hadoop, Pig, Hive, Giraph and Mahout.
- It describes generating derivatives from crawled WARC files like CDX, parsed text and WAT, and storing them in HDFS for analysis using SQL-like queries.
- Various analyses are discussed including growth of content, duplication rates, breakdown by year, text analysis using TF-IDF, and link analysis to generate graphs and compute metrics like PageRank over time to understand the archived web.
An introductory tutorial for the web framework Angular, with a companion demo GitHub repository and a step-by-step GitHub tutorial repository. Presented at Northwestern WildHacks, May 17, 2017.
Web archiving challenges and opportunitiesAhmed AlSum
The document discusses challenges and opportunities in web archiving. It outlines the key stages in the web archiving lifecycle including selection of content, harvesting techniques, storage formats and infrastructure, ways to provide access, and the role of community. Specific challenges are discussed such as representing dynamic and social media content, optimizing storage solutions, and addressing limitations of current access interfaces. Opportunities exist in focusing collection efforts on underrepresented regions, leveraging existing archived data, and developing innovative services and tools to support researchers.
Linked Data (1st Linked Data Meetup Malmรถ)Anja Jentzsch
This document discusses Linked Data and outlines its key principles and benefits. It describes how Linked Data extends the traditional web by creating a single global data space using RDF to publish structured data on the web and by setting links between data items from different sources. The document outlines the growth of Linked Data on the web, with over 31 billion triples from 295 datasets as of 2011. It provides examples of large Linked Data sources like DBpedia and discusses best practices for publishing, consuming, and working with Linked Data.
Polyglot persistence is about using multiple databases in concert with one another as part of a larger datastore ecosystem. The advantage is that your database layer uses a set of specialized tools to deliver overall value and functionality while simplifying data modeling by separating command and query responsibilities. The arrival of MongoDB and its flexible schemas further increases the possibilities of polyglot architectures.
Aggregating Private and Public Web Archives Using the Mementity FrameworkMat Kelly
This document outlines Mat Kelly's PhD dissertation defense. The defense will address aggregating private and public web archives using the Mementity framework. Kelly will defend his dissertation to a committee chaired by Michele Weigle on May 7, 2019. The dissertation addresses challenges around capturing and replaying private content from the web, including content behind authentication or that requires special handling when aggregated. It proposes research questions around difficult to archive content types, comparing browser and crawler capabilities, issues with authenticated content, signaling content that needs special handling, and access controls for private archives.
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
Mat Kelly presented a framework for aggregating personal, private, and institutional web archives while maintaining access control. The framework includes separate timemaps for different types of captures that could be aggregated while restricting access to private captures. Kelly sought input on use cases around access control for private web archives and mechanisms for protecting archived web pages. The presentation explored challenges in replaying private archives alongside public ones from institutions and how the framework could address these issues.
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...Mat Kelly
This document proposes a framework for aggregating private and public web archives. It introduces two new entities: the Memento Meta Aggregator (MMA) and the Private Web Archive Adapter (PWAA). The MMA allows for dynamic inclusion of archives and recursive construction of archive sets. The PWAA regulates access to private web archives by authenticating requests and relaying results. This framework enables private archives to be included in aggregations while preserving privacy through access control and authentication via the PWAA.
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
The document describes a system for generating thumbnail summaries of large collections of web archive mementos. The system uses SimHash to identify sufficiently unique mementos based on similarities and differences in HTML markup. It calculates Hamming distance between memento SimHashes to select a subset for the summary that limits redundancy while preserving important captures. The visualizations generated by the system provide an overview of a website's evolution over time using 3-6 representative thumbnails.
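The selection idea above can be sketched briefly: compute a 64-bit SimHash per memento and keep only mementos whose Hamming distance from every already-kept memento exceeds a threshold. The `simhash` below is a toy token-based variant for illustration, not the implementation used in the system, and the threshold value is an assumption.

```python
# Toy SimHash + Hamming-distance filter for picking "sufficiently
# unique" mementos out of a collection, limiting redundant thumbnails.
import hashlib

def simhash(text, bits=64):
    """Toy SimHash over whitespace tokens of a page's markup."""
    v = [0] * bits
    for token in text.split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def select_unique(pages, threshold=8):
    """Keep pages whose SimHash differs enough from all kept pages."""
    kept = []
    for page in pages:
        h = simhash(page)
        if all(hamming(h, kh) > threshold for _, kh in kept):
            kept.append((page, h))
    return [p for p, _ in kept]
```

Near-duplicate captures hash to nearby values, so they fall below the threshold and are dropped from the summary.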
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryMat Kelly
The document proposes using ResourceSync, BitTorrent, and WebRTC to facilitate the a posteriori replication of satellite imagery published on NASA web servers. It describes using a crawler to discover imagery resources and produce metadata, which is then used by adapter software to invoke a BitTorrent-based distribution of image payloads to users. The approach was constructed as a proof-of-concept to distribute data and mitigate reliance on NASA servers as the single source. Evaluation showed it was effective but temporally expensive, and future work could better integrate ResourceSync and utilize the YAML metadata.
This document introduces the Archival Acid Test, which evaluates how well web archiving tools archive modern webpages that use advanced HTML, JavaScript, and other web technologies. The test is divided into basic tests, JavaScript tests, and advanced features tests to assess different areas. Results show that archiving tools perform well on basic tests but struggle with dynamic content, asynchronous JavaScript, iframes, and other complex features. The goal of the Archival Acid Test is to create a standardized, publicly available way to evaluate how completely archiving tools archive modern webpages and identify areas for improvement.
The document provides an overview of browser-based digital preservation including:
- The current state of digital preservation which relies on web crawlers and archives like the Internet Archive. However, this approach is insufficient for preserving pages that are not popular, behind authentication, or use complex JavaScript.
- The requirements for new software to directly capture and preserve web pages from within the browser in order to address the limitations of current archival approaches.
- A proposed system called "WARCreate" that would leverage the Chrome extension API to capture web pages and resources and generate WARC files for preservation while maintaining the original browsing context.
Archive What I See Now - Archive-It Partner Meeting 2013 2013Mat Kelly
This document summarizes a presentation about enabling individual web archiving. It discusses tools like WARCreate and WAIL that allow users to archive web pages from their browser in WARC format. Issues addressed include timely capture of breaking news, preserving original context like user profiles, and uploading personal archives to institutional archives. Goals of the Archive What I See Now project are to port WARCreate to Firefox, add capabilities to upload WARCs, and implement sequential archiving of linked resources.
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
This document describes a graph-based visualization system for navigating and predicting box office performance. The system represents movie data as interconnected nodes in a graph layout. Selecting different nodes allows navigation between the movie context and related contexts like actors. Node size and position encode attributes relevant to box office predictions. The system preprocesses and caches external data to make complex predictions accessible through an interactive visual interface.
The document introduces WARCreate and WAIL, tools that make web archiving easier. WARCreate allows users to archive web pages they see in their browser directly as WARC files, preserving context. WAIL packages existing tools like Heritrix and Wayback into a graphical user interface, allowing one-click archiving. Together these tools aim to make web archiving more accessible to personal archivists while still producing outputs compatible with institutional tools and standards.
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
The document describes a set of tools that make enterprise-level web archiving accessible for personal use. The tools include a crawler (Heritrix), a web archive player (Wayback Machine), and an archive inspector (WARC-Proxy) that are installed locally on a personal machine. The interface provides one-click options to set up crawls, view archived pages in the local Wayback installation, and check archive status. It aims to support personal web archiving through a graphical user interface that allows customizing crawls, starting/stopping services, and works with existing WARC files from other tools on Windows, MacOS, and Linux systems.
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
The document is a thesis that aims to develop an extensible framework for creating personal web archives of content behind authentication barriers. It discusses problems with current personal web archiving tools, such as breaking when sites change their hierarchies and producing suboptimal archives. The thesis seeks to remedy these issues, preserve more social media content, and make archiving outputs more optimal. It utilizes tools like Archive Facebook and WARCreate to generate navigable archives in a format compatible with replay systems like the Wayback Machine.
If Twitter is the "first draft of history", then we should be doing a better job of preserving it. For the one-year anniversary of the Egyptian revolution (2012), we revisited a sample of the shared social media content and found nearly 11% missing from the current web, and only 20% available in public web archives. Spurred by this, we sampled tweets for five other culturally important events from 2009-2012 and found similar rates for archiving and loss.
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed using the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g., password-protected pages, social media pages) cannot be easily archived into the standard WARC format. We have created a Google Chrome extension, WARCreate, that allows a user to create a WARC file from any webpage. Using this tool, content that might have been otherwise lost in time can be archived in a standard format by any user. This tool provides a way for casual users to easily create archives of personal online content. This is one of the first steps in resolving issues of long-term storage, maintenance, and access of personal digital assets that have emotional, intellectual, and historical value to individuals.
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
Creating Persistent Links to YouTube Music Videos
The document discusses the problem of links to YouTube videos becoming invalid when videos are removed. It proposes introducing a resolver service that redirects links to alternative copies of videos when the original link returns a 404 error. This service would also retrieve and publish metadata about videos to external websites to help find available copies when the initial link is broken. The goal is to create persistent links to YouTube music videos even if the specific video is removed from YouTube.
Archive Facebook is an add-on for Mozilla Firefox that allows users to create stand-alone archives of the content on their Facebook account. It preserves the look and feel of Facebook, unlike Facebook's native downloading option. The add-on lets users choose what specific types of content to archive, rather than limiting it to what Facebook allows. This ensures the archive is a true snapshot of the user's Facebook data and history. The add-on provides an easy-to-use interface to navigate and access archived content.
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...EduSkills OECD
Andreas Schleicher, Director of Education and Skills at the OECD presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(Lesson 1) - Prelims
Discuss the EPP Curriculum in the Philippines:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
Explain the Nature and Scope of an Entrepreneur:
- Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy, is anointed, and Saul is envious of him. David shows honor while Saul continues to self-destruct.
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...TechSoup
Whether you're new to SEO or looking to refine your existing strategies, this webinar will provide you with actionable insights and practical tips to elevate your nonprofit's online presence.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
How to Manage Reception Report in Odoo 17Celine George
A business may deal with both sales and purchases occasionally. They buy things from vendors and then sell them to their customers. Such dealings can be confusing at times. Because multiple clients may inquire about the same product at the same time, after purchasing those products, customers must be assigned to them. Odoo has a tool called Reception Report that can be used to complete this assignment. By enabling this, a reception report comes automatically after confirming a receipt, from which we can assign products to orders.
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder, Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two, 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Services experts provided customer-specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Services (AWS).
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body's response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing include infection, hyperpigmentation of the scar, contractures, and keloid formation.
Client-Assisted Memento Aggregation Using the Prefer Header
1. Client-Assisted Memento Aggregation
Using the Prefer Header
Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group
{mkelly, salam, mln, mweigle}@cs.odu.edu
@machawk1 • @WebSciDL
Web Archiving and Digital Libraries (WADL) Workshop
June 6, 2018, Fort Worth, TX
3. @machawk1
A Framework for Aggregating Private and Public Web Archives
JCDL 2018 • June 5, 2018 • Fort Worth, TX
Today's Memento Aggregation
Archives Queried (A0)
4.
Motivation
Archives Queried (A0)
> Include personal archives
> Include other non-aggregated archives
6. @machawk1
Client-Assisted Memento Aggregation Using the Prefer Header
WADL 2018 • June 6, 2018 • Fort Worth, TX
State of Aggregators' Capabilities
- Mementoweb aggregator
  - Cannot customize set of archives aggregated
  - Open source? Unavailable for individuals' deployment
- MemGator
  - Open source - https://github.com/oduwsdl/MemGator
  - Requires static set of archives on-launch
  - Still specified by server, clients have no say
- With each, the set of archives is determined on the "server".
- Neither allows client to specify set of archives aggregated.
7.
HTTP Prefer
- RFC 7240 (June 2014)
- CLIENT requests with HTTP header:
  - Prefer: foo; bar=""
- SERVER may respond with HTTP header:
  - Preference-Applied: foo
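The RFC 7240 exchange above can be sketched without a network round trip: the client sends a Prefer header; the server parses it and, if it honors a preference token, echoes it back in Preference-Applied. This is a simplified illustrative parser, not a full RFC 7240 implementation.

```python
# Parse a Prefer header value like 'foo; bar=""' into a token ->
# parameters mapping, then compute the Preference-Applied value for
# tokens the server supports.

def parse_prefer(header):
    """Parse 'foo; bar=""' into {'foo': {'bar': ''}} (simplified)."""
    prefs = {}
    for pref in header.split(","):
        token, _, params = pref.partition(";")
        kv = {}
        for p in params.split(";"):
            if "=" in p:
                k, _, v = p.partition("=")
                kv[k.strip()] = v.strip().strip('"')
        prefs[token.strip()] = kv
    return prefs

def preference_applied(prefer_header, supported=("foo",)):
    """Return the Preference-Applied header value, or None if nothing honored."""
    honored = [t for t in parse_prefer(prefer_header) if t in supported]
    return ", ".join(honored) or None
```

Because Preference-Applied is optional, a client must treat its absence as "the server may or may not have honored the preference".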
8.
OUR APPROACH:
Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KIC7...NCn0="
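The archives preference shown above can be constructed as follows: the client serializes its desired archive set as JSON, base64-encodes it into a data URI, and sends that as the value of `archives` in the Prefer header. The JSON field names (`name`, `timemap`) are illustrative assumptions, not a schema from the paper.

```python
# Encode a client-chosen archive set as the base64 data-URI payload of
# a Prefer: archives="..." header value.
import base64
import json

def archives_preference(archives):
    """Return the value for a 'Prefer: archives=...' request header."""
    payload = json.dumps(archives).encode("utf-8")
    data_uri = ("data:application/json;charset=utf-8;base64,"
                + base64.b64encode(payload).decode("ascii"))
    return 'archives="{}"'.format(data_uri)

prefer_value = archives_preference([
    {"name": "Internet Archive",
     "timemap": "https://web.archive.org/web/timemap/link/"},
])
# The client would then send:  Prefer: <prefer_value>
```

A data URI keeps the preference self-describing: the aggregator can decode the archive set from the request alone, with no prior registration step.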
9.
Prefer + Memento
- S. Jones, H. Van de Sompel, et al., "Mementos in the Raw" [1]
  - Prefer: original-content, original-links, original-headers
  - Mitigates replay-system rewriting; makes "raw" information more accessible
- D.S.H. Rosenthal, "Content negotiation and Memento" [2]
  - none, screenshot, altered-dom, url-rewritten, banner-inserted
  - Additional focus on derived representations

[1] http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
[2] https://blog.dshr.org/2016/08/content-negotiation-and-memento.html
11.
Memento Meta-Aggregator (MMA) [1]
- Additional responsibilities beyond aggregation
- Provides a hierarchical querying model to other aggregators
- Advanced querying models like Precedence and Short-Circuiting
- Systematic interaction and aggregation with private and personal web archives

[1] Kelly et al., "A Framework for Aggregating Private and Public Web Archives", JCDL 2018
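The two querying models named above can be illustrated with a small sketch (ours, not the paper's code): Precedence tries archive endpoints in a fixed order, and Short-Circuiting stops at the first endpoint that returns a TimeMap. Each "archive" here is just a function returning a list of mementos, or an empty list on a miss; the endpoint names are hypothetical.

```python
# Query archives in precedence order, short-circuiting on the first
# non-empty TimeMap, so a private archive can shadow public ones.

def query_with_short_circuit(archives, uri):
    """Return the first non-empty TimeMap, honoring list (precedence) order."""
    for archive in archives:        # precedence = position in the list
        timemap = archive(uri)
        if timemap:                 # short-circuit: stop on first hit
            return timemap
    return []

# Hypothetical endpoints: a private archive consulted before a public one.
private = lambda uri: ["memento-private-1"] if "example" in uri else []
public = lambda uri: ["memento-public-1"]

result = query_with_short_circuit([private, public], "http://example.com/")
```

Ordering the private archive first means captures of authenticated content are preferred over public captures when both exist.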
21.
Potential Approaches Toward Archival Set Persistence for Subsequent Queries
1. Maintain state
   - Content-Location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
   - Not something we want to do with HTTP
2. Require re-specification with each request
   - Not portable to other users
3. Server-side set caching
   - Combinatorial explosion
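Option 1 above exposes the aggregated TimeMap at a Content-Location whose path embeds an identifier for the client's archive set. One way to sketch such an identifier (the hash scheme is our assumption; the slide's abbreviated hash is illustrative) is a digest over the canonicalized set:

```python
# Derive a stable identifier for an archive set so equal sets map to
# the same /timemap/link/<id>/<uri> path regardless of listing order.
import hashlib

def archive_set_id(archive_uris):
    """Deterministic id: sha256 over the sorted, deduplicated set."""
    canonical = "\n".join(sorted(set(archive_uris)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def timemap_path(archive_uris, target_uri):
    """Hypothetical Content-Location path for an aggregated TimeMap."""
    return "/timemap/link/{}/{}".format(archive_set_id(archive_uris), target_uri)
```

Because the identifier is derived from the set rather than stored, the server stays stateless, sidestepping the HTTP-state objection while still giving clients a shareable URI.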