This document outlines topics that will be covered in a presentation about digital image collections and their reuse. Some of the key topics mentioned include: an overview of existing platforms for hosting image collections; blurring boundaries between data providers, service providers, and end users; tools and workflows for managing image collections; standards like IIIF for sharing collection data; metadata issues; and machine learning applications. The presentation will provide examples of current projects and discuss challenges around topics like ensuring proper understanding of object scale in digital interfaces.
This document describes Jean-Paul Calbimonte's doctoral research on enabling semantic integration of streaming data sources. The research aims to provide semantic query interfaces for streaming data, expose streaming data for the semantic web, and integrate streaming sources through ontology mappings. The approach involves ontology-based data access to streams, a semantic streaming query language, and semantic integration of distributed streams. Work done so far includes defining a language (SPARQLSTR) for querying RDF streams and enabling an engine to support streaming data sources through ontology mappings. Future work involves query optimization and quantitative evaluation.
Multimedia Data Navigation and the Semantic Web (SemTech 2006) by Bradley Allen
The document describes a system for faceted navigation of multimedia content using semantic web technologies. It discusses ontologies expressed in RDF(S) and OWL for representing metadata, a case study based on BBC rush footage, and visual facets for color, texture, and their combinations, generated through MPEG-7 feature extraction and self-organizing map clustering. The system retrieves clips and shots by filtering the RDF-represented multimedia data on textual and visual facets.
Exposing Bibliographic Information as Linked Open Data using Standards-based ... by Nikolaos Konstantinou
The Linked Open Data (LOD) movement is constantly gaining worldwide acceptance. In this paper we describe how LOD is generated in the case of digital repositories that contain bibliographic information, adopting international standards. The available options and respective choices are presented and justified while we also provide a technical description, the methodology we followed, the possibilities and difficulties in the way, and the respective benefits and drawbacks. Detailed examples are provided regarding the implementation and query capabilities, and the paper concludes after a discussion over the results and the challenges associated with our approach, and our most important observations and future plans.
This document provides an overview of the SPIDAL Dibbs project which aims to develop middleware and high performance analytics libraries for scalable data science. The project involves multiple universities and focuses on developing tools like HPC-ABDS to enable interoperability between high performance computing and Apache big data stack technologies. It also involves developing applications in various domains, the SPIDAL library for scalable analytics, and benchmarks to evaluate performance.
The document discusses the need for a W3C community group on RDF stream processing. It notes that RDF stream models, query languages, implementations, and operational semantics are currently heterogeneous. The speaker proposes creating a W3C community group to better understand these differences and the underlying requirements, and potentially to develop recommendations. The group's mission would be to define common models for producing, transmitting, and continuously querying RDF streams. The presentation provides example use cases and outlines a template for describing them, so that more cases can be collected to understand requirements.
Networked Digital Library Of Theses And Dissertations by singlish
The Networked Digital Library of Theses and Dissertations (NDLTD) is an international organization that promotes the electronic publishing and preservation of graduate theses and dissertations. NDLTD allows students to create electronic documents, increases access to student research, and supports long-term preservation of electronic theses and dissertations (ETDs). It currently holds over 767,000 ETDs from 90 institutions in 18 countries in 17 formats, with metadata described by the ETD-MS standard to facilitate searching and discovery.
The document discusses scaling web data at low cost. It begins by presenting Javier D. Fernández and providing context about his work in semantic web, open data, big data management, and databases. It then discusses techniques for compressing and querying large RDF datasets at low cost using binary RDF formats like HDT. Examples of applications using these techniques include compressing and sharing datasets, fast SPARQL querying, and embedding systems. It also discusses efforts to enable web-scale querying through projects like LOD-a-lot that integrate billions of triples for federated querying.
This document discusses RDF stream processing and the role of semantics. It begins by outlining common sources of streaming data on the internet of things. It then discusses challenges of querying streaming data and existing approaches like CQL. Existing RDF stream processing systems are classified based on their query capabilities and use of time windows and reasoning. The role of linked data principles and HTTP URIs for representing streaming sensor data is discussed. Finally, requirements for reactive stream processing systems are outlined, including keeping data moving, integrating stored and streaming data, and responding instantaneously. The document argues that building relevant RDF stream processing systems requires going beyond existing requirements to address data heterogeneity, stream reasoning, and optimization.
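The time windows mentioned above can be illustrated with a minimal sketch. It is not taken from any of the systems the presentation classifies: the function names, the `ex:` identifiers, and the sample readings are hypothetical, and the stream is assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
TimedTriple = Tuple[float, Triple]     # (timestamp in seconds, triple)


def sliding_windows(stream: Iterable[TimedTriple], width: float) -> Iterator[List[Triple]]:
    """After each arrival at time t, yield the triples stamped in (t - width, t]."""
    window: Deque[TimedTriple] = deque()
    for timestamp, triple in stream:
        window.append((timestamp, triple))
        # Evict triples that have fallen out of the time window.
        while window and window[0][0] <= timestamp - width:
            window.popleft()
        yield [t for _, t in window]


def average_in_window(triples: List[Triple], predicate: str) -> Optional[float]:
    """Average the numeric objects of one predicate, as a continuous query might."""
    values = [float(o) for _, p, o in triples if p == predicate]
    return sum(values) / len(values) if values else None


# Hypothetical sensor readings, used only for illustration.
readings = [
    (0.0, ("ex:sensor1", "ex:temperature", "20.0")),
    (5.0, ("ex:sensor1", "ex:temperature", "22.0")),
    (12.0, ("ex:sensor1", "ex:temperature", "24.0")),
]
averages = [
    average_in_window(window, "ex:temperature")
    for window in sliding_windows(readings, width=10.0)
]
print(averages)
```

The query is re-evaluated on every arrival, which is one of several possible reporting policies; real engines differ on exactly this point.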
Tutorial ESWC2011 Building Semantic Sensor Web - 04 - Querying_semantic_strea... by Jean-Paul Calbimonte
This document outlines an approach for enabling ontology-based access to streaming sensor data sources. It describes mapping streaming data to RDF using ontologies and enabling SPARQL queries over streaming RDF data. It provides examples of mapping sensor data streams and relational databases to ontologies using R2RML mappings. It also demonstrates executing continuous SPARQL queries over integrated streaming data sources using the Semantic Integrator system.
Describe and Publish data sets on the web: vocabularies, catalogues, data por... by Franck Michel
The document discusses metadata standards for describing datasets on the web in order to make them FAIR (Findable, Accessible, Interoperable, and Reusable). It describes common metadata vocabularies such as DCAT, VoID, and Schema.org that can be used to provide metadata about datasets and catalogs. It also discusses application profiles that extend these vocabularies for specific domains, such as DCAT-AP for European data portals and the HCLS profile for health and life sciences datasets. Finally, it briefly introduces data portal platforms such as CKAN that can be used to publish and search datasets and their metadata on the web.
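As a rough illustration of what a DCAT description looks like, the sketch below builds a JSON-LD record for one dataset with one distribution. The helper name `dcat_dataset`, the `example.org` URLs, and the sample title are assumptions made for this example; only the DCAT and Dublin Core terms themselves come from the published vocabularies.

```python
import json


def dcat_dataset(identifier: str, title: str, description: str,
                 download_url: str, media_type_iri: str) -> dict:
    """Build a minimal JSON-LD description of a dataset using DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # A dataset can have several distributions (files, APIs); one is shown.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_iri},
            }
        ],
    }


# Hypothetical dataset, used only for illustration.
record = dcat_dataset(
    identifier="https://example.org/dataset/observations",
    title="Sample observations",
    description="A small illustrative dataset.",
    download_url="https://example.org/data/observations.csv",
    media_type_iri="https://www.iana.org/assignments/media-types/text/csv",
)
print(json.dumps(record, indent=2))
```

An application profile such as DCAT-AP would add further mandatory and recommended properties on top of this minimal shape.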
OpenStreetMap (OSM) is an open, collaborative map, often considered the Wikipedia of maps. A great deal of information about it is available online, and even in books. Rather than the history of OSM, this talk looks at the impressive dataset behind it and at how such a large map can be analysed with modern technologies such as Apache Spark.
This document proposes an approach to enable ontology-based access to streaming data sources. It discusses mapping streaming data schemas to ontological concepts and extending SPARQL to support querying streaming RDF data. This would allow expressing continuous queries over streaming data using ontological terms. The approach includes translating such SPARQL queries to queries over streaming data sources using mappings between the ontology and streaming schemas. An implementation of a semantic integration service is proposed to deploy this ontology-based access to streaming data.
Access to knowledge is key to combating hunger and poverty, but complex information needs for agricultural development cannot be met by availability alone. Linked data principles and vocabularies can help integrate disconnected repositories and allow users to discover related information across databases. The CIARD RING aims to apply these approaches to provide value-added services that help users find information on technologies, countries, crops, and projects.
Lightning fast genomics with Spark, Adam and Scala by Andy Petrella
This document discusses using Apache Spark and ADAM to perform scalable genomic analysis. It provides an overview of genomics and challenges with existing approaches. ADAM uses Apache Spark and Parquet to efficiently store and query large genomic datasets. The document demonstrates clustering genomic data from the 1000 Genomes Project to predict populations, showing ADAM and Spark can handle large genomic workloads. It concludes these tools provide scalable genomic data processing but future work is needed to implement more advanced algorithms.
ARIADNE is an EU-funded project that aims to integrate archaeological data repositories across Europe by overcoming fragmentation and fostering data sharing. It involves 24 partners from 17 countries. The project conducts networking activities to build community and standards, provides trans-national access to online resources and training, and performs research on data integration, management, and new tools. In its first nine months, ARIADNE has established special interest groups, collected information on partners' datasets and metadata schemas, and begun designing an integrated infrastructure and catalog data model.
A talk on "Stream Reasoning" given at INQUEST (INnovative QUErying of STreams) 2012 (http://games.cs.ox.ac.uk/inquest12/), held in Oxford, United Kingdom, September 25-27, 2012.
The talk presents a comprehensive view of "Stream Reasoning", that is, reasoning on rapidly flowing information. It illustrates the challenges, presents the achievements of the database group of Politecnico di Milano on the topic, reviews the challenges with pointers to results and ongoing work in the Semantic Web community, and proposes how to go beyond the current Stream Reasoning concept. In particular, it points out that "order matters" when processing massive data, and it proposes to investigate streaming algorithms for automated reasoning that can be applied not only to data streams that are "naturally" ordered (by recency) but to any sortable data source.
Transient and persistent RDF views over relational databases in the context o... by Nikolaos Konstantinou
For digital repositories, numerous benefits emerge from publishing their contents as Linked Open Data (LOD), and more and more repositories are moving in this direction. However, several factors need to be taken into account in doing so, among which is whether the transition needs to be materialized in real time or at asynchronous time intervals. In this paper we frame the problem in the context of digital repositories, discuss the benefits and drawbacks of both approaches, and draw our conclusions after evaluating a set of performance measurements. Overall, we argue that in contexts with infrequent data updates, as is the case with digital repositories, persistent RDF views are more efficient than real-time SPARQL-to-SQL rewriting systems in terms of query response times, especially when expensive SQL queries are involved.
Wakanda: Integrated Web Application Development with NoSQL and JavaScript by Juergen Fesslmeier
This is a slide presentation about web application development using Wakanda in pure JavaScript. It was given at JSConf Argentina on May 19, 2012. For additional resources, please visit http://wakanda.org and http://jsconf.com.ar
MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans by inside-BigData.com
In this video from the 2014 HPC Advisory Council Europe Conference, DK Panda from Ohio State University presents: MVAPICH2 and MVAPICH2-X Projects: Latest Developments and Future Plans.
This talk will focus on the latest developments and future plans for the MVAPICH2 and MVAPICH2-X projects. For the MVAPICH2 project, we will focus on scalable and highly optimized designs for pt-to-pt communication (two-sided and one-sided MPI-3 RMA), collective communication (blocking and MPI-3 non-blocking), support for GPGPUs and Intel MIC, support for the MPI-T interface, and schemes for fault-tolerance/fault-resilience. For the MVAPICH2-X project, we will focus on efficient support for the hybrid MPI and PGAS (UPC and OpenSHMEM) programming model with a unified runtime.
Watch the video presentation: http://wp.me/p3RLHQ-coF
Knowledge graph construction with a façade - The SPARQL Anything Project by Enrico Daga
The document discusses a project called "SPARQL Anything" which aims to simplify knowledge graph construction by using SPARQL as the single language for representing and transforming diverse data formats into RDF. It presents an approach called "Facade-X" which defines a common RDF structure that can be applied over different formats like CSV, JSON, HTML, etc. This facade focuses on the RDF meta-model and aims to apply minimal ontological commitments. The document outlines how Facade-X can be used to represent different formats and provides examples of using SPARQL to transform sample data into RDF without committing to a specific domain ontology.
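The idea of a format-independent facade can be sketched in a few lines. This is a simplified illustration, not the SPARQL Anything implementation: the function name and sample data are hypothetical, the two namespace IRIs are assumptions that should be checked against the project documentation, and column names are used as-is where a real tool would need to encode them into valid IRIs.

```python
import csv
import io
from typing import List, Tuple

# Namespace IRIs: the first two are assumptions to verify against the project docs.
FX = "http://sparql.xyz/facade-x/ns/"
XYZ = "http://sparql.xyz/facade-x/data/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

Triple = Tuple[str, str, str]


def csv_to_facade(text: str) -> List[Triple]:
    """Map CSV text to a generic container structure, with no domain ontology."""
    root = "_:root"
    triples: List[Triple] = [(root, RDF + "type", FX + "root")]
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        row_node = f"_:row{index}"
        # Each row becomes a numbered slot of the root container (rdf:_1, rdf:_2, ...).
        triples.append((root, f"{RDF}_{index}", row_node))
        for column, value in row.items():
            # Each cell becomes a property named after its column header.
            triples.append((row_node, XYZ + column, value))
    return triples


# Hypothetical sample data, used only for illustration.
sample = "name,year\nAda,1815\nAlan,1912\n"
triples = csv_to_facade(sample)
for triple in triples:
    print(triple)
```

Because the output only uses container membership and header-derived properties, a later SPARQL CONSTRUCT query can reshape it into whatever domain vocabulary is needed.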
This talk introduces the concepts of web 3.0 technology and how they relate to neighboring technologies such as the Internet of Things (IoT), Grid Computing and the Semantic Web:
• A short history of web technologies:
o Web 1.0: Publishing static information with links for human consumption.
o Web 2.0: Publishing dynamic information created by users, for human consumption.
o Web 3.0: Publishing all kinds of information with links between data items, for machine consumption.
• Standardization of protocols for description of any type of data (RDF, N3, Turtle).
• Standardization of protocols for the consumption of data in “the grid” (SPARQL).
• Standardization of protocols for rules (RIF).
• Comparison with the evolution of technologies related to databases.
• Comparison of IoT solutions based on web 2.0 and web 3.0 technologies.
• Distributed solutions vs centralized solutions.
• Security
• Extensions of Peer-to-peer protocols (XMPP).
• Advantages of solutions based on web 3.0 and standards (IETF, XSF).
Duration of talk: 1-2 hours with questions.
(Brief) Introduction to Data Visualization (in the Humanities and Social Sciences) by Antoine Courtin
Workshop given as part of the study day "Kit de survie pour l'historien de l'art en milieu numérique" ("Survival kit for the art historian in a digital environment"), organized at INHA on September 23, 2016.
The aim is to introduce the basic principles of data visualization in the humanities and social sciences, with a focus on art history, notably through hands-on use of the web tool Palladio.
Crowdsourcing in Cultural Institutions: Update for 2015 by Antoine Courtin
These are a few additional slides (a 2015 update of sorts) to the original presentation for the course given in 2014 for the Master's in archives at the Université d'Angers.
Analyzing Social Network Interaction in Cultural Field by Antoine Courtin
This document summarizes research analyzing social network interactions in cultural fields. It discusses methodology for collecting, analyzing, and visualizing Twitter data to qualify user engagement for cultural topics. The research collected tweets using hashtags related to a European museum event, then performed quantitative analysis of tweet volumes, types, and senders. Qualitative content analysis involved developing a machine learning classifier to categorize tweets, which achieved moderate performance. Further analysis of additional data and linking tweets to referenced content was proposed.
Slides supporting demos given as part of WeViz at the Institut d'Urbanisme d'Ile-de-France (IUA).
Tool names: OpenRefine, Import.io, Palladio, App.Raw, R, Gephi
Sources and Resources in the Cultural Domain by Antoine Courtin
Third session, "Sources et ressources" ("Sources and resources"), of the "Biens communs" ("Commons") seminar of the research Master's in information and communication at Université Paris 10 Nanterre.
To find out more: https://master-recherche-infocom.u-paris10.fr/
Brief Introduction to Linked Open Data [Applied to Cultural Institutions] by Antoine Courtin
Presentation given to the second-year Master's (M2) students of the École nationale des chartes in November 2014.
These slides partly remix slides by Fabien Gandon, Thomas Francart, Gautier Poupeau, and Emmanuelle Bermès.
Thanks to them.
Methodological Proposals for Designing Federative Platforms in Cultural Linke... by Antoine Courtin
As part of the on-going Labex project "Past in the present", our proposal aims at highlighting the organizational issues of Linked Data projects that have to deal with pluri-institutional contexts, among which libraries. First, we will discuss what is at stake. Second, we will present a methodology based on the building of several diagrams which highlight technical, conceptual, and organizational obstacles. We will also address the issues of designing and producing an information system intended to ensure the transmission of scientific skills, the exploitation of major vocabularies, associated to specific vocabularies, by foreign institutions and the harmonizing or building of bridges between heterogeneous descriptions.
Data and Cultural Institutions in the Age of LinkedOpenData: what per... by Antoine Courtin
Presentation for #JIES2014 (program: http://jies-chamonix.org/?page_id=2335)
Photo of decorated initials from the BritishMuseum Flickr account (https://www.flickr.com/photos/britishlibrary/)
Presentation given at the first session of the seminar "Dépôt légal du Web" ("Legal deposit of the Web") on the theme "Web, archives and museums", organized at the Institut National de l'Audiovisuel.
MCC Grand Prix DataCulture: the Laderdesders Project by Antoine Courtin
Presentation of the Laderdesders project, winner of the Grand Prix DataCulture at the first hackathon organized by the Ministère de la Culture et de la Communication (French Ministry of Culture and Communication).
#DHNord2019: Toward a 360-degree view of visual corpora: practices of provision and reuse
1. Toward a 360-degree view of visual corpora: practices of provision and reuse
Antoine Courtin
✤ head of the documentary engineering unit, Department of Studies and Research, Institut national d'histoire de l'art
✤ associate lecturer (maître de conférences associé), Université Paris Nanterre
#DHNord2019 - Lille - 18 October 2019
4-6. Three perspectives that will run through this talk…
observer
provider / reuser
GLAMs & art history (HDA)
and occasionally a little publicity for work under way at INHA…
10. What we will (try to) cover…
• Overview of existing platforms
• Blurring boundaries: data provider / service provider / end user(s)
• The (in)dependence between corpora and tools
• a galaxy/panorama of tools for every stage of a "corpus"
• Going beyond interfaces: delivering corpora via IIIF
• from BYOD to UYOS?
• The "metadata" paradox and visual corpora
• quality and structuring of descriptive metadata
• What about machine learning? Fantasy and reality
• Reuse: visual corpora and artistic creation
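Delivering a corpus via IIIF, as listed in the outline above, largely comes down to exposing each image at a predictable URL. The sketch below assembles such a URL from the IIIF Image API path template (identifier, region, size, rotation, quality, format). It is only an illustration: the function name, the server `iiif.example.org`, and the identifier are hypothetical, and the default `size="max"` assumes version 3.0 of the Image API (version 2 uses `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL from its path segments."""
    # Percent-encode the identifier (slashes included) so that it
    # remains a single path segment.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
full_image = iiif_image_url("https://iiif.example.org/image", "album/page 1")
# A 512x512 pixel crop from the top-left corner, scaled to 256 pixels wide.
detail = iiif_image_url("https://iiif.example.org/image", "album/page 1",
                        region="0,0,512,512", size="256,")
print(full_image)
print(detail)
```

Because the region and size are plain URL segments, any tool that can fetch a URL can reuse the images without going through the provider's own interface.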
12. Wikimedia Commons: Musée Saint-Raymond, Toulouse
Flickr: The Library of Congress, etc.
Internet Archive: Bibliothèque Sainte-Geneviève, Paris
Gallica Marque Blanche: Bibliothèque nationale et universitaire de Strasbourg
MédiHAL: photo library of the UMR PRODIG
Google Arts & Culture: the MAD, Paris
13. CollectiveAccess: Musée d'Orsay, Paris
Omeka S: INHA
Drupal: INHA
WordPress: IRHiS - UMR
CMS Mémo: Médiathèque de Valence Romans
Yoolib: Bibliothèque Mazarine
14. Wax (based on Jekyll): https://stylerevolution.github.io/ > https://minicomp.github.io/wiki/#/wax/
Scalar: The Alliance for Networking Visual Culture
25. The era of annotation
http://www.aioli.cloud: annotation of heritage objects in 3D
Mirador instance on Wikimedia's Toolforge, with spatialized annotations drawn from Wikidata
Pundit: semantic annotation of text corpora
31. http://jennriley.com/metadatamap/ :
[Figure: radial map in which each metadata standard (AACR2, AAT, CIDOC/CRM, DC, EAD, MARC, METS, MODS, OAI-PMH, RDF, SKOS, TEI, VRA Core, XML, and others) is listed inside the slivers of four axes: Community, Domain, Function, and Purpose. Each sliver is divided into Strong, Semi-Strong, Semi-Weak, and Weak bands. Community slivers: Archives, Information Industry, Libraries, Museums. Domain slivers: Cultural Objects, Datasets, Geospatial Data, Moving Images, Musical Materials, Scholarly Texts, Visual Resources.]
Seeing Standards:
Domain refers to the types of materials the standard is intended to be used with or could potentially be useful for. The specific categories represented here are not intended to be exhaustive, nor are they mutually exclusive; rather, they are focused on some common material types that are managed by cultural heritage and other information organizations.
Cultural Objects refers to works of art, architecture, and other creative endeavor.
Datasets refers to collections of primary data, largely before interpretive activities have taken place. They may be collected by scientific instruments, or through research activities in the sciences, social sciences, humanities, or other disciplines.
Geospatial Data refers to information relevant to geographic location, either as the data about geographic places themselves or the relationship of a resource to a specific location.
Moving Images refers to resources expressed as film, video, or digital moving images.
Musical Materials refers to resources expressing music in any form, including as audio, notation, and moving image.
Scholarly Texts refers to resources produced as part of a research or scholastic process, and includes both book-length and article-length material.
Visual Resources refers to material presented in fixed visual form. These materials may be either artistic or documentary in nature.
[Figure, Function and Purpose axes: the same standards are listed again by strength of connection. Function slivers: Conceptual Model, Content Standard, Controlled Vocabulary, Framework/Technology, Markup Language, Record Format, Structure Standard. Purpose slivers: Data, Descriptive Metadata, Metadata Wrappers, Preservation Metadata, Rights Metadata, Structural Metadata, Technical Metadata.]
Community refers to the groups that currently or potentially use the standard. Those that originated a standard or who are the primary audiences are stronger matches, while those that could use the standard effectively but do not frequently do so are weaker matches.
Libraries refers to those organizations that collect and preserve both primary and secondary material in support of research, scholarship, teaching, and leisure. Academic, public, special, and corporate libraries are included here.
Archives refers to those organizations that collect and preserve the natural outputs of the daily work of individuals and other organizational entities, including traditional records management processes. Their emphasis is frequently on the context of the creation of the materials and their relationship to one another.
Museums refers to those organizations that collect and preserve artifacts from a given field with an emphasis on their curation and interpretation. Art, science, natural history, and many other types of museums are included here.
Information Industry refers to the diverse organizations that make up both the public and the commercial Web. Technologies that support inventory and knowledge management, e-commerce, and the workings of the Internet are included here.
Purpose refers to the general type of metadata the standard is designed to record. Typically a standard will be strongly focused on one purpose but include a few data elements for other purposes considered especially important.
Data here refers to standards whose purpose is to enclose the resource itself, possibly together with metadata or with added value such as markup.
Descriptive Metadata standards include information to facilitate the discovery (via search or browse) of resources, or provide contextual information useful in the understanding or interpretation of a resource.
Metadata Wrappers package together metadata of different forms, or metadata together with the resource itself.
Preservation Metadata is broadly the information needed to preserve, keep readable, and keep useful a digital or physical resource over time. Technical metadata is one type of preservation metadata, but preservation metadata also includes information about actions taken on a resource over time and the actors who take these actions.
Rights Metadata is the information a human or machine needs to provide appropriate access to a resource, provide appropriate notification and compensation to rights holders, and to inform end users of any use restrictions that may exist.
Structural Metadata makes connections between different versions of the same resource, makes connections between hierarchical parts of a resource, records necessary sequences of resources, and flags important points within a resource.
Technical Metadata documents the digital and physical features of a resource necessary to use it and understand when it is necessary to migrate it to a new format.
Function refers to the role a standard plays in the creation and storage of metadata. Some functions define the basic entities to be described, others define specific fields, others give guidance on how to record a specific data element, and still others define concrete data structures for the storage of information.
Conceptual Models provide a high-level approach to resource description in a certain domain. They typically define the entities of description and their relationship to one another. Metadata structure standards typically use terminology found in conceptual model in their domain.
Content Standards provide specific guidance on the creation of data for certain fields or metadata elements, sometimes defining what the source of a given data element should be. They may or may not be designed for use with a specific metadata structure standard.
Controlled Vocabularies are enumerated (either fully or by stated patterns) lists of allowable values for elements for a specific use or domain. Classification schemes that use codes for values are included here.
Framework/Technology here is a general term encompassing models and protocols for the encoding and/or transmission of information, regardless of its specific format.
Markup Languages are formats that allow the featuring of specific aspects of a resource, typically in XML. They are unlike other "metadata" formats in that they provide not a surrogate for or other representation of a resource, but rather an enhanced version of the full resource itself.
Record Formats are specific encodings for a set of data elements. Many structure standards are defined together with a record format that implements them.
Structure Standards are those that define at a conceptual level the data elements applicable for a certain purpose or for a certain type of material. These may be defined anew or borrowed from other standards. This category includes formal data dictionaries. Structure standards do not necessarily define specific record formats.
[Figure: highlight "stars" for the most commonly known standards, each labeled with the Community, Domain, Function, and Purpose categories to which the standard connects: AAT, CCO, CDWA Lite, AACR2, DACS, Dublin Core, EAD, FOAF, FRBR, LCSH, MADS, MARC, MARCXML, METS, MIX, MODS, OAIS, OAI-PMH, OAI-ORE, ONIX, QDC, PREMIS, XSLT, XML, VRA Core, TGN, TEI, SKOS, RDA, RDF.]
A Visualization of the Metadata Universe
Content: Jenn Riley
Design: Devin Becker
Work funded by the Indiana University Libraries’ White Professional Development Award
Copyright 2009-2010 Jenn Riley
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License <http://creativecommons.org/licenses/by-nc-sa/3.0/us/>.
The sheer number of metadata standards in the cultural
heritage sector is overwhelming, and their inter-relationships
further complicate the situation. This visual map of the
metadata landscape is intended to assist planners with the
selection and implementation of metadata standards.
Each of the 105 standards listed here is evaluated on its
strength of application to defined categories in each of four
axes: community, domain, function, and purpose. The strength
of a standard in a given category is determined by a mixture of
its adoption in that category, its design intent, and its overall
appropriateness for use in that category.
The standards represented here are among those most heavily
used or publicized in the cultural heritage community, though
certainly not all standards that might be relevant are included.
A small subset of the standards plotted on the main
visualization also appear as highlights above the graphic. These
represent the most commonly known or discussed standards for
cultural heritage metadata.
Legend
Sliver = category. The standards listed closest to the center of a sliver are those that are most strongly connected to the given category.
Strength of a standard's connection (Strong, Semi-Strong, Semi-Weak, Weak) is indicated by font size and color saturation.
Stars represent those standards that are used most often. Font size = the star's strength for the given category.
[Main visualization: the standards are plotted in slivers, one per category, and most appear in several slivers. The names that appear, listed once each in alphabetical order:]
AACR2, AAT, ADL, AES Core Audio, AES Process History, AGLS, APPM, Atom, BISAC, CanCore, CCO, CDWA, CDWA Lite, CIDOC/CRM, CQL, DACS, DC, DCAM, DDC, DIF, DIG35, DTD, DwC, EAC-CPF, EAD, EML, FGDC/CSDGM, FOAF, FRAD, FRBR, FRSAD, GEM, GILS, GML, ID3, IEEE/LOM, indecs, ISAAR(CPF), ISAD(G), ISBD, ISO 19115, KML, LCC, LCSH, Linked Data, MADS, MARC, MARC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rights, MIX, MO, MODS, MPEG-21 DIDL, MPEG-7, MuseumDat, MusicXML, MXF, NewsML, OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media Resource, OpenURL, PB Core, PREMIS, PRISM, QDC, RAD, RDA, RDF, RELAX NG, RSS, SCORM, Sears List of Subject Headings, SGML, SKOS, SMIL, SPECTRUM, SRU, SWAP, TEI, TextMD, TGM I, TGM II, TGN, Topic Maps, ULAN, VRA Core, VSO Data Model, XML, XML Schema, XMP, XOBIS, XPath, XQuery, XrML, XSLT, Z39.50
33. Images: the "forgotten ones" of the push for openness and interoperability?
"The goal of IIIF is to create a common technical framework through which digital libraries can deliver their content on the Web in a standardised way, so that it can be viewed, manipulated and annotated by any compatible application or software."
Régis Robineau, https://insula.univ-lille3.fr/2016/11/comprendre-iiif-interoperabilite-bibliotheques-numeriques/
1 community
a set of technical specifications
34. IIIF in a few words
This technical framework is evolving and consists of:
• a data model: Shared Canvas (http://iiif.io/model/shared-canvas/1.0/)
• 4 APIs that work together and complement one another:
• Image API 2.1: http://iiif.io/api/image (version 3 beta in progress)
• Presentation API 2.1: http://iiif.io/api/presentation (version 3 beta in progress)
• Search API 1.0: http://iiif.io/api/search
• Authentication API 1.0: http://iiif.io/api/auth/1.0/
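As an illustration of the Image API listed above, here is a minimal Python sketch that builds request URLs following the Image API 2.1 pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server `https://example.org/iiif` and the identifier `ms/559-f1` are hypothetical placeholders, not taken from any library mentioned in these slides.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API 2.1 request URL for one image."""
    # The identifier is percent-encoded so that a "/" inside it is not
    # read as a path separator.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the info.json document describing the image."""
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
full_image = iiif_image_url(BASE, "ms/559-f1")
# A 400x300 pixel region starting at x=100, y=200, scaled to 200 pixels wide.
detail = iiif_image_url(BASE, "ms/559-f1", region="100,200,400,300", size="200,")
info = iiif_info_url(BASE, "ms/559-f1")
```

Because the region and size are expressed in the URL itself, any compatible client can ask for a detail of an image without the server exposing a custom interface.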
35. Mirador is a viewer that displays, in a single interface, documents from digital libraries that are compatible with the IIIF standards
https://chercher-archives.lamayenne.fr
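A IIIF viewer works from a Presentation API manifest published by each library. The sketch below, in Python, walks a Presentation 2.1 manifest (`sequences`, then `canvases`, then `images`) and lists the image behind each canvas. The manifest shown is a hand-written, hypothetical example with invented URLs; it only illustrates the structure of the format and is not a description of how Mirador is implemented.

```python
def list_canvas_images(manifest):
    """Return (canvas label, image URL) pairs from a Presentation 2.1 manifest."""
    images = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            # In Presentation 2.1, each canvas carries its image annotations
            # in the "images" list; the image itself is the "resource".
            for annotation in canvas.get("images", []):
                resource = annotation.get("resource", {})
                images.append((canvas.get("label"), resource.get("@id")))
    return images


# Hypothetical, heavily trimmed manifest (all URLs are placeholders).
sample_manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "sc:Manifest",
    "label": "Example manuscript",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@type": "sc:Canvas",
            "label": "f. 1r",
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {
                    "@id": "https://example.org/iiif/ms559-f1/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                },
            }],
        }],
    }],
}

pages = list_canvas_images(sample_manifest)
```

In practice the manifest would be fetched as JSON from the library's server and parsed before being passed to the function.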
36. Qatar Digital Library - https://www.qdl.qa/en
INHA digital library: https://bibliotheque-numerique.inha.fr/
37. Digitising the vignettes on the one hand and the manuscript on the other allows them to be virtually repositioned, made easier by certain technological advances in image visualisation and interoperability.
http://demos.biblissima-condorcet.fr/chateauroux/
39. https://goo.gl/LHvA2u
La Bible des poètes, edition published by Antoine Vérard in Paris in 1493. 11 surviving copies.
Comparison of 2 iconographic cycles (Vélin 559 and Vélin 560)
45. Tim Berners-Lee, one of the founders of the Web and the initiator of Linked Data, suggested a five-star deployment scheme for Open Data. Each step is characterised here, with its costs and benefits.
• 1 star: data available on the web (in any format)
• 2 stars: data available as structured data (e.g. an Excel file rather than a PDF of a spreadsheet)
• 3 stars: structured data in non-proprietary formats (e.g. CSV rather than Excel)
• 4 stars: use of URIs to identify resources
• 5 stars: data linked to other data
Open Data / Linked Open Data
http://5stardata.info/en/
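The five steps of this scale are cumulative: a dataset only earns a star if it also meets every earlier step. A minimal Python sketch of that logic follows; the `Dataset` class and its field names are hypothetical, chosen only to mirror the five steps on this slide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    on_web: bool        # step 1: available on the web, in any format
    structured: bool    # step 2: available as structured data
    open_format: bool   # step 3: in a non-proprietary format
    uses_uris: bool     # step 4: resources identified by URIs
    linked: bool        # step 5: linked to other data


def star_rating(dataset):
    """Count the consecutive steps satisfied, stopping at the first failure."""
    steps = [dataset.on_web, dataset.structured, dataset.open_format,
             dataset.uses_uris, dataset.linked]
    stars = 0
    for satisfied in steps:
        if not satisfied:
            break
        stars += 1
    return stars


# An Excel file published online: structured, but in a proprietary format.
excel_file = Dataset(on_web=True, structured=True, open_format=False,
                     uses_uris=False, linked=False)
# Data published with URIs and links to other datasets.
linked_data = Dataset(True, True, True, True, True)
```

Here `star_rating(excel_file)` gives 2 and `star_rating(linked_data)` gives 5.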
46. Three dimensions of analysis:
• format / quality / resolution
• access / manipulation / interconnection
• licence / reuse
[Figure: examples compared along these dimensions, with the labels "low-definition image", "HD image", "non-commercial reuse", "open licence" and "no persistent access".]
51. Doing without metadata?
ImagePlot : https://www.flickr.com/photos/culturevis/4181967739/in/set-72157622525012841
https://skylab.inha.fr/retif_images/
52. Computational art history
• creating training corpora
• generating the subject through statistical/probabilistic studies of the metadata of the works in the catalogue raisonné
• deep learning for visual similarity search
SMARTIFY: Scan & Discover art: the fantasy of a Shazam for art history
Replica project, EPFL, Lausanne: https://dhlab.epfl.ch/page-128334-en.html
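Visual similarity search is commonly framed as a nearest-neighbour problem: each image is reduced to a feature vector, and the images whose vectors are closest to the query are returned. The Python sketch below shows only that ranking step, using cosine similarity. The three-dimensional vectors are invented stand-ins for features a neural network would produce, and nothing here describes how Replica or Smartify actually work.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=3):
    """Return the names of the k indexed images closest to the query vector."""
    scored = [(cosine_similarity(query, vector), name)
              for name, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical feature vectors; real ones would have hundreds of dimensions.
index = {
    "image_a": [0.9, 0.1, 0.0],
    "image_b": [0.8, 0.2, 0.1],
    "image_c": [0.0, 0.1, 0.9],
}
nearest = most_similar([1.0, 0.0, 0.0], index, k=2)
```

With these toy vectors, `nearest` contains `image_a` and `image_b`, the two images whose vectors point in nearly the same direction as the query.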
55. Deep learning enables automatic image recognition. The projects of the DHLab at EPFL in Lausanne, within the Time Machine framework, are developing various programs, notably REPLICA, which analyses the photographic reproductions of the Cini collection.
To find out more, see Benoit Seguin's talk at INHA: https://www.youtube.com/watch?v=JxFMEAokjTM
REPLICA search engine integrated into the Diamond interface (image analysis): https://diamond.timemachine.eu/
59. Bringing visual corpora to light
• GallicaPix
• an iconographic search tool for the collections of digitised printed material (books, journals, newspapers) from the 1914-18 period
• a combination of methods: it relies on OCR and OLR (Optical Layout Recognition) files, bibliographic metadata, and a machine-learning method for typology and visual indexing.
Example results for the query "clemenceau": http://bit.ly/33IdUw1
62. Project - Segmentation
Library of the Institut national d'histoire de l'art, Jacques Doucet collections
Segmentation of the catalogue NUM CV03437_19160711
https://bibliotheque-numerique.inha.fr/collection/item/25924-vente-par-autorite-de-justice-de-11-tableaux-des-ecoles-francaise-et-ialienne-vente-du-11-juillet-1916
65. 49th ADBU congress - 17-19 September 2019
Tous Bibl-IA-thécaires ? L'intelligence artificielle vers un nouveau service public ? (All "libr-AI-rians"? Artificial intelligence, towards a new public service?)
https://adbu.fr/retour-sur-la-matinee-politique-du-congres-adbu2019-les-bibliotheques-universitaires-et-le-developpement-de-la-science-ouverte-realites-espoirs-et-enjeux/
The DHAI Seminar
When Digital Humanities Meet Artificial Intelligence
Next session, 22 October:
The Ontology of Sight in the Age of AI: The Machine Learned Image in Art, Architecture, and Historic Preservation
https://dhai-seminar.github.io
ADEMEC annual day, Paris, 11 December 2019
AI and heritage institutions: issues, challenges and opportunities
https://www.eventbrite.fr/e/billets-ia-et-institutions-patrimoniales-enjeux-defis-et-opportunites-76425652183
API(dot)Culture meetup: Image and AI
(BnF, 4 July 2019)
The presentations are on SlideShare
https://www.slideshare.net/IsabelleReusa/apidotculture-images-et-ia
67. Logo.Hallucination by the artist Christophe Bruno
The software based on neural network image recognition was exhibited at the
Rencontres Internationales Paris Berlin in November 2006.
Jan Vermeer, The Music Lesson
c. 1662-1665. Oil on canvas, 74.6 x 64.1 cm (Royal Collection, St. James’ Palace, London).
Giovanni Toscani, Trittico con Madonna col Bambino, S. Girolamo e S. Caterina (Firenze, Museo dello Spedale
degli Innocenti).
https://goo.gl/GVJFwj
71. [Diagram: the kinds of data that surround an artefact]
• artefact
• descriptive data about the object, produced by the institution as part of its missions
• "crowdsourced" data from the general public
• editorial "mediation" content aimed at X audiences
• digitised 2D version of the artefact
• digitised 3D version of the artefact
• data produced by artificial intelligence
• data from measuring instruments, collected during restoration
• structured data from research programmes
• editorial content (articles, catalogues, etc.) from research programmes
• log data on the consultation of this information (from X sources)
Created by Antoine Courtin - 20 September 2018 - Creative Commons 4.0 licence
72. [Same diagram as the previous slide, with three items marked with an X:]
• X artefact
• X digitised 2D version of the artefact
• X digitised 3D version of the artefact
73. To find me on the web: https://antlitz.ninja/-
To contact me: antoine.courtin@inha.fr
Thank you!
#DHNord2019 - Lille - 18 October 2019