This document provides an introduction to persistent identifiers (PIDs) and their use in the EUDAT system. It defines PIDs as globally unique identifiers that can be used to persistently identify digital objects. The document discusses why PIDs are useful, describing problems with URLs like link rot. It then covers different PID systems like Handle and DOI, as well as EUDAT's use of Handle through the B2HANDLE service. The document also discusses PID policies, use cases, and the B2HANDLE Python library for programmatic PID management.
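The resolution step that makes PIDs work can be sketched without any external service. The following is a minimal, library-free illustration of Handle-style identifiers (the prefix/suffix split and a record of typed values such as URL and checksum); the registry contents and the `resolve()` helper are hypothetical, not the actual B2HANDLE API.

```python
# Minimal illustration of Handle-style PID structure and resolution.
# The record layout mimics what a Handle lookup returns; the PID,
# URL and checksum values here are made up for the example.

def split_pid(pid: str) -> tuple[str, str]:
    """A Handle PID has the form '<prefix>/<suffix>': the prefix names
    the naming authority, the suffix is chosen by the registering service."""
    prefix, _, suffix = pid.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a valid Handle PID: {pid!r}")
    return prefix, suffix

# A toy registry: each PID maps to typed entries, as in a Handle record.
REGISTRY = {
    "21.T12995/example-object": {
        "URL": "https://repository.example.org/objects/42",   # current location
        "CHECKSUM": "md5:0f343b0931126a20f133d67c2b018a3b",   # fixity info
    }
}

def resolve(pid: str) -> str:
    """Return the current location for a PID, as a Handle resolver would."""
    split_pid(pid)  # validate the syntax first
    return REGISTRY[pid]["URL"]
```

Because the PID, not the URL, is what gets cited, the URL entry can be updated when the object moves while the identifier stays stable; that indirection is exactly the protection against link rot described above.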
Slides used to introduce the technical aspects of DSpace-CRIS to the technical staff of the Hamburg University of Technology.
Main topics:
The DSpace-CRIS data model: additional entities, interactions with the DSpace data model (authority framework), enhanced metadata, inverse relationship
ORCID integration & technical details: available features & use cases (authentication, authorization, profile claiming, profile synchronization push & pull, registry lookup), configuration, API-KEY, use of the sandbox, metadata mapping
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to Digital Twin Data (FIWARE)
NGSI-LD and Smart Data Models: Standard Access to Digital Twin Data - 15 July 2020
Corresponding webinar recording: https://youtu.be/MBx23ypORLk
Understanding the basics of context information management, NGSI-LD and Smart Data Models
Chapter: Core
Difficulty: 2
Audience: Any Technical
Speaker: Juanjo Hierro (CTO, FIWARE Foundation), Alberto Abella (Data Modeling Expert and Technical Evangelist, FIWARE Foundation)
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information”, a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
FAIR data in trustworthy repositories: the basics (OpenAIRE)
This video illustrates how certified digital repositories contribute to making and keeping research data findable, accessible, interoperable and reusable (FAIR). Trustworthy repositories support Open Access to data, as well as Restricted Access when necessary, and they offer support for metadata, sustainable and interoperable file formats, and persistent identifiers for future citation. Presented by Marjan Grootveld (DANS, OpenAIRE).
Main references
• Core Trust Seal for trustworthy digital repositories: https://www.coretrustseal.org/
• EUDAT FAIR checklist: https://doi.org/10.5281/zenodo.1065991
• European Commission’s Guidelines on FAIR data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• FAIR data principles: www.force11.org/group/fairgroup/fairprinciples
• Overview of metadata standards and tools: https://rdamsc.dcc.ac.uk/
Building the Data Lake with Azure Data Factory and Data Lake Analytics (Khalid Salama)
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides the means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring and securing the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build Azure Data Factory pipelines to ingest data into the lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
An introduction to the FAIR principles and a discussion of key issues that must be addressed to ensure data is findable, accessible, interoperable and re-usable. The session explored the role of the CDISC and DDI standards for addressing these issues.
Presented by Gareth Knight at the ADMIT Network conference, organised by the Association for Data Management in the Tropics, in Antwerp, Belgium on December 1st 2015.
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that authors’ credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2:1-2, 108-121. https://doi.org/10.1162/dint_a_00033
How to build a data lake with AWS Glue Data Catalog (ABD213-R), re:Invent 2017 (Amazon Web Services)
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
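The crawler workflow described above (scan files, classify the data, infer a schema, and register it as a table definition in the catalog) can be sketched in plain Python. This is a conceptual illustration only, not the AWS Glue API; the table name and S3 path are hypothetical.

```python
import csv
import io

def infer_schema(csv_text: str) -> list[tuple[str, str]]:
    """Infer (column, type) pairs from a CSV sample, the way a crawler
    classifies data before registering it in the catalog."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, sample = rows[0], rows[1]

    def classify(value: str) -> str:
        # Try the narrowest type first, fall back to string.
        try:
            int(value)
            return "bigint"
        except ValueError:
            pass
        try:
            float(value)
            return "double"
        except ValueError:
            return "string"

    return [(name, classify(v)) for name, v in zip(header, sample)]

# The "data catalog": table name -> location + schema, analogous to a
# Glue table definition pointing at an S3 path.
catalog: dict[str, dict] = {}

def register_table(name: str, location: str, csv_sample: str) -> None:
    """Add a table definition to the catalog, as a crawler run would."""
    catalog[name] = {"location": location, "schema": infer_schema(csv_sample)}

register_table(
    "sales",
    "s3://my-lake/raw/sales/",  # hypothetical path in the lake
    "order_id,amount,region\n1001,19.99,EU\n",
)
```

Query engines such as Athena, EMR and Redshift Spectrum then read these table definitions rather than rediscovering the raw files themselves, which is what makes the catalog the central view of the lake.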
This is the presentation for the lecture of Dimitar Mitov "Data Analytics with Dremio" (in Bulgarian), part of OpenFest 2022: https://www.openfest.org/2022/bg/full-schedule-bg/
In this session you will learn how Qlik’s Data Integration platform (formerly Attunity) reduces time to market and time to insights for modern data architectures through real-time automated pipelines for data warehouse and data lake initiatives. Hear how pipeline automation has impacted large financial services organizations’ ability to rapidly deliver value, and see how to build an automated near real-time pipeline to efficiently load and transform data into a Snowflake data warehouse on AWS in under 10 minutes.
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Architect) (Denodo)
This is the first in a series of five webinars that look 'under the covers' of Denodo's industry-leading Data Virtualization Platform. The webinar will provide an overview of the architecture and key modules of the Denodo Platform - subsequent webinars in the series will take a deeper look at some of the key modules and capabilities of the platform, including performance, scalability, security, and so on.
More information and FREE registrations to this webinar: http://goo.gl/fLi2bC
To learn more, visit: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
The Denodo Platform
Platform Architecture
Key Modules
Connectors
Data Services and APIs
A presentation on Digital Library Architecture (components of digital library) by Rupesh Kumar A, Assistant Professor, Department of Studies and Research in Library and Information Science, Tumkur University, Tumakuru, Karnataka, India.
Business Intelligence (BI) and Data Management Basics (amorshed)
A one-day training course on the Concepts of Data Management and Business Intelligence (BI) in the DX age
A Basic Review of BI and DM
How to Implement BI
A review of BI Tools and the 2022 Gartner Magic Quadrant
Basics of Data warehouse (DWH)
An introduction to Power BI
Components of Power BI
Steps for BI Implementation
Data Culture
Intro to ETL and ELT
OLAP files and Architecture
Digital transformation or DX review
A glance at DMBOK2.0 framework
BI Challenges
Data Governance
Data Integration
Data Security and Privacy in DMBOK2.0
Data-Driven Organization
Data and BI Maturity Model
Traditional BI
Self-service BI
Who is a DMP?
Who is a BI developer?
What is metadata?
What is master data?
Data Quality
Data Literacy
Benefits of BI
BI features
How does BI work?
Modern BI
Data Analytics
BI Architecture
Data Types
Data Lake
Data Mart
Data Silo
Data Visualization
Power BI Architecture and components
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache Spark™, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
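The late-arriving-data handling mentioned above boils down to an upsert, which Delta Lake expresses as a MERGE: match incoming records against the table on a key, update the matches, and insert the rest. A plain-Python sketch of that semantics follows; the table and records are illustrative, and real Delta Lake performs this transactionally on Spark rather than on in-memory dicts.

```python
def merge_upsert(table: dict[int, dict], updates: list[dict], key: str) -> None:
    """Apply MERGE semantics: WHEN MATCHED THEN UPDATE,
    WHEN NOT MATCHED THEN INSERT."""
    for row in updates:
        # Merge onto any existing row so unmentioned fields survive.
        table[row[key]] = {**table.get(row[key], {}), **row}

# Target table keyed by event id; a late-arriving batch contains both
# a correction to event 1 and a brand-new event 3.
events = {
    1: {"id": 1, "value": 10, "status": "provisional"},
    2: {"id": 2, "value": 7, "status": "final"},
}
late_batch = [
    {"id": 1, "value": 12, "status": "final"},  # correction for id 1
    {"id": 3, "value": 5, "status": "final"},   # new late-arriving event
]
merge_upsert(events, late_batch, key="id")
```

After the merge, event 1 carries its corrected value, event 2 is untouched, and event 3 has been inserted; Delta Lake's ACID guarantees make the equivalent operation safe to run concurrently with readers of the table.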
The presentation gives an overview of what metadata is and why it is important. It also addresses the benefits that metadata can bring and offers advice and tips on how to produce good quality metadata and, to close, how EUDAT uses metadata in the B2FIND service.
November 2016
DSpace-CRIS slides presented at ORCID's Better Together webinar on 19.09.2019, full slide deck with ORCID introduction at https://doi.org/10.23640/07243.9884033.v2.
Video Recording available at https://vimeo.com/361523018
Persistent Identifiers in EUDAT services | www.eudat.eu | EUDAT
| www.eudat.eu | The EUDAT data domain handles registered data. Each digital object should have a persistent identifier. This persistent identifier is used for: Replica identification; Identification of the repository of record (in the case of replication); Querying of additional information; Checksum (time stamped)...
How EUDAT services support FAIR data - IDCC 2017 | www.eudat.eu | EUDAT
| www.eudat.eu | Welcome Overview of the EUDAT service suite and the FAIR principles.
Sarah Jones, Marjan Grootveld, Yann Le Franc - IDCC Conference, February 20, 2017
| www.eudat.eu | Explore the EUDAT services step by step, as presented by Yann Le Franc at the EUDAT workshop "How EUDAT services could support the FAIR principles", IDCC Conference 2017.
Integrating data management planning into institutional processes: a case study (Joy Davidson)
This webinar from April 13, 2015 featured an overview from Zanele Mathe, Cape Peninsula University of Technology (CPUT) about how they tested DMPonline and how data management planning might be integrated into existing CPUT processes. Please note there were a few problems with the sound due to the internet connection.
Presentation slides include audio.
| www.eudat.eu | B2FIND - User training Version 07, June 2017: B2FIND is EUDAT’s simple, user friendly metadata catalogue allowing users to discover metadata from a wide range of scientific communities.
Legal Issues in Research Data Collection and Sharing: An Introduction by EUDA...EUDAT
| www.eudat.eu | v1.0, June 2014 - This course provides guidelines on the collection, usage and sharing of data in research by providing the basic information related to ethical and legal obligations. The course is made up of three modules (further modules will be added in the next months): 1. Intellectual Property Rights. 2. Personal Data. 3. Service Provider Liability & Terms of Service.
Who is it for?: Researchers, Data Managers, General public.
Presentation on legal and ethical issues in open access to research data given at the RECODE early career researcher workshop, University of Sheffield 14-15th May 2015
| www.eudat.eu | B2FIND Integration Version 4 February 2017: The aim of this presentation is to illustrate how metadata can be published in the B2FIND catalogue and how EUDAT’s B2FIND metadata catalogue can be integrated.
EUDAT B2Service Suite| - A new version is available at http://ow.ly/fsCi30grKHVEUDAT
| www.eudat.eu | EUDAT offers a complete set of research data services, expertise and technology solutions to all European scientists and researchers. These shared services and storage resources are distributed across 15 European countries.
January 2017
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
| www.eudat.eu | This webinar was co-organised by DANS, EUDAT and OpenAIRE and was held on 12th and 13th December 2016.
Everybody wants to play FAIR, but how do we put the principles into practice?
There is a growing demand for quality criteria for research datasets. In this webinar we will argue that the DSA (Data Seal of Approval for data repositories) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any certified Trustworthy Digital Repository.
In 2014 the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) were formulated. The well-chosen FAIR acronym is highly attractive: it is one of these ideas that almost automatically get stuck in your mind once you have heard it. In a relatively short term, the FAIR data principles have been adopted by many stakeholder groups, including research funders.
The FAIR principles are remarkably similar to the underlying principles of DSA (2005): the data can be found on the Internet, are accessible (clear rights and licenses), in a usable format, reliable and are identified in a unique and persistent way so that they can be referred to. Essentially, the DSA presents quality criteria for digital repositories, whereas the FAIR principles target individual datasets.
In this webinar the two sets of principles will be discussed and compared and a tangible operationalization will be presented.
Social Media had transitioned from a people-to-people interaction platform to a marketing platform for brands. And then, it went a step further by becoming an integral part of different departments of an organization- like Customer Support, Sales, Human Resources et al.
This brief presentation, given at SEMPO Hyderabad 2014 talks about some of the ways in which organizations are using social media, beyond marketing.
15 argumentos sobre la importância del
Desarrollo Integral de la Primera Infancia
(0 a 8 años)
Javier Sáenz Coré
Curador de Contenidos
Comunidad Virtual de Desarrollo Infantil Temprano
Dijital pazarlama, kullanıcı deneyimi, Markaların dijital dünyadaki yeri ve Modern Kıyametin 4 Atlısı! Sonuçta; Açık dan kapalıya doğru giden Web'i kurtarabilecek miyiz?
Mapping Slacktivism: Patterns of low-threshold civic participation on the int...mysociety
Sandy presented a session at The Impacts of Civic Technology Conference (TICTeC2015) on 25 March 2015 in London.
To see more coverage of TICTeC2015, visit: http://lanyrd.com/2015/tictec/
Data Discoverability and Persistent Identifiers - EUDAT Summer School (Chris...EUDAT
We will introduce the concept of persistent identifiers. We will explain how PIDs can be used, which PID systems exist and which use cases they are fit for. The use cases highlight that PIDs are a vital technology to enable FAIR data. The focus will lie on gathering hands-on experience with the Handle system. Participants will mint PIDs, i.e. not only create a resolvable PID but will also learn how to add, alter and delete metadata in the PID entry by employing the handle API directly and EUDAT’s B2HANDLE library.
Visit: https://www.eudat.eu/eudat-summer-school
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
EUDAT and PRACE joined forces to help research communities gain access to high quality managed e-Infrastructures whose resources can be connected together to enable cross-utilization use cases and make them accessible without any technical barrier. The capability to couple data and compute resources together is considered one of the key factors to accelerate scientific innovation and advance research frontiers. The goal of this session was to present the EUDAT services, the results of the collaboration activity achieved so far and delivers a hands-on on how to write a Data Management Plan or DMP. The DMP is a useful instrument for researchers to reflect on and communicate about the way they will deal with their data. It prompts them to think about how they will generate, analyse and share data during their research project and afterwards.
Visit: https://www.eudat.eu/eudat-summer-school
Leveraging the strength of OSGi to deliver a convergent IoT Ecosystem - O Log...mfrancis
The “internet of things” is the next revolutionary wave following profound changes brought to us by Personal Computers (connecting places) and Mobile Phones (connecting people on the go). This third wave heralds the beginning of the new era of pervasive connectivity, embedded intelligence, and application convergence. It will be the world where smart things will communicate among themselves and with us enabling greener, more efficient, and at the same time more comfortable environment.
This talk will present a platform and products designed to serve the new markets enabled by the Internet of Things, with a particular focus on the value of the OSGi framework enabling convergence of Home Automation, Smart Energy, Electric Vehicle Charging, and e-health on a single remotely manageable platform. It will also provide insights on how the platform was developed leveraging the extensibility offered by the OSGi framework and ProSyst’s modular architecture.
The built-in OSGi stack provides Java-level abstraction of the network interfaces and Smart Energy Profile 2.0 stack as well as cloud integration features such as web server, web services and standards-based remote management. The OSGi framework is the key enabler of the product lifecycle and remote application management mandatory for service provider driven deployments. The Smart Energy 2.0 standard is a key element of the future smart grid. And the work presented in this talk describes the first platform integrating the SEP 2.0 protocol stack with an OSGi based middleware. The OSGi based solution also provides higher level of device security through the use of secure element. The UDK-21 is build around a System-on-Chip STreamPlug (ST2100), the solution features a fully integrated HomePlug PHY/MAC and Analog Front End combined with the ARM926EJ-S processor and a rich set of interfaces.
A demo showing Smart Energy Profile 2.0 use cases will outline these features. The demo will show how web based applications can interact with the OSGi stack on the already publicly available UDK-21 based gateway to control remote devices, such as a thermostat or an electric load. The access to SEP 2.0 devices will be done by the means of JSON-RPC based APIs, independent of the underlying device protocol, hence highlighting the benefits of a generic protocol agnostic architecture from the application standpoint. Other examples of the products that can be built around UDK-21 include Electric Vehicle Charger, Smart Meter, and a Basement Sensor Hub.
An overview of how electronic signature objects are generated and used within PDF documents including the overview of Aodbe LiveCycle ES's ability to programmatically work with them server side.
Introduction to the security components used in FIWARE architecture. What is the standard communication of the oAuth2.0 standard. What about the fine grane access to the information using XACML standard. How to use JWT with FIWARE Secure components. What are the different types of accessing support are allowed. How to offer security access to your applications using these components. What is eIDAS and eID and how to integrate them in the FIWARE Security architecture. Finally an overview of the Data Usage Control using FIWARE Security components
The Role of OAIS Representation Information in the Digital Curation of Crysta...ManjulaPatel
A presentation given by Manjula Patel (UKOLN, University of Bath) and Simon Coles (EPSRC NCS, University of Southampton) at the 5th IEEE International Conference on e-Science
9-11th December 2009, Oxford, UK
presentation at https://researchsoft.github.io/FAIReScience/, FAIReScience 2021 online workshop
virtually co-located with the 17th IEEE International Conference on eScience (eScience 2021)
Cascon Decentralized IoT update - Blockchain and Smart ContractsMehdi Shajari
The pervasiveness of IoT devices makes the delivery mechanism of security updates a challenge. Current IoT systems rely on centralized or brokered paradigms or clouds with huge computational and storage capacities. The existing centralized IoT setups are therefore expensive, owing to the high costs associated with cloud server infrastructures and maintenance, as well as other factors such as network equipment. Thus, the need for a fully decentralized peer to peer and secure technology to overcome these problems rises into the realm of existence. Blockchain provides a solution that fulfills the requirements of such a platform. Ideally, the update infrastructure should implement the CIA triad properties (Confidentiality, Integrity, and Availability). In this article, we study how a blockchain application can meet these requirements and propose a novel system to decentrally distribute digital content in a peer-to-peer network using the blockchain technology and smart contracts to overcome the concerns mentioned above. Additionally, in order to prevent the issues stemming from the free-riding challenge in P2P networks (peers refrain to generously share their resources to distribute updates), we exploit a Nash equilibrium micropayment mechanism to grant adequate incentive for peers to participate in distributing IoT update files.
Monitoring and Securing a Geo-Dispersed Data Center at Hill AFBElasticsearch
The HEDC provides a hosting service for more than 100 information systems supporting the USAF. See how they innovated to deliver logging and DoD compliance monitoring for the life-cycle of hosted information systems as an integrated service within the HEDC PaaS using Elastic Cloud Enterprise.
Integration of Things (Sam Vanhoutte @Iglooconf 2017) Codit
To build an overall IoT solution, a lof of different technologies and skills are needed and the role of an architect is crucial to combine all the different services into a solid solution. In this presentation, you will understand more about the DNA of a typical IoT solution, based on Microsoft Azure. You will see the different pitfalls that come with implementing Industrial IoT solutions.
Similar to Introduction to Persistent Identifiers| www.eudat.eu | (20)
With a network of more than 20 European research
organisations, data and computing centres in 14 countries,
the EUDAT Collaborative Data Infrastructure (CDI) is one of
the largest infrastructures of integrated data services and
resources supporting research in Europe.
Are you a researcher, citizen scientist, institution or community looking for data storage and value-added services? Do you want access to tools to make your research data more FAIR (findable, accessible, interoperable, and reusable)? Interested in seeing how the future European Open Science Cloud could support research data and practically foster cross-border, cross-disciplinary collaboration? Then this webinar is for you!
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Introduction to Persistent Identifiers | www.eudat.eu
1. www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Introduction to Persistent Identifiers
PIDs in EUDAT
Version 2
July 2017
This work is licensed under the Creative Commons CC-BY 4.0 licence
2. Content
What are persistent identifiers?
Why use persistent identifiers?
Different persistent identifier systems
The HANDLE system
EPIC PID system
B2HANDLE
Policies
Use cases
4. Science Data
Data generation is getting easier/cheaper
Complexity shift from data generation to data processing & analysis
The amount of data output is increasing, quality is getting better
How to stimulate reuse and enable reproducibility?
Data needs to be Findable, Accessible, Interoperable, Reusable
5. Briefly, what are PIDs?
Pointers to data resources
Data files, metadata files, documents …
Globally unique
With infinite lifespan
Can be used to identify and retrieve resources
Can be resolved to the resource
Examples: ISBN, DOIs, PURLs, Handles…
PID Training
7. What is the problem? Why not use simple URLs?
The URL specifies the location, on a particular server, from which the resource could be retrieved.
URLs are strictly network locations for digital resources:
the domain may change
the resource may be relocated
the link may change
BUT in the long term URLs often no longer work a year or two later: “link rot”
8. Persistent over time
today … 2030
The same PID, 11839/abc123, resolves to http://www.example.com/ today and to http://www.moved.com/ in 2030
Supports access to a resource as it moves from one location to another
… by design
9. Why can Persistent Identifiers help?
A Persistent Identifier is distinct from a URL: not strictly bound to a specific server or filename
“A persistent identifier (PID) is a long-lasting reference to a digital object—a single file or set of files.“
https://en.wikipedia.org/wiki/Persistent_identifier
11839 / abc123 (prefix / suffix) resolution
The identifier points to a resource with no actual knowledge of the resource
It is the responsibility of the PID owner to keep it up to date when the resource changes
10. Structure of a Persistent Identifier
11839 / abc123 (prefix / suffix)
Points to a resource and is globally unique
Once the PID is created, the resource (data, metadata, document, code) is globally addressable
Prefix: designates the administrative domain, comes from an issuing instance
Suffix: unique in the realm of the prefix
11. Persistent over time
today … 2030
The same PID, 11839/abc123, resolves to http://www.example.com/ today and to http://www.moved.com/ in 2030
… by design: the PID is stable; when the resource moves, its information is updated and the PID redirects to the new location
12. PID Benefits
Persistent identity via indirection: static references into fluid systems over time (data on networks moves; ownership/responsibility changes; formats change)
Embedded IDs: for the data object in hand – current-state data, updates, new related entities
Networks of persistent links: data/metadata links, provenance chains
13. PID Costs
Extra level of effort/cost on creation
Analysis – what to identify (granularity): folders, files, or single measurements in a time-series experiment
Coordination across organisations: maintaining the resolution system
Persistence requires sustained effort: organisational discipline; technology is necessary but not sufficient
Analyse the cost/benefit ratio: don’t start unless it is worthwhile – is your data worth it?
15. Persistent Identifier structure
Every persistent identifier consists of two parts: its prefix and a unique local name under the prefix, known as its suffix
Prefix: designates the administrative domain; generated by an issuer, which makes sure that all prefixes are unique
Suffix: the local name must be unique under its prefix
The uniqueness of the prefix and of the local name under that prefix ensures that any identifier is globally unique within the context of the system
< PREFIX > / < SUFFIX > (e.g. 11111/123456745)
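The prefix/suffix split above can be sketched in a few lines of Python. The helper names are illustrative; hdl.handle.net is the resolver used elsewhere in this deck:

```python
def split_pid(pid: str):
    """Split a Handle-style PID of the form '<prefix>/<suffix>' into its parts.

    Only the first '/' separates prefix and suffix; the suffix itself
    may contain further slashes.
    """
    prefix, sep, suffix = pid.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid '<prefix>/<suffix>' identifier: {pid!r}")
    return prefix, suffix


def actionable_url(pid: str, resolver: str = "http://hdl.handle.net") -> str:
    """Turn a PID into an actionable URL by prepending a resolver address."""
    split_pid(pid)  # validate the '<prefix>/<suffix>' shape first
    return f"{resolver}/{pid}"
```

For example, `split_pid("11111/123456745")` yields `("11111", "123456745")`, and `actionable_url("10232/1234")` yields `http://hdl.handle.net/10232/1234`.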
16. PID Systems
Persistent URLs (PURLs): Cost: none. Metadata: no additional metadata. Example: purl:GPO/gpo46189
EPIC System (based on the Handle System): Cost: $50 annual fee per prefix. Metadata: associate any metadata. Example: hdl:11210/123
Archival Resource Key (ARK): Cost: none. Metadata: ERC (Electronic Resource Citation) metadata. Example: ark:/12025/654xz321
Digital Object Identifier (DOI) (based on the Handle System): Cost: fee per DOI + annual fee. Metadata: the INDECS schema, stored in a separate database. Example: DOI: 10.1000/182
17. PID system requirements
Attach multiple URLs to a PID
Allow part identifiers for complex objects (granularity issue)
Allow attaching extra metadata to the PID (MD5 checksum, etc.)
Actionable PIDs (i.e. convertible to a URL)
HTTP proxy for resolving (use port 80 only)
Controlled by the community
Programmable interface for administration of PIDs from applications
Delegation of PID administration to other organisations
Distributed, robust, highly available, scalable
No single point of failure; distributed system with mirroring
Acceptable non-commercial business model
18. Identifier String Requirements that contribute to persistence
Not based on any changeable attributes of the entity, e.g. location, ownership, or any other attribute that may change without changing identity
Unique: avoid conflicts and referential uncertainty; a good PID system should not allow you to use the same suffix twice
Opaque, preferably a “dumb number”: a well-known pattern invites assumptions that may be misleading; meaningful semantics invite IP wars and language problems
Nice to have: human-readable; cut-able, paste-able; fits common systems, e.g. the URI specification
19. PIDs in EUDAT
EUDAT has adopted Handle-based persistent identifiers
A combined solution of the Handle system and the EPIC service
Employing the latest Handle v8
EUDAT developed a library to interact with Handle v8: B2HANDLE
21. The Handle System
The Handle System is a technology specification for assigning, managing, and resolving persistent identifiers for digital objects and other resources.
The protocols specified enable a distributed computer system to store identifiers (names, known as Handles) of digital resources and resolve those Handles to the information necessary to locate, access, and otherwise make use of the resources.
That information can be changed as needed to reflect the current state or location of the identified resource without changing the Handle.
22. The Handle System
The main goal of the Handle system is to contribute to persistence.
The Handle system is:
reliable
scalable
flexible
trusted
built on open architecture
transparent
23. A Handle Record
Handle: 10232/1234
Type/Key | Index | Handle data | Timestamp
URL | 1 | https://www.eudat.eu/ex1 | 2014-04-09 12:46:53Z
DOMAIN | 2 | EUDAT | 2014-04-09 12:46:53Z
HS_ADMIN | 100 | eudat/user1 | 2014-04-09 12:46:53Z
PID – handle: 10232/1234
Actionable PID (URL/resolving): http://hdl.handle.net/10232/1234
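The record above can be modelled as a small list of typed entries. This is a sketch of the data shape only, not the Handle wire format; the class and field names are assumptions:

```python
from dataclasses import dataclass


@dataclass
class HandleValue:
    """One entry in a Handle record: a typed key, an index, data, a timestamp."""
    type: str
    index: int
    data: str
    timestamp: str


# The record for handle 10232/1234 from the slide above:
record = [
    HandleValue("URL", 1, "https://www.eudat.eu/ex1", "2014-04-09 12:46:53Z"),
    HandleValue("DOMAIN", 2, "EUDAT", "2014-04-09 12:46:53Z"),
    HandleValue("HS_ADMIN", 100, "eudat/user1", "2014-04-09 12:46:53Z"),
]


def value_of(record, key):
    """Return the data of the first entry with the given type, or None."""
    return next((v.data for v in record if v.type == key), None)
```

Resolving the handle then amounts to looking up its URL entry: `value_of(record, "URL")` returns `https://www.eudat.eu/ex1`.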
24. Resolving a Handle Record
1. Client sends a request to the Global Registry (e.g. the Handle system) to resolve 0.NA/10232 (the prefix handle for 10232/1234)
2. Global responds with the Service Information for prefix 10232: the addresses of the Local Handle Service (primary site, secondary sites A and B)
3. Client sends a request to the Local Handle Service to resolve hdl:10232/1234
4. The server responds with the handle data
25. HANDLE Record Types
Common types:
URL: the location the HANDLE should resolve to
HS_ADMIN: special record encoding the permissions configured for this HANDLE
10320/LOC: supports multiple locations based on intelligent decisions
Custom EUDAT types:
EUDAT/CHECKSUM: useful for integrity verification
EUDAT/ROR: Repository of Record, the ID of the community repository
EUDAT/FIO: PID of the first ingested object in the EUDAT domain
EUDAT/PARENT: PID associated with the source object in a replication chain
EUDAT/REPLICA: list of PIDs pointing to replicas
27. PID System: How does it work? (Handle system and EPIC)
PID Service: generates and manages PIDs for digital objects according to policies. Example: the B2HANDLE Python library (next section)
PID Replication (one of the EPIC policies): replicates the database of Handles to partners in EPIC to guarantee robust and highly available PID resolution
Resolution Service: EPIC uses the distributed network provided by Handle and extends it with its own local Handle servers
Global Handle Mirror: a mirror of the Global Handle in Europe
28. Resolution Service
The web address for the Handle resolution service that EUDAT uses is http://hdl.handle.net.
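The hdl.handle.net proxy also exposes a REST interface under /api/handles/. A minimal sketch using only the standard library; the `resolve()` helper needs network access and is illustrative:

```python
import json
import urllib.request

PROXY = "http://hdl.handle.net"


def resolution_api_url(handle: str, proxy: str = PROXY) -> str:
    """Build the REST query URL for a handle on a Handle proxy."""
    return f"{proxy}/api/handles/{handle}"


def resolve(handle: str, proxy: str = PROXY) -> list:
    """Fetch the handle record as JSON and return its list of values.

    Requires network access; raises urllib.error.URLError when offline.
    """
    with urllib.request.urlopen(resolution_api_url(handle, proxy)) as resp:
        return json.load(resp)["values"]
```

Each returned value carries the type, index, data and timestamp fields shown in the Handle record slide above.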
29. EUDAT options for PIDs
In order to access a data object stored in EUDAT, an associated persistent identifier is needed.
EUDAT requires integration of Handle into your infrastructure. Before your community or data centre can create PIDs, you need a prefix. There are two options:
you can run your own Handle system; or
you can pass the details to EUDAT partners to manage it on your behalf.
An additional benefit of using the EUDAT systems is access to a Python library to manage your PID Handles.
31. What is B2HANDLE?
B2HANDLE is EUDAT’s PID service, based on Handle as technology and EPIC as federation.
B2HANDLE offers:
Assignment of a prefix via one of the EUDAT partners
Hosting of PIDs, i.e. operation and maintenance of Handle servers and technical services
Replication and safe-keeping of PIDs via the EPIC federation
A resolution mechanism based on Handle
Easy maintenance and programmatic resolving of PIDs via the B2HANDLE Python library for general interaction with Handle servers
32. B2HANDLE in other EUDAT services
In the EUDAT ecosystem, EUDAT services make use of B2HANDLE to:
guarantee data access
provide long-lasting references to data
facilitate data publishing
B2SAFE and B2SHARE use the service to create and manage PIDs for their hosted data objects.
B2FIND and B2STAGE use the resolving mechanism of B2HANDLE to retrieve and refer to objects.
33. The B2HANDLE Python library
b2handle: a Python library for interaction with EUDAT Handle services (Handle version 8)
Setuptools-enabled Python package for easy installation
Can be employed by end users to programmatically resolve Handles
Credentials for one of the EUDAT Handle servers are required for creation and maintenance of PIDs
Stable state; official release of v1.0, also for use by EUDAT user communities
36. B2HANDLE library features
Methods to read, create and modify Handles and their records
Queries against the native Handle REST interface
Support for multiple locations per object (10320/LOC entries)
Automatic management of Handle value indexes
Support for Handle reverse-lookup via an additional Java servlet
Support for resolving any Handle from any issuing instance
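A hedged sketch of what using the library can look like, based on b2handle's documented client classes (EUDATHandleClient, PIDClientCredentials); check the b2handle documentation for the exact signatures. The credentials file name is an assumption:

```python
def read_handle_url(handle: str):
    """Resolve the URL entry of a handle with the b2handle client.

    Requires the b2handle package (pip install b2handle) and network access;
    no credentials are needed for read-only access.
    """
    from b2handle.handleclient import EUDATHandleClient
    client = EUDATHandleClient.instantiate_for_read_access()
    return client.get_value_from_handle(handle, "URL")


def update_handle_url(handle: str, new_url: str, credentials_file: str = "cred.json"):
    """Point an existing handle at a new location (requires write credentials).

    'credentials_file' is a JSON credentials file as described in the b2handle
    documentation; the default name here is hypothetical.
    """
    from b2handle.clientcredentials import PIDClientCredentials
    from b2handle.handleclient import EUDATHandleClient
    cred = PIDClientCredentials.load_from_JSON(credentials_file)
    client = EUDATHandleClient.instantiate_with_credentials(cred)
    client.modify_handle_value(handle, URL=new_url)
```

Updating only the URL value, rather than deleting and re-minting the handle, is what keeps the PID stable while the resource moves.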
38. How may I use a PID?
When you have a PID, use it:
To cite the data behind the PID:
In publications
On web-pages
Include actionable PIDs in linked data
Retrieve the data:
By using the corresponding resolver
Via the actionable PID
E.g. http://hdl.handle.net/11239/GRNET
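A PID becomes actionable by prepending the address of a resolver, as in the example above. A minimal sketch (using the global Handle proxy hdl.handle.net and the slide's example handle):

```python
# The global Handle proxy; any Handle appended to it becomes a clickable URL.
HANDLE_PROXY = "http://hdl.handle.net"

def actionable(pid):
    """Turn a bare Handle like '11239/GRNET' into an actionable URL."""
    return f"{HANDLE_PROXY}/{pid}"

print(actionable("11239/GRNET"))  # -> http://hdl.handle.net/11239/GRNET
```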
39. Policy Document
When to use Persistent Identifiers?
What should the PID resolve to?
There is no “one-size-fits-all” strategy for
implementing PIDs!
Create a Policy Document of What & When
Analyze the use of PIDs, create a policy for the
management
What to register
When it enters the data management life cycle
Using PIDs requires analysis and thought
40. Policy Document
Simple Questions
Which data objects need a PID (collections, files, metadata
records)?
What kinds of data are likely to stay online long enough?
What kinds of data are likely to be linked to your PIDs?
What kinds of data are likely to be analysed/processed with
tools?
What will happen after data goes off-line?
etc..
41. PID Policies for EUDAT services
Each Service follows its own Policy for managing PIDs.
One of the main policies they all follow is the
non-deletion policy:
once a PID has been created, it may not be
deleted.
E.g. B2SAFE and B2SHARE use the service to
create and manage PIDs for their hosted data
objects. They both create their own PID types (Keys)
in the PID record.
43. Example 1: B2SHARE
PIDs for individual files make it possible to download single files
A PID for the whole record makes it possible to cite the complete data publication
44. Example 2: B2SAFE
B2SAFE employs PIDs to keep track of, and link, replicas
of data in the EUDAT network
45. Example 3: Enable data flows
Link directly to the data (?locatt=id:n)
Optionally include a (mime) type in the Handle record –
can be used to select appropriate tooling
46. Summary
Persistent Identifiers provide a solution to the “link rot”
problem by adding an extra layer of indirection
Several systems are available, with different conditions
Doing PIDs yourself: use a Policy Document
The Handle system, via EPIC policies, is the foundation
for EUDAT’s B2HANDLE service:
Low cost: only a flat annual fee
Robust, scalable and performant
Flexible: allows addition of any metadata
Provides a global resolver
48. www.eudat.eu
Authors & Contributors
This work is licensed under the Creative Commons CC-BY 4.0 licence
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures.
Contract No. 654065
Themis Zamani, GRNET
Willem Elbers, CLARIN
Christine Staiger, SURFsara
Ellen Leenarts, DANS
Kostas Kavoussanakis, EPCC
Thank you
Editor's Notes
Data generation is getting easier/cheaper.
At the same time there is a shift from data generation to data processing & analysis. A new way to do science.
As a result the amount of data output is increasing: a new data world of science data.
One of the grand challenges of science data is to facilitate knowledge discovery by assisting humans and machines in data access, integration and analysis.
So as to make the data world a better place for science we must have some data principles in mind.
The idea of these data principles is:
Data should be Findable
Findable – Easy to find by both humans and computer systems Metadata
Data should be Accessible
Accessible – Stored for long term, accessed and/or downloaded with well-defined license and access
Data should be Interoperable
Interoperable – Ready to be combined with other datasets by humans as well as computer systems;
Data should be Re-usable.
Reusable – Ready to be used for future research and to be processed further using computational methods.
PIDs can help to identify and locate data. In some cases PID systems can also be used for keeping metadata that is vital for making data interoperable on the technical side.
What is actually the problem we are trying to solve? For this we have to look at the data creation process.
This life cycle applies to any online digital object.
analysis & enrichment. Data analysis is the process of applying data analysis techniques to a large amount of data, typically in big data repositories. It uses specialized algorithms, systems and processes to review, analyze and present information in a form that is more meaningful for organizations or end users. At the same time data enrichment is a general term that refers to processes used to enhance, refine or otherwise improve raw data. This idea and other similar concepts contribute to making data a valuable asset. Some examples are:
interpret data
derive data
produce research outputs
author publications
prepare data for preservation
registration & preservation: Data preservation, or more specifically, digital data preservation, refers to the series of managed activities necessary to ensure continued access to digital objects for as long as necessary. Long-term preservation can be defined as the ability to provide continued access to digital objects, or at least to the information contained in them, indefinitely.
All these steps produce:
Temporary data
Referable data
Citable data
Identifiers of the kind we are discussing are themselves digital objects; so they are subject to the same life cycle.
PIDs can help to keep track of generated data and its relations.
Let’s see the use of Data URL as a means to find and access data.
The URL specifies the location, on a particular server, from which the resource could be retrieved. Strictly network locations for digital resources.
Are URLs persistent?
Suppose you want to publish your research outputs online. The traditional way to store data is to upload it to a site, a repository, a directory. In order to access it you bookmark or share its URL. So you
publish it online at some address http://www.test.com/test.html.
Other users may cite, access, and re-use this URL
As long as nothing changes about the way the data is accessed, this works fine. But one day you decide to move the resource to another location. So
relocate the resource at http://www.example.com/
Other users are not informed about this relocation, and when they try to access the resource at the first location they always get a Page Not Found response.
Apart from this administrative change, you may have experienced:
domain may change (like the example)
resource may be relocated: The directory structure is rearranged: subdirectories are created for each collection.
link may change: The researcher decided to use a different platform, with different url queries to retrieve the resource.
You will always get a “Page Not Found” response.
In the long term, however, URLs often no longer work even a year later. So this arrangement has proven to be fragile.
This is where the need for persistent identifiers came from.
This is what we want: a string that always resolves to the data.
Persistent over time. By design. Even if the real location of the data changes.
A Persistent Identifier is
distinct from a URL
not strictly bound to a specific server or filename.
An identifier is a unique name, identity applied to a digital object so that this object can be easily referenced. It is a reference to the digital object.
“A persistent identifier (PI) is a long-lasting reference to a digital object—a single file or set of files.”
The identifier points to a resource with no actual knowledge of the resource. It is the responsibility of the owner to keep the PID up-to-date when the resource changes.
We are going to talk about it in our examples.
points to a resource
Identifier points to a resource, with no actual knowledge of the resource.
The resource is a black box. The type of the URL doesn't matter. It may be a file, a metadata record, a code collection.
Is globally unique
You won’t find an identifier with the same name that points to another resource. The system ensures that by design.
Once it is created, the resource is globally addressable.
PIDs solve these problems by introducing a “Redirection layer”
The user has an opaque string, which is resolved to a URL.
PID points to a URL which points to the digital object.
If the digital object is moved, its ownership is changed, or the organization of the objects is changed, the URL is often changed. With PIDs you can easily update this information, while the user can still employ the PID to refer and retrieve the data.
PIDs make use of a redirection layer bridging the stable and unstable worlds at the cost of some administrative responsibilities. The PID can be updated to point to the new URL.
PIDs introduce a stable layer of redirection on top of more unstable identifiers such as URLs.
PIDs provide a layer of redirection
PID points to a URL
The URL is unstable
The PID is stable
Update procedures need to be defined and thought through to enable stable referencing.
Static references into fluid systems over time
Data = digital object. A digital object may be moved, removed or renamed for many reasons.
It can move to other servers or even to other organisations.
It may even be changed to another format (e.g. from xls to zip).
Persistent identifiers are there to continue to provide access to this resource, so the digital object gets a Static reference into fluid systems over time.
Embedded IDs
Apart from the main information (ID, location of the object), a number of related data could be stored in the PID record.
Some ideas of these IDs are a) the version of the item or b) new related items
This means that the user always knows the current and latest state of the data
Networks of Persistent Links
Persistent identifiers may contain info about other digital objects (Data / Metadata links). This may create a chain of links between digital objects.
There are some costs when using PIDs
Extra level of effort / cost on creation: when you decide to use PIDs, data and PIDs are strictly connected. In the management life cycle you must include a new task "managing the persistent identifier for the data”.
Analysis – what to identify / granularity: Analyse the need for PID in your system. Which digital objects must have a PID? Do you need to assign a PID to all your files?
Coordination across organisations: Often, an institution already cooperates with other institutions that deal with a similar environment. So a coordination across organisations is needed
Maintain resolution system: There is a cost for maintaining a resolution system
Persistence requires sustained effort
Organisational discipline: An Organisational discipline must be followed to achieve persistence
Technology necessary but not sufficient.
Analyse cost/benefit ratio. Checklist and questions are mentioned in this presentation.
Don’t start unless it is worthwhile
Is your data worth it?
Let’s start by looking at the persistent identifier string.
Every identifier consists of two parts: its prefix and a unique local name under the prefix known as its suffix
Any suffix - local name must be unique under its local namespace. The uniqueness of a prefix and a local name under that prefix ensures that any identifier is globally unique within the context of the System.
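The prefix/suffix structure is easy to work with programmatically. A minimal sketch (handle value taken from the slides' own example):

```python
def split_handle(handle):
    """Split a Handle into (prefix, suffix).

    Only the first '/' separates prefix from suffix; the suffix itself
    may contain further '/' characters, hence maxsplit=1.
    """
    prefix, suffix = handle.split("/", 1)
    return prefix, suffix

print(split_handle("11239/GRNET"))  # -> ('11239', 'GRNET')
```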
Let’s see some popular persistent identifier systems
PURLs
PURLs are URLs which redirect to the location of the requested web resource using standard HTTP status codes. A PURL is thus a permanent web address which contains the command to redirect to another page, one which can change over time.
(Persistent Uniform Resource Locators) are Web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. Instead of resolving directly to Web resources, PURLs provide a level of indirection that allows the underlying Web addresses of resources to change over time without negatively affecting systems that depend on them. This capability provides continuity of references to network resources that may migrate from machine to machine for business, social or technical reasons.
Cost: no
Metadata: Does not support additional metadata
Handle System
The Handle System is a technology specification for assigning, managing, and resolving persistent identifiers for digital objects and other resources on the Internet.
Cost: $50 annual fee per prefix
Metadata: Associate any metadata
ARK
is an identifier scheme conceived by the California Digital Library (CDL), aiming to identify objects in a persistent way. The scheme was designed on the basis that persistence "is purely a matter of service and is neither inherent in an object nor conferred on it by a particular naming syntax". An Archival Resource Key (ARK) is a Uniform Resource Locator (URL) that is a multi-purpose identifier for information objects of any type. An ARK contains the label ark: after the URL's hostname.
Cost: no
Metadata: ERC (Electronic Resource Citation) metadata
DOI
The Digital Object Identifier (DOI) was conceived as a generic framework for managing identification of content over digital networks, recognising the trend towards digital convergence and multimedia availability. A DOI name is an identifier (not a location) of an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. It is based on the Handle Server for the resolution service.
Cost: fee per DOI + annual fee
Metadata: The INDECS schema
(as per slide)
(as per slide)
As we have already mentioned, a persistent identifier is a long-lasting reference to a digital object. EUDAT data domain handles registered data and each digital object should have a persistent identifier.
This persistent identifier is used for
- Replica identification
- Identification of the repository of record (in the case of replication)
- Querying of additional information
- Checksum (time stamped)
A persistent Identifier helps you
- access
- use and re-use
- verify
your data
EUDAT has adopted Handle-based persistent identifiers
A combined solution of handle system and EPIC service
So let’s discuss the Handle System
EUDAT has adopted the Handle system and EPIC. We will now dive into the technical details on Handle.
(as per slide)
The Handle system is:
reliable (using redundancy, no single points of failure, and fast enough to not appear broken);
scalable (higher loads simply managed with more computers);
flexible (can adapt to changing computing environments; useful to new applications):
trusted (both resolution and administration have technical trust methods; an operating organization is committed to the long term);
builds on open architecture (benefits from effort of a community in building applications on the infrastructure);
transparent (users need not know the infrastructure details).
PIDs such as Handles, are actually records in a database.
One can convert the PID to a URL by appending it to the address of the resolver.
When creating a Handle, the fields URL and HS_ADMIN are mandatory.
The URL field is the actual location of the data and, when it’s an HTTP URL, clicking (resolving) the Actionable PID will redirect to what is written in the URL field.
Each Handle may have a set of other values assigned, like in this example, the field “DOMAIN”, which denotes the domain where the data lives and who is responsible for the data. This is defined by the community or the domain the handle belongs to.
(It is just an example not a real PID key that is used by EUDAT!)
These Handle values use a common data structure for their data. For example, each Handle value has a unique index number that distinguishes it from other values in the value set. They also have a specific data type (or Key) that defines the syntax and semantics of the data in its data field.
Besides these, each handle value contains a set of administrative information such as TTL and permissions.
How does the resolving of a Handle work when you click on an actionable PID?
For any HTTP request, for example
http://hdl.handle.net/10232/1234
one of the proxy servers will query for the Handle, take the URL in the Handle record (or if there are multiple URLs in the Handle record it will select one, and that selection is in no particular order) and send an HTTP redirect to that URL to the user's web browser. If there is no URL value, the proxy will display the handle record.
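The proxy behaviour described above can be modelled in a few lines. This is a toy simulation, not the proxy's real implementation; the record contents are invented for illustration.

```python
# Invented sample record: list of Handle values, each with index/type/data.
SAMPLE_RECORDS = {
    "10232/1234": [
        {"index": 1, "type": "URL", "data": "http://example.org/data/1234"},
        {"index": 100, "type": "HS_ADMIN", "data": "permissions blob"},
    ],
}

def resolve(handle, records=SAMPLE_RECORDS):
    """Mimic the proxy: redirect to a URL value, else display the record."""
    record = records[handle]
    urls = [v["data"] for v in record if v["type"] == "URL"]
    if urls:
        return urls[0]   # the proxy picks one URL, in no particular order
    return record        # no URL value: the proxy displays the handle record

print(resolve("10232/1234"))  # -> http://example.org/data/1234
```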
Now let us inspect some data types or keys that are 1) predefined or 2) service or user specific.
The Handle record supports a number of record types.
Every Handle value must have a data type specified in its <type> field.
The <type> field identifies the data type that defines the syntax and semantics of data in the next <data> field. The data type may be registered with the Handle System to avoid potential conflicts.
Each field in a Handle record is timestamped.
Some common types are:
URL: one location referenced by this HANDLE
HS_ADMIN: special record encoding the permissions configured for this HANDLE. Each handle has one or more administrators. Any administrative operation (e.g., add, delete or modify handle values) can only be performed by the Handle administrator with adequate privilege. Handle administrators are defined in terms of HS_ADMIN values.
10320/LOC: supports multiple locations based on intelligent decision.
Some custom types used by EUDAT are:
Checksum: Useful for integrity verification
EUDAT/ROR: EUDAT specific for B2SAFE. ROR: (Repository of Records), the repository where data was stored first.
EUDAT/PPID: EUDAT specific for B2SAFE. the PID associated to the source object in a replication chain. If the chain has only two elements, the master copy and the first replica, then the PPID = ROR.
Anything else you like
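Putting the predefined and EUDAT-specific types together, a Handle record can be viewed as a list of typed values, each with a unique index. The record below is purely illustrative (handles, locations and checksum are made up), but the key names match the types listed above.

```python
# Illustrative record mixing predefined types (URL, HS_ADMIN) with
# EUDAT-specific keys (CHECKSUM, EUDAT/ROR, EUDAT/PPID).
replica_record = [
    {"index": 1,   "type": "URL",        "data": "irods://eudat-site/zone/obj"},
    {"index": 2,   "type": "CHECKSUM",   "data": "0a1b2c3d4e5f"},
    {"index": 3,   "type": "EUDAT/ROR",  "data": "841/repository-of-record"},
    {"index": 4,   "type": "EUDAT/PPID", "data": "841/parent-pid"},
    {"index": 100, "type": "HS_ADMIN",   "data": "permissions blob"},
]

def value_of(record, key):
    """Fetch the data field for a given type key, or None if absent."""
    for value in record:
        if value["type"] == key:
            return value["data"]
    return None

print(value_of(replica_record, "EUDAT/ROR"))  # -> 841/repository-of-record
```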
What is it that EUDAT uses exactly and how?
As we have already mentioned, EUDAT supports a combined solution of Handle system and EPIC service, i.e. not all PIDs in EUDAT are maintained in the EPIC federation, but all PIDs in EUDAT are Handles and can be supported by the same hardware and software.
We had a closer look at the Handle system and what it provides technically. We will see now what EPIC adds to the pure technology.
ePIC provides many highly reliable, redundant and performant services to the scientific research community.
PID Service: The first service which is publicly visible is the PID service. The PID Service is the main interface to register and manage persistent identifiers in ePIC . It is implemented as a RESTful web service and it is continuously being developed by ePIC. We will see how to use the PID service in the section B2HANDLE.
Resolution: It is responsible for forwarding the user to the current location of the object; in addition, other information about the object (such as author or expiration date) can be provided. ePIC utilizes the Handle System to achieve a redundant and load-balanced setup between the data centers.
Replication: Currently, five European data centers work together to replicate each other’s persistent identifiers. When a data center is temporarily not available, the other ePIC centers still resolve the PIDs.
Mirror: The ePIC system is based on a worldwide hierarchy with the Global Handle System at the top. These systems are registries where the most important information about the prefixes is stored. One of the ePIC founders (GWDG) runs a mirror so as to assure the resolution of prefixes in Europe, even if other parts of the global network are temporarily unavailable.
Most of them are hidden, except for the one responsible for PID registration, which is publicly visible.
Based on the Handle resolution mechanism we saw earlier.
Resolution: It is responsible for forwarding the user to the current location of the object.
The PID Resolution system of ePIC is responsible for forwarding the users to the current location of an identified object. In addition to the current location, other information about the object (such as author or expiration date) can also be provided. ePIC utilizes the Handle System to achieve a redundant and load-balanced setup between the data centers. ePIC replicates the PID databases to guarantee high availability of the PID resolution. The resolution services of ePIC are also included into the worldwide Handle infrastructure to guarantee a highly reliable and performant resolution of PIDs issued by ePIC .
(as per slide)
Now we will dive into the subject, what do you need to arrange for on the organisational side, when you want to use PIDs in your project or at your institute.
When you are a user you can use any PID
For publishing and referencing data in your papers and on web pages
You can include them as online resource in linked data
You can fetch the data by resolving the PID via the resolver or via an actionable PID
Now that I know how to use a PID when should I create – mint a PID?
“When to use persistent identifiers”?
It should be noted that among all the concepts which have been introduced there is no ‘one size fits all’ strategy for implementing persistent identifiers. Although the basic problems to be solved are the same, each of the systems addresses them in its own way on different administrative and technical levels. It is not possible to formulate one single recommendation for all.
Create a Policy Document of What & When
Analyze the use of PIDs, create a policy for the management
What to register
When it enters the data management life cycle
Determining this requires careful analysis. First of all, one must carefully analyse the current use of identifiers in general. In most cases, where data is collected, it is identified in some way. If it is data about other data or objects – metadata – it will often contain an identifier for the referred item.
Answer these simple questions
Which data objects need a PID (collections, files, metadata records)?
What kinds of data are likely to stay online long enough?
What kinds of data are likely to be linked to your PIDs?
What kinds of data are likely to be analysed/processed with tools?
What will happen after data goes off-line?
Let’s see a few use cases.
B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store and share small-scale research data from diverse contexts. All B2SHARE artifacts are associated with a PID.
B2SHARE creates several PIDs:
1) A Handle for the deposit, that resolves to the specific landing page in B2SHARE
2) A DOI, that also resolves to the landing page but is also indexed in DataCite
3) A DOI for each uploaded file, that resolves directly to the file and thus enables programmatic and specific downloads of certain data.
B2SAFE creates a PID using its own Handle prefix that refers to the original in the community’s repository
B2SAFE uses B2HANDLE, i.e. it interacts with the Handle server via the B2HANDLE python library. Upon creation of the PID, B2SAFE also stores some extra information in the Handle entry:
The URL pointing to the real location of the data object
Checksum for integrity checks
An optional identifier that is internally used by the community
The data object is replicated to an EUDAT site
Here B2SAFE creates a new PID for the replica, with the Handle prefix for that site
The interaction is again done with the B2HANDLE python library
The Handle entry of the replica uses the PID from the community centre to refer to the original data: here PID1x is used as the value for the First Ingested Object (FIO), as the direct parent of the replica (PARENT), and as the link to the original community repository (ROR). At the same time the PID of the data at the community centre is updated: it now holds a link to the replica (REPLICA).
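The cross-references B2SAFE maintains between an original and its replica can be sketched as dictionary bookkeeping. The key names (FIO, PARENT, ROR, REPLICA) come from the notes above; the PIDs and URLs are invented, and the real service does this through the Handle server via the B2HANDLE library rather than an in-memory dict.

```python
def link_replica(records, original_pid, replica_pid, replica_url):
    """Create the replica's record and add a REPLICA link to the original."""
    records[replica_pid] = {
        "URL": replica_url,
        "FIO": original_pid,     # first ingested object
        "PARENT": original_pid,  # direct parent in the replication chain
        "ROR": original_pid,     # repository of record (chain of length two)
    }
    # The original's record is updated to point at its replica.
    records[original_pid]["REPLICA"] = replica_pid
    return records

records = {"841/orig": {"URL": "http://community.example/data"}}
link_replica(records, "841/orig", "842/copy", "irods://site-b/copy")
print(records["841/orig"]["REPLICA"])  # -> 842/copy
```

With a longer replication chain, PARENT would point at the immediate source copy while ROR and FIO keep pointing at the first ingested object, as described in the notes.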
For Handles with multiple URL values, the proxy server (or web browser plug-in) simply selects the first URL value in the list of values returned by the Handle resolution. Because the order of that list is non-deterministic, there is no intelligent selection of a URL to which the client would be redirected. The 10320/loc Handle attribute was developed to improve the selection of specific resource URLs. Type “10320/loc” specifies an XML-formatted Handle value that contains a list of locations.
Locatt attribute:
If someone constructs a link as hdl:123/456?locatt=id:0 then the resolver will return the locations that have an "id" attribute of 0 (i.e., the first location).
If there is only one location element, it is returned as a redirect. If there is more than one, then you can select the one you want by adding the ?locatt=id:n attribute.
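The locatt selection can be modelled with a toy parser. The XML shape below is illustrative (the real 10320/loc format has more attributes and selection rules); the hostnames are invented.

```python
import xml.etree.ElementTree as ET

# Invented 10320/loc value listing two mirror locations.
LOC_XML = """<locations>
  <location id="0" href="http://mirror-a.example/obj" />
  <location id="1" href="http://mirror-b.example/obj" />
</locations>"""

def select_location(xml_text, locatt_id=None):
    """Pick the <location> whose id matches locatt=id:n, else the first one."""
    locations = ET.fromstring(xml_text).findall("location")
    if locatt_id is not None:
        for loc in locations:
            if loc.get("id") == str(locatt_id):
                return loc.get("href")
    return locations[0].get("href")  # only/first location: plain redirect

print(select_location(LOC_XML, locatt_id=1))  # -> http://mirror-b.example/obj
```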
Persistent Identifiers provide a solution to the “link rot” problem by providing a layer of indirection. PIDs act as individual names for the objects with some extra information, like an ID card for people containing the address and some more information.
The indirection helps to account for changes e.g. the address changes but users can access the data while still employing the same reference.
PIDs also make it easier to keep digital objects accessible in the long term, despite changes of technology, organizations, and people, as they are independent of the data and their changes in publishing systems, transfer, and evolution of technology.
Requirements: An identifier must be globally unique, and should also be actionable - that is, a persistent identifier should provide a persistent link to the resource identified.
Several systems are available; some offer additional functionality in the form of support for storing additional metadata, providing a global resolver, etc.
Policy Document: How to use persistent identifiers in your repository requires some analysis and thought. Persistence needs preservation: huge amounts of useful material have been lost due to lack of resources, or of explicitly assigned responsibility. There must be a policy in place to prevent this from happening. Among all the concepts which have been introduced, there is no ‘one size fits all’ strategy for implementing persistent identifiers. A policy will have to be adapted to suit the needs of the individual organization.
EUDAT uses Handle as technology: low costs, robust setup, allows for flexible creation of PIDs
Via the EPIC federation EUDAT makes sure that PIDs are mirrored and kept safe