by D. Giaretta (APARSEN), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Hadoop was born out of the need to process Big Data. Today data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; this is where Big Data technology comes in. Today the Hadoop software stack is the go-to framework for large-scale, data-intensive storage and compute solutions for Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clustered commodity computers working in parallel. Distributing data that is too large across the nodes of a cluster solves the problem of data sets too large to be processed on a single machine.
Persistent Identifiers (PIDs) for research – why we have them, why there are so many PID systems, how they work (looking at a few examples: Handles, DOIs, ORCIDs), how to choose one, whether PID systems can fail, and what’s happening in the international PID community
Controlled vocabularies and ontologies in Dataverse data repository, by vty
Support for external controlled vocabularies is one of the features most requested by research communities. Slides for the Dataverse Community Meeting 2021 at Harvard University
A North Carolina Connecting to Collections (C2C) workshop co-taught by Audra Eagle Yun (WFU), Nicholas Graham (UNC), and Lisa Gregory (State Archives of NC). This workshop took place on June 13, 2011 in Wilson, NC.
Presentation slides from a lecture given at the University of the West of England (UWE) as part of the Advanced Information Systems module of the MSc in Library and Library Management, University of the West of England Frenchay Campus, Bristol, October 24th, 2006
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse, by vty
This presentation is about external CV (controlled vocabulary) support in Dataverse, an open-source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as the basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Bio-IT Trends From The Trenches (digital edition), by Chris Dagdigian
Note: Contact me directly dag@bioteam.net if you would like a PDF download of these slides
This is Chris Dagdigian’s 10th year delivering his no holds barred, candid state of the industry address at BioIT World, and we are not going to let a pandemic stop him.
Instead of his typical talk, five distinguished panelists will join Chris for a spirited discussion on Current Events and Scientific Computing and the impacts of the COVID-19 Pandemic:
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
In this deck from the Swiss HPC Conference, Robert Triendly from DDN presents: Long Live Posix - HPC Storage and the HPC Datacenter.
"The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. Since it was developed over 30 years ago, storage has changed dramatically. To improve the IO performance of applications, many users have called for the relaxation in POSIX IO that could lead to the development of new storage mechanisms to improve not only application performance but management, reliability, portability, and scalability."
Watch the video: https://wp.me/p3RLHQ-kaR
Learn more: http://ddn.com
and
http://hpcadvisorycouncil.com/events/2019/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Cloud Sobriety for Life Science IT Leadership (2018 Edition), by Chris Dagdigian
Candid/blunt AWS advice for research IT and life science IT leadership. Hard lessons learned from many years of AWS consulting. Contact dag@bioteam.net if you want a PDF copy of this presentation
Flexible metadata schemes for research data repositories - Clarin Conference..., by Vyacheslav Tykhonov
The development of the Common Framework in Dataverse and the CMDI use case: building an AI/ML-based workflow for predicting concepts from external controlled vocabularies and linking them to CMDI metadata values.
DIACHRON Preservation: Evolution Management for Preservation, by PRELIDA Project
by Giorgos Flouris (FORTH), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Organizational and Economic Issues in Linked Data Preservation, by PRELIDA Project
by Jose Maria Garcia (UIBK/STI Innsbruck), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Preserving linked data: sustainability and organizational infrastructure, by PRELIDA Project
by Mariella Guercio (Sapienza Università di Roma), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Brief Introduction to Digital Preservation, by Michael Day
Presentation slides from a lecture given at the University of the West of England (UWE) as part of the MSc in Library and Library Management, University of the West of England, Frenchay Campus, Bristol, March 10, 2010
Big Data brings big promise and also big challenges, the most important being the ability to deliver value to business stakeholders who are not data scientists!
Bitkom Cray presentation - on HPC affecting big data analytics in FS, by Philip Filleul
High-value analytics in FS are being enabled by graph, machine learning and Spark technologies. To make these real at production scale, HPC technologies are more appropriate than commodity clusters.
Watch full webinar here: https://bit.ly/2Y0vudM
What is Data Virtualization and why do I care? In this webinar we help you understand not only what Data Virtualization is, but why it's a critical component of any organization's data fabric and how it fits, and how data virtualization liberates and empowers your business users, from data discovery and data wrangling through to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Register to attend this session to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise?
Unlock Your Data for ML & AI using Data Virtualization, by Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Innovation with Big Data – Chr. Hansen’s experiences, by Microsoft
In many places, Big Data is still the new and unknown, without top priority at IT, because “we do not have large volumes of data”. But Big Data is much more than large volumes of data. At Chr. Hansen A/S, the Research and Development (Innovation) department has worked with the value of data and, as a result, established a cross-disciplinary BioInformatics programme built on Big Data technologies from Microsoft.
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group, by Scott Mitchell
This presentation was presented at the July 8th 2014 user group meeting for BI Reporting for Bay Area Start Ups
Content - Creation Infocepts/DWApplications
Presented by: Scott Mitchell - DWApplications
Data Virtualization to Survive a Multi and Hybrid Cloud World, by Denodo
Watch full webinar here: https://buff.ly/2Edqlpo
Hybrid cloud computing is slowly becoming the standard for businesses. The transition to hybrid can be challenging depending on the environment and the needs of the business. A successful move will involve using the right technology and seeking the right help. At the same time, multi-cloud strategies are on the rise. More enterprise organizations than ever before are analyzing their current technology portfolio and defining a cloud strategy that encompasses multiple cloud platforms to suit specific app workloads, and move those workloads as they see fit.
In this session, you will learn:
*Key challenges of migration to the cloud in a complex data landscape
*How data virtualization can help build a data driven, multi-location cloud architecture for real time integration
*How customers are taking advantage of data virtualization to save time and costs with limited resources
Big Data made easy in the era of the Cloud, by Demi Ben-Ari
Talking about the ease of using and handling Big Data technologies in the Cloud, with Google Cloud Platform and Amazon Web Services and all of the tools around them.
Showing the problems and how we can solve them with simple tools.
Bridging the Last Mile: Getting Data to the People Who Need It, by Denodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Bridging the Last Mile: Getting Data to the People Who Need It (APAC), by Denodo
Watch full webinar here: https://bit.ly/34iCruM
CEDAR: From Fragment to Fabric - Dutch Census Data in a Web of Global Cultura..., by PRELIDA Project
by Ashkan Ashkpour, Albert Meroño-Peñuela, Christophe Gueret (http://cedar-project.nl/), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Sławek Staworko (joint work with Peter Buneman), University of Edinburgh, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Mark Williams (Department of Film and Media Studies, Dartmouth College), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy, by PRELIDA Project
by Peter Burnhill (EDINA, University of Edinburgh), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
CEDAR & PRELIDA: Preservation of Linked Socio-Historical Data, by PRELIDA Project
by Albert Meroño, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Yannis Stavrakas (“Athena” Research Center), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
by Sotiris Batsakis & Grigoris Antoniou, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
Introduction to PRELIDA Consolidation and Dissemination Workshop, by PRELIDA Project
by Carlo Meghini (ISTI CNR, Pisa), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
D3.1 State of the art assessment on Linked Data and Digital Preservation, by PRELIDA Project
The presentation was given by René van Horik from Data Archiving & Networked Services, The Netherlands, at the PRELIDA Midterm Workshop in Catania, April 2014.
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish Caching, by Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio’s cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure, by Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open source: exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti..., by Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You Need? LLM & Knowledge Graph, by Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
JMeter webinar - integration with InfluxDB and Grafana, by RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
UiPath Test Automation using UiPath Test Suite series, part 3, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024, by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
3. EC policy – a brief history – a personal view
EC support for DP research: Data, Digitisation and e-Infrastructure for creating digital objects, through to the Digital Agenda
National funding: significantly more than EC funding
What is the EC role?
4. DP research: approx. 100M€ from EC
From “Research on Digital Preservation within projects co-funded by the European Union in the ICT programme”, 2011, Stephan Strodl et al.
http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
5. Situation now
The digital preservation community has failed to persuade the EC that there is a need for more funding for DP research
◦ We do not have a consistent story about:
◦ Costs
◦ Rights
◦ Methods etc
◦ “Emulate or Migrate” is inadequate!
◦ Who is doing it right
The Luxembourg unit which previously funded DP research – its name changed to “Creativity” – now shows no funding for digital preservation research
The EC expects results from the previous 100M€ of research by deploying solutions
6. Digital Preservation – some quotes
The head of the unit funding the Digital Preservation projects asked repeatedly:
◦ “Who pays and why?”
An NSF colleague:
◦ “Digital preservation is like VAT – people don’t like it”
8. “The Digital Agenda for Europe outlines policies and actions to maximise the benefit of the digital revolution for all. Supporting research and innovation is a key priority of the Agenda, essential if we want to establish a flourishing digital economy.”
Neelie Kroes, Vice-President of the EC, responsible for the Digital Agenda
Data is the new gold. “We have a huge goldmine… Let’s start mining it.” – Neelie Kroes
That is the magic: to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
9. ……but
Gold is precious because:
◦ it is rare
◦ it does not combine with other elements
◦ it does not perish
……..but……….
Data is valuable because:
◦ there is so much of it
◦ it is more valuable when it is combined together
◦ BUT it is far from imperishable
Role for Linked Data
13. Difficulties in digital preservation
Many different terminologies
Many different views of preservation
Many different kinds of digital objects
◦ Documents
◦ Data
◦ …… and new types of objects
Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools
Consistent training needed
Risks vs Cost
Who can you trust?
All of the above call for a consistent, coherent approach to digital preservation – APARSEN.
Need an Audit and Certification system – ISO 16363
OAIS – ISO 14721
14. Preservation techniques
For each technique, look for evidence – what evidence?
Must at least make sure we consider different types of data:
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive
Must look at all types of threats
15. Basic preservation activities
Libraries say: “Emulate or migrate”
◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities
Emulation:
• Can repeat what has been done before
• BUT cannot use new applications
Migration:
• Convert to a format which new software can use
• BUT what if there are many software systems?
19. OAIS Information model:
Representation Information
The Information Model is key.
Recursion ends at the KNOWLEDGE BASE of the DESIGNATED COMMUNITY
(this knowledge will change over time and region).
Does not demand that ALL Representation Information be collected at once.
A process which can be tested.
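The recursion described above can be made concrete. The sketch below is illustrative only (not an official OAIS API): each data object may need Representation Information (RepInfo) to be understood, that RepInfo is itself data that may need further RepInfo, and the recursion ends once everything remaining is already in the Designated Community's knowledge base. All names and the FITS/PDF example are assumptions for illustration.

```python
# Sketch of the OAIS RepInfo recursion; names and example data are illustrative.

def repinfo_closure(obj, repinfo_for, knowledge_base):
    """Collect the RepInfo needed for `obj`, stopping at the knowledge base.

    obj            -- identifier of a data object
    repinfo_for    -- mapping: object -> list of RepInfo objects it needs
    knowledge_base -- set of things the Designated Community already understands
    """
    needed = set()
    stack = [obj]
    while stack:
        current = stack.pop()
        for ri in repinfo_for.get(current, []):
            if ri in knowledge_base or ri in needed:
                continue          # recursion ends at the knowledge base
            needed.add(ri)
            stack.append(ri)      # RepInfo may itself need RepInfo
    return needed

# Example: a FITS file needs the FITS spec; that spec (a PDF) needs the PDF
# spec, which this particular community already knows.
repinfo = {
    "image.fits": ["FITS 4.0 spec (PDF)"],
    "FITS 4.0 spec (PDF)": ["PDF 1.7 spec"],
}
kb = {"PDF 1.7 spec", "English"}
print(repinfo_closure("image.fits", repinfo, kb))
# -> {'FITS 4.0 spec (PDF)'}
```

Note how the slide's point falls out of the code: as the community's knowledge base shrinks over time, the same call returns more RepInfo that must be collected.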
22. Migration
OAIS defines various types of Migration:
◦Do not change the bits:
◦ Refresh
◦ Replicate
◦Change the packaging but not the content:
◦ Repackage
◦Change the content:
◦ Transform (usually non-reversible)
◦Need to consider “Transformational Information Properties” – important for AUTHENTICITY
◦Related to “Significant properties”
◦Add appropriate Representation Information for the new format
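The migration types above can be summarised as a small sketch, grouped by what each type changes. The grouping follows the slide; the class and function names are illustrative assumptions.

```python
# Sketch: OAIS migration types grouped by what each one changes (illustrative).
from enum import Enum

class Migration(Enum):
    REFRESH = "refresh"        # bits unchanged: replace media like for like
    REPLICATE = "replicate"    # bits unchanged: maybe new media
    REPACKAGE = "repackage"    # packaging changes, content does not
    TRANSFORM = "transform"    # content changes (usually non-reversible)

def changes_content(m: Migration) -> bool:
    """Only Transform changes the content, so only it raises the authenticity
    questions ("Transformational Information Properties") from the slide."""
    return m is Migration.TRANSFORM

def preserves_bits(m: Migration) -> bool:
    """Refresh and Replicate leave the bit stream untouched."""
    return m in (Migration.REFRESH, Migration.REPLICATE)

print([m.value for m in Migration if changes_content(m)])  # -> ['transform']
```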
23. AND – be prepared to
Hand-over
Preservation requires funding
Funding for a dataset (or a repository) may stop
Need to be ready to hand over everything needed
for preservation
◦OAIS (ISO 14721) defines the “Archival Information Package” (AIP).
◦Issues:
◦ Storage naming conventions
◦ Representation Information
◦ Provenance
◦ ….
24. Preserving digitally
encoded information
Ensure that digitally encoded information
is understandable and usable over the long
term
Long term could start at just a few years
Chain of preservation
Need to do something because things
become “unfamiliar” over time
But the same techniques enable use of data
which is “unfamiliar” right now
25. When things change
We need to:
◦Know something has changed
◦Identify the implications of that change
◦Decide on the best course of action for preservation
◦What RepInfo we need to fill the gaps
◦ Created by someone else or creating a new one
◦If transformed: how to maintain data authenticity
◦Alternatively: hand it over to another repository
◦Make sure data continues to be usable
Orchestration Service
Gap Identification Service
Preservation Strategy Toolkit
RepInfo Registry Service
Authenticity Toolkit
Packaging Toolkit
Data Virtualisation Toolkit
Process Virtualisation Toolkit
RepInfo Toolkit
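The "Gap Identification" step in the list above has a simple core. The sketch below is an illustration of the idea, not APARSEN's actual service interface: as the Designated Community's knowledge base changes, the gap is whatever required RepInfo is no longer covered. All names and example data are assumptions.

```python
# Sketch of gap identification: required RepInfo minus current community
# knowledge. Names and the 2014/2034 example are illustrative assumptions.

def repinfo_gap(required, knowledge_base):
    """Return the RepInfo that is required but not (or no longer) known."""
    return set(required) - set(knowledge_base)

required = {"FITS format", "WCS conventions", "English"}
kb_now = {"FITS format", "WCS conventions", "English"}
kb_future = {"English"}   # assumed future community: FITS has become unfamiliar

print(sorted(repinfo_gap(required, kb_now)))     # nothing to do today
print(sorted(repinfo_gap(required, kb_future)))  # RepInfo to create or obtain
```

When the gap is non-empty, the slide's options apply: fill it with RepInfo created by someone else or newly created, or hand the data over to a repository whose community still has that knowledge.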
28. Preservation objectives
The same digital object may be
preserved with different aims in mind
by different repositories:
For a digital document
Re-print the pages?
To understand the numbers printed on the page to
do further research?
For a piece of performance art
Replay a recording of a particular performance?
Re-perform the work?
For a scientific data file
Understand the numbers?
Understand the numbers in the context of a
particular theory?
29. Preservation, Value and
Re-use
(Re-)usability is the essential test for success of preservation
◦ Usability is usually essential for justifying the cost of preservation
Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?
Impossible to know what formats, semantics or software will be used in future
Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data, i.e. most of it!)
◦ automated (re-)use as far as possible
APARSEN is bringing together a coherent, consistent, evidence-based approach to
digital preservation involving tools, services, consultancy and training.
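The N² problem mentioned above is simple arithmetic: direct converters between every ordered pair of N formats number N·(N−1), whereas agreeing on one intermediate representation needs only two converters per format. A quick sketch (function names are illustrative):

```python
# Why pairwise conversion does not scale: N*(N-1) direct converters versus
# 2*N when every format converts via one agreed intermediate form.

def pairwise_converters(n: int) -> int:
    return n * (n - 1)

def via_intermediate(n: int) -> int:
    return 2 * n

for n in (5, 20, 100):
    print(n, pairwise_converters(n), via_intermediate(n))
# For 100 formats: 9900 direct converters versus 200 via an intermediate.
```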
30. Classification of objects
must at least make sure we
consider different types of data
◦rendered vs non-rendered
◦composite vs simple
◦dynamic vs static
◦active vs passive
RDF Triple: dynamic / composite / non-rendered / passive
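The four axes above can be captured as a small data type, which is one way a repository might record the classification alongside each holding. This is an illustrative sketch; the field names mirror the slide, and the PDF example is an assumption.

```python
# Sketch: the slide's four classification axes as a data type (illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class DigitalObjectClass:
    rendered: bool    # rendered (e.g. a document) vs non-rendered (e.g. raw data)
    composite: bool   # composite vs simple
    dynamic: bool     # dynamic vs static
    active: bool      # active vs passive

# The slide's classification of an RDF triple:
rdf_triple = DigitalObjectClass(rendered=False, composite=True,
                                dynamic=True, active=False)
# A plain PDF report, by contrast, might be:
pdf_report = DigitalObjectClass(rendered=True, composite=False,
                                dynamic=False, active=False)
print(rdf_triple != pdf_report)  # -> True: different preservation techniques apply
```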
31. Key questions about
what is to be preserved
What is the object to be preserved?
The specific piece of RDF?
The specific RDF plus the data pointed to?
The underlying database (if any)?
The whole linked “world”?
What are the preservation objectives?
The RDF and whole inference system?
Just the RDF?
Just the underlying database (if any)?
32. Key questions about
RDF
What Representation information is needed for the LD?
Schema?
Additional semantics?
Evolution of links (e.g. replace this host by a new one)?
Snapshots?
What Transformation?
One version of RDF to another?
Move to replacement for RDF?
Change of underlying database?
Authenticity??
Who to hand over to?
What to do with the URIs? – maintain or change?
What to do with the underlying database (if any)?
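One of the questions above, "replace this host by a new one", is mechanically simple even though the policy question is hard. The sketch below rewrites the host inside URI references in N-Triples text using only the standard library; a real repository would also record the rewrite as Provenance, and a real dataset needs a proper RDF parser. The hostnames are made up.

```python
# Sketch: rewriting the host in N-Triples URIs when a dataset moves
# (illustrative; hostnames are made up, and real data needs a real parser).
import re

def rewrite_host(ntriples: str, old_host: str, new_host: str) -> str:
    """Replace old_host with new_host inside <...> URI references only,
    leaving literals such as "A dataset" untouched."""
    pattern = re.compile(r"<([^>]*)>")
    def fix(match):
        return "<" + match.group(1).replace(old_host, new_host) + ">"
    return pattern.sub(fix, ntriples)

triple = ('<http://old.example.org/ds/1> '
          '<http://purl.org/dc/terms/title> "A dataset" .')
print(rewrite_host(triple, "old.example.org", "data.example.net"))
# -> <http://data.example.net/ds/1> <http://purl.org/dc/terms/title> "A dataset" .
```

This illustrates why the slide pairs the URI question with authenticity: the rewrite changes the content, so it is a Transform in OAIS terms and must be documented.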
33. Key questions about the
things the RDF points to
Will they be preserved?
How to find the Representation
Information?
Will the Persistent Identifiers change?
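Before these questions can even be answered, a repository needs to know which external URIs its triples actually depend on. A minimal sketch, assuming simple line-oriented N-Triples and made-up example URIs (the ORCID iD is fictitious):

```python
# Sketch: list the external URIs a set of triples points to, so each can be
# assessed for preservation and RepInfo. Line-oriented handling for
# illustration only; real data needs a proper N-Triples parser.
import re

def external_uris(ntriples_lines, our_prefix):
    """URIs referenced as subject/predicate/object that are not ours."""
    uris = set()
    for line in ntriples_lines:
        for uri in re.findall(r"<([^>]*)>", line):
            if not uri.startswith(our_prefix):
                uris.add(uri)
    return uris

lines = [
    '<http://ours.example.org/a> <http://purl.org/dc/terms/creator> '
    '<http://orcid.org/0000-0001-2345-6789> .',
    '<http://ours.example.org/a> <http://purl.org/dc/terms/title> "Dataset A" .',
]
print(sorted(external_uris(lines, "http://ours.example.org/")))
```

Each URI in the result is a dependency: something that may disappear, whose Representation Information must be findable, and whose identifier may change.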
34. Joint Key Questions
Who will pay, and why?
For which things?
Are some things more valuable – and therefore
more likely to be preserved?
What happens when some things disappear?
35. Options
Be clear about what is meant
Understand what is possible
Start with what is agreed as valuable
Don’t promise too much
36. Input to standards
See http://www.iso16363.org
Audit and Certification of Trustworthy
repositories
Forum: OAIS Futures
37. Conclusions
A great deal of funding (€100M) has been
invested in digital preservation research by the EU
The EC is not putting further funding into
digital preservation research
There are technical challenges
The biggest challenge is to be clear about what
the preservation aims are for Linked Data
Data is migrated – a big job, but it is done sometimes.
Emulation is sometimes used but mainly for repeating processing for some specific reason. More generally users do not want to simply repeat what has been done before.
Just to be clear – I am focussing on the OAIS Information Model
Divide Migration into 3 groups depending on what changes:
◦ Bits unchanged: Refresh – replace media like for like; Replicate – maybe new media
◦ Packaging changed: Repackage – e.g. copy from tape to disk
◦ Content changed: Transform – e.g. change from Word to PDF
The “migrate” in “emulate or migrate” is the third one – Transform.