| www.eudat.eu | Presentation given by Stéphane Coutin during the PRACE 2017 Spring School, a joint training event with the EU H2020 VI-SEEM project (https://vi-seem.eu/) organised by CaSToRC at The Cyprus Institute. Science, and more specifically projects using HPC, is facing a digital data explosion. Instruments and simulations are producing ever larger volumes; data can be shared, mined, cited and preserved. They are a great asset, but they face risks: we can run out of storage, we can lose them, they can be misused. To start this session, we will review why it is important to manage research data and how to do so by maintaining a Data Management Plan (DMP). This will be based on best practices from the EUDAT H2020 project and on European Commission recommendations. During the second part we will interactively draft a DMP for a given use case.
EUDAT Webinar "Organise, retrieve and aggregate data using annotations with B...EUDAT
| www.eudat.eu | Annotate your research data with B2NOTE:
A note in the margins of a book or a scientific paper, a comment on a manuscript: we are all using annotations to add information to existing physical documents. To offer a similar experience with digital content within the EUDAT Collaborative Data Infrastructure (CDI), we developed a service that allows associating additional information to a file, in a computer-readable format, without changing the file or the data record itself. These digital annotations can thus be searched to organize, retrieve and aggregate files, datasets and documents.
Although B2NOTE is a standalone service, it has been designed to be integrated with the existing EUDAT services. In the first pilot version, B2NOTE lets you annotate files located in B2SHARE. The service is called as a “widget” within the B2SHARE User Interface. B2NOTE lets you easily and intuitively create three types of annotations: a semantic tag coming from identified ontology repositories (only Bioportal at the moment, but we are working toward integrating more vocabularies), a free-text keyword that can be used when you do not find a suitable semantic term, and a free-text comment.
Research engagement in EUDAT| www.eudat.eu | EUDAT
| www.eudat.eu | EUDAT’s vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres, that combine community-specific data repositories with the permanence and persistence of some of Europe’s largest scientific data centres. EUDAT services are community driven solutions. This presentation describes the different ways EUDAT engages with the research communities
B2SHARE - How to share and store research data using EUDAT’s B2SHARE | www.eu...EUDAT
| www.eudat.eu | B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and scientists to store and share small-scale research data from diverse contexts.
| www.eudat.eu | B2FIND - User training Version 07, June 2017: B2FIND is EUDAT’s simple, user friendly metadata catalogue allowing users to discover metadata from a wide range of scientific communities.
B2SHARE: Record lifecycle and HTTP API| www.eudat.eu | EUDAT
| www.eudat.eu | B2SHARE is a scientific data repository providing persistent storage and sharing data facilities. Building on the new Invenio 3.0 digital assets management platform, a new version of B2SHARE has been developed which is focused on an improved user experience. Answering the requests of the current user base, B2SHARE version 2 provides customizable metadata schemas and a simple but effective workflow for depositing user data, exposed in its RESTful HTTP API.
The presentation will introduce the B2SHARE service, its organizing principles and its basic operations. The metadata schemas and the dataset lifecycle, which are essential to understanding the possibilities of the service, will be the main focus of the talk. The concrete output of the session can be a full paper expanding the presented topics.
Target Audience: Researchers of any scientific domain who work with publishable data sets.
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT
| www.eudat.eu | 2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition
| www.eudat.eu | B2FIND Integration Version 4 February 2017: The aim of this presentation is to illustrate how metadata can be published in the B2FIND catalogue and how EUDAT’s B2FIND metadata catalogue can be integrated.
EUDAT Research Data Management | www.eudat.eu | EUDAT
| www.eudat.eu | The presentation gives an introduction to Research Data Management, explaining why it is important to manage and share data.
November 2016
EDF2014: Talk of Stefan Decker, Director, Insight Galway, Ireland & Anthony M...European Data Forum
Invited Talk of Stefan Decker, Director, Insight Galway, Ireland & Anthony McCauley, Head of Research, Fujitsu Ireland at the European Data Forum 2014, 19 March 2014 in Athens, Greece: KI2NA - Using Linked Data for the Intelligent society
OSFair2017 Workshop | Towards a Policy Framework for the European Open Scienc...Open Science Fair
Workshop title: Towards a Policy Framework for the European Open Science Cloud
Workshop abstract:
The workshop provides a hands on approach in relation both to the understanding of the EU open science policies and their application by related stakeholders. It will seek to explore, propose and test different aspects of policy documents created by and for different types of stakeholders (e.g. RPOs, funders, policy makers etc) in the context of EOSC. Drawing on the work by the EOSC policy work, the workshop invites participants to bring their own policies or work on model policies to develop a simple but comprehensive policy document tailored to their needs and conforming to the EU policy and legal framework.
It is useful to the broader Open Science community as it brings together services, stakeholders and policies and allows for a better understanding of the interaction between different constituencies.
DAY 2 - PARALLEL SESSION 3
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT
| www.eudat.eu | 1st Session: July 7, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition
B2STAGE- how to shift large amounts of data| www.eudat.eu | EUDAT
| www.eudat.eu | B2STAGE is a reliable, efficient, light-weight and easy-to-use service to transfer research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces.
This webinar will provide an overview of the OpenAIRE Guidelines for data source managers who operate literature repositories, data archives or current research information systems.
The general principle of these guidelines is to improve interoperability of bibliographic information exchange between repositories, e-journals, CRIS and research infrastructures. In particular they are a means to help content providers to comply with Open Access policies and enable reporting of research output from public funding, e.g. the European Commission Open Access mandate in Horizon2020. An important aspect of the continuous development of these guidelines includes the use of established authority files and controlled vocabularies.
Legal Issues in Research Data Collection and Sharing: An Introduction by EUDA...EUDAT
| www.eudat.eu | v1.0, June 2014 - This course provides guidelines on the collection, usage and sharing of data in research by providing the basic information related to ethical and legal obligations. The course is made up of three modules (further modules will be added in the next months): 1. Intellectual Property Rights. 2. Personal Data. 3. Service Provider Liability & Terms of Service.
Who is it for?: Researchers, Data Managers, General public.
How EUDAT services support FAIR data - IDCC 2017| www.eudat.eu | EUDAT
| www.eudat.eu | Welcome Overview of the EUDAT service suite and the FAIR principles.
Sarah Jones, Marjan Grootveld, Yann Le Franc - IDCC Conference, February 20, 2017
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
EUDAT and PRACE joined forces to help research communities gain access to high quality managed e-Infrastructures whose resources can be connected together to enable cross-utilization use cases and make them accessible without any technical barrier. The capability to couple data and compute resources together is considered one of the key factors to accelerate scientific innovation and advance research frontiers. The goal of this session was to present the EUDAT services and the results of the collaboration achieved so far, and to deliver a hands-on session on how to write a Data Management Plan (DMP). The DMP is a useful instrument for researchers to reflect on and communicate about the way they will deal with their data. It prompts them to think about how they will generate, analyse and share data during their research project and afterwards.
Visit: https://www.eudat.eu/eudat-summer-school
A presentation given on the Horizon 2020 open data pilot as part of a series of OpenAIRE webinars for Open Access week 2014 - http://www.fosteropenscience.eu/event/openaire-webinars-during-oa-week-2014
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...OpenAIRE
Sarah Jones (HATII, Digital Curation Center) will provide more information on the Open Research Data Pilot in H2020: who should participate and how to comply (in collaboration with FOSTER)
Date: Tuesday, October 21 2014
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos
Keynote presentation given at "The Emerging Technology Forum – Data Creates Universe - Scientific Data Innovation Conference" of the "Pujiang Innovation Forum 2021" event.
LIBER Webinar: Turning FAIR Data Into RealityLIBER Europe
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
FAIR data: what it means, how we achieve it, and the role of RDASarah Jones
Presentation on FAIR data, the FAIR Data Action Plan developed by the European Commission Expert Group and the role of the Research Data Alliance on implementing FAIR. The presentation was given at the RDAFinland workshop held on 6th June - https://www.csc.fi/web/training/-/rda_and_fair_supporting_finnish_researchers
Presented by Melanie Bacou, IFPRI and Todd Slind, Spatial Development International at the Africa RISING–CSISA Joint Monitoring and Evaluation Meeting, Addis Ababa, Ethiopia, 11-13 November 2013
With a network of more than 20 European research
organisations, data and computing centres in 14 countries,
the EUDAT Collaborative Data Infrastructure (CDI) is one of
the largest infrastructures of integrated data services and
resources supporting research in Europe.
Are you a researcher, citizen scientist, institution or community looking for data storage and value-added services? Do you want access to tools to make your research data more FAIR (findable, accessible, interoperable, and reusable)? Interested in seeing how the future European Open Science Cloud could support research data and practically foster cross-border, cross-disciplinary collaboration? Then this webinar is for you!
Data management plans – EUDAT Best practices and case study | www.eudat.eu
1. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu
Data Management Plans –
EUDAT best practices and case study
Stéphane COUTIN (CINES)
26 April 2017
This work is licensed under the Creative
Commons CC-BY 4.0 licence
2. Objectives
High level presentation of research
data management and H2020 context
Present a simple approach and draft a
DMP for a given case.
Overview of EUDAT services
4. Centre Informatique National de l’Enseignement Supérieur (CINES)
Based in Montpellier, France – approx. 60 people (engineers, techs, admin)
Created in 1999; formerly CNUSC (Centre National Universitaire Sud de Calcul), created in 1980
Administered and funded by the Ministry of Higher Education & Research (MESR)
Provides the French public research community with computing resources, services and expertise
3 main mandates / activities:
High performance computing
Digital preservation
Hosting
5. Stéphane COUTIN coutin@cines.fr
Research engineer at CINES since 2013
Initially in digital preservation dept
Now working in HPC dept
Involved in EU projects
Leading PRACE collaboration task with other
eInfra
Working on EUDAT for collaboration with PRACE
Background in Information Systems projects and
programmes management
6. EUDAT – www.eudat.eu
Image CC-BY-NC ‘Data centre’ by Bob Mical
www.flickr.com/photos/small_realm/15995555571
7. A pan-European e-Infrastructure solution for pan-European RI data challenges
All RIs are facing data challenges
Where to store the growing amount of data?
How to find it?
How to make the most of it?
Solutions are needed at pan-European level
We need to promote synergies
Some services are common to many
communities
Costs and investments can be
optimised
Better integration of e-infras and
research infrastructures can be
achieved
9. A truly pan-European Infrastructure
EUDAT offers common data services, supporting
multiple research communities as well as individuals,
through a geographically distributed, resilient network
of 35 European organisations
The EUDAT vision is to enable
European researchers and
practitioners from any research
discipline to preserve, find,
access, and process data in a
trusted environment, as part of
a Collaborative Data
Infrastructure
10. PRACE – EUDAT collaboration
Joint Open Calls for proposals
EUDAT offering data services and resources
through regular PRACE calls
Review process is transparent to users
Joint training activities
Continuous technical discussion and
developments of new components
Definition of the EUDAT Workspace area
Synchronization of authentication credentials
for single sign-on
11. Quick question:
Think about your ongoing or upcoming HPC project
What are your data-related requirements?
What is the budget for this?
12. THE CHANGING DATA LANDSCAPE
Image CC-BY-SA ‘data.path Ryoji.Ikeda - 3’ by r2hox www.flickr.com/photos/rh2ox/9990016123
13. Data explosion
More and more data is
being created
Issue is not creating
data, but being able to
navigate and use it
Data management is
critical to make sure
data are well-organised,
understandable and
reusable
14. Digital data are fragile and susceptible to loss for a wide variety of reasons
Natural disaster
Facilities infrastructure failure
Storage failure
Server hardware/software failure
Application software failure
Format obsolescence
Human error
Malicious attack
Loss of staffing competencies
Loss of institutional commitment
Loss of financial stability
Changes in user expectations
Data loss
Image CC-BY ‘Hard Drive 016’ by Jon Ross www.flickr.com/photos/jon_a_ross/1482849745
15. Link rot – more 404 errors
generated over time
Reference rot* – link rot
plus content drift i.e.
webpages evolving and
no longer reflecting
original content cited
* Term coined by Hiberlink http://hiberlink.org
Data persistency issues
Jonathan D. Wren Bioinformatics 2008;24:1381-1385
17. Why manage research data?
To make your research easier!
To stop yourself drowning in irrelevant stuff
In case you need the data later
To avoid accusations of fraud or bad science
To share your data for others to use and learn from
To get credit for producing it
Because funders or your organisation require it
Well-managed data opens up opportunities
for re-use, integration and new science
18. H2020 open research data pilot
• Already expanded from a select pilot to all work
areas
• All need to consider which data can be made
open
• Mantra = “As open as possible, as closed as
necessary”
• Underlying driver is good (FAIR) data
management
Image CC-BY-SA by SangyaPundir
19. Key requirements of the open data pilot
Beneficiaries participating in the Pilot will:
Deposit data in a research data repository of
their choice
Take measures to make it possible for others to
access, mine, exploit, reproduce and
disseminate the data free of charge
Provide information about tools and instruments
necessary for validating the results (where
possible, provide the tools and instruments
themselves)
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
22. Simple diagram focusing on data dynamics
You can use other diagram types
DFD: Data Flow Diagram
23. You and your team are submitting a proposal for a project in the domain of smart cities.
The City has deployed a large set of sensors measuring traffic. The data are collected
in the City datacenter.
You want to develop an application able to forecast traffic and how it will
be affected by events such as planned roadworks. This application would run on a PRACE
site, not located in the City. On the PRACE site your storage space is limited to 10 TB.
The application uses the following inputs:
Historical sensor data for the last 12 months: the sensors produce 1 TB of data a day.
You implement a preprocessing module that translates those data into a reduced data set
(10 MB per day), in a format you have defined to describe the traffic.
The results produced by the simulation. These enable comparison between forecast
and actual traffic in order to ‘train’ the application.
Weather data (historical and forecast) provided by the national meteorological agency,
in SYNOP format. The volume is negligible.
Results will be accessible to city council employees.
Create the project data flow diagram and fill in the data summary chapter using a
table.
What would you need in order to use the weather data efficiently?
Exercise – Phase 1
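The figures in the exercise can be checked with a quick calculation before drawing the diagram. The sketch below is only a back-of-the-envelope estimate: it assumes 12 months means 365 days and decimal units (1 TB = 1,000,000 MB), and the function and constant names are illustrative, not part of the exercise.

```python
# Back-of-the-envelope data volumes for the smart-city exercise.
# Assumptions (not stated in the exercise): 365 days, decimal units.
DAYS = 365                  # "last 12 months"
RAW_TB_PER_DAY = 1          # raw sensor output, from the exercise text
REDUCED_MB_PER_DAY = 10     # output of the preprocessing module
PRACE_QUOTA_TB = 10         # storage limit on the PRACE site
MB_PER_TB = 1_000_000


def yearly_volume_tb(per_day_tb, days=DAYS):
    """Total volume in TB accumulated over the given number of days."""
    return per_day_tb * days


def fits_quota(volume_tb, quota_tb=PRACE_QUOTA_TB):
    """True if the volume fits within the storage quota."""
    return volume_tb <= quota_tb


raw_year_tb = yearly_volume_tb(RAW_TB_PER_DAY)
reduced_year_tb = yearly_volume_tb(REDUCED_MB_PER_DAY / MB_PER_TB)

print(f"Raw sensor data:     {raw_year_tb} TB/year, fits quota: {fits_quota(raw_year_tb)}")
print(f"Reduced sensor data: {reduced_year_tb * 1000:.2f} GB/year, fits quota: {fits_quota(reduced_year_tb)}")
```

Under these assumptions the raw data (365 TB a year) cannot be held on the PRACE site, while the reduced data set (about 3.65 GB a year) fits easily, which suggests running the preprocessing where the sensor data are collected.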
25. Proposed data flow diagram
Sensors collection area
PRACE HPC Site
Simulations
PRACE
Storage
Output files
extractor
Input files
Raw sensor
data
Data
Preprocessing
Reduced
sensor data
Weather data
City council
employees
Data transfer
26. Data summary table
| Dataset | Description | Origin? Existing? | Format | Size | Who could use it? |
|---|---|---|---|---|---|
| Raw sensor data | | Available, collected from sensors | Various | 1 TB per day | |
| Reduced sensor data | Actual traffic, … | Extracted from raw sensor data | Binary (specific) | 10 MB a day | Our simulation |
| Weather data | Actual and forecast | Existing. Meteo open data platform | SYNOP | 1 MB a week | Our simulation; citizens, scientists, ... |
| Simulation results | Forecasted traffic | Results of our simulation | Binary (specific) | 10 MB a day | City council employees, our application |
27. Research data lifecycle
CREATING DATA: designing research, DMPs, planning consent, locate existing data, data collection and management, capturing and creating metadata
PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, manage and store data
ANALYSING DATA: interpreting & deriving data, producing outputs, authoring publications, preparing for sharing
PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation
GIVING ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
RE-USING DATA: follow-up research, new research, undertake research reviews, scrutinising findings, teaching & learning
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle
28. Findable
– Assign persistent IDs, provide rich metadata, register in a searchable
resource,...
Accessible
– Retrievable by their ID using a standard protocol, metadata remain accessible
even if data aren’t...
Interoperable
– Use formal, broadly applicable languages, use standard vocabularies,
qualified references...
Reusable
– Rich, accurate metadata, clear licences, provenance, use of community
standards...
FAIR for machines as well as people
www.force11.org/group/fairgroup/fairprinciples
Making data FAIR
31. Metadata and documentation are needed to locate
and understand research data
Think about what others would need in order to find,
evaluate, understand, and reuse your data.
Get others to check the metadata to improve quality
Use standards to enable interoperability
Metadata and documentation
32. Use of standards
Controlled vocabularies for unambiguous keywords
Simple, complete and consistent information
Appropriate description
Explanation of limitations to support reuse
Avoid special characters e.g. !@<~ etc...
Provide persistent identifiers such as DOIs
What makes metadata good?
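Several of these criteria can be checked automatically before a record is published. The following is a minimal sketch, not a reference implementation: the vocabulary, the required field names and the sample records (including the DOI-like identifier) are all hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical controlled vocabulary; a real project would take one
# from a community standard rather than hard-coding it.
CONTROLLED_KEYWORDS = {"traffic", "weather", "simulation"}

# A sample of the special characters the slide advises avoiding.
SPECIAL_CHARS = re.compile(r"[!@<~]")

# Hypothetical minimum set of fields for a usable record.
REQUIRED_FIELDS = ("title", "description", "keywords", "identifier")


def check_metadata(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    for keyword in record.get("keywords", []):
        if keyword not in CONTROLLED_KEYWORDS:
            problems.append(f"keyword not in vocabulary: {keyword}")
    if SPECIAL_CHARS.search(record.get("title", "")):
        problems.append("special character in title")
    return problems


good = {
    "title": "Reduced traffic sensor data",
    "description": "Daily traffic summary derived from raw sensor data",
    "keywords": ["traffic"],
    "identifier": "doi:10.1234/placeholder",  # placeholder, not a real DOI
}
bad = {"title": "Traffic <draft>!", "keywords": ["cars"]}

print(check_metadata(good))
print(check_metadata(bad))
```

A check like this only covers the mechanical criteria; whether the description is appropriate or the limitations are explained still needs a human reviewer.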
33. The good and the bad
More precise and standardised | Ambiguous
Metres / seconds | Furlongs and fortnight
2015-09-10T15:00:01+01:00 | 10th Sept. 2015 15:00:01
Longitudinal wind speed | U
PDF 1.7 | PDF
2008 US Population statistics | Population statistics
Barcelona, Venezuela | Barcelona
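The timestamp in the comparison above follows ISO 8601. As a small illustration, the sketch below produces that exact string with Python's standard library and converts it to UTC; the variable names are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# 10 September 2015, 15:00:01, in a zone one hour ahead of UTC.
stamp = datetime(2015, 9, 10, 15, 0, 1, tzinfo=timezone(timedelta(hours=1)))

# ISO 8601 text form: unambiguous date order and explicit UTC offset.
iso = stamp.isoformat()

# The string can be parsed back and converted without guessing the zone.
parsed = datetime.fromisoformat(iso)
utc = parsed.astimezone(timezone.utc).isoformat()

print(iso)  # 2015-09-10T15:00:01+01:00
print(utc)  # 2015-09-10T14:00:01+00:00
```

The ambiguous form "10th Sept. 2015 15:00:01" carries no time zone, so the same conversion would require an assumption about where the measurement was taken.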
34. Digital preservation context
Main risks deal with:
• Comprehension
• Integrity
• Exploitation
• Valorization
Quality assurance
procedures to be set up for
• Metadata
• File formats
• Representation information
• Storage
• Access
• Technology watching
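For the integrity risk, one common quality assurance procedure on storage is a fixity check: record a checksum for each file at deposit time and recompute it later. The sketch below assumes SHA-256 and a simple in-memory manifest; the function names and the sample file are illustrative and do not describe how any particular archive implements it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files):
    """Map each file path to its checksum at deposit time."""
    return {str(path): sha256_of(path) for path in files}


def verify(manifest):
    """Return the paths whose current checksum differs from the manifest."""
    return [name for name, expected in manifest.items()
            if sha256_of(Path(name)) != expected]


# Demonstration on a temporary file (hypothetical file name and content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "reduced_sensor_data.bin"
    sample.write_bytes(b"traffic counts")
    manifest = build_manifest([sample])
    unchanged = verify(manifest)          # nothing altered yet
    sample.write_bytes(b"traffic c0unts")  # simulate silent corruption
    changed = verify(manifest)

print(unchanged)
print(changed)
```

This sketch raises an error if a file has disappeared; a production procedure would report missing files separately and store the manifest apart from the data it describes.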
35. Based in Montpellier, France – approx. 60 people
(engineers, techs, admin)
Created in 1999 as the successor to CNUSC (Centre National
Universitaire Sud de Calcul), itself created in 1980
Administered and funded by the Ministry of Higher
Education & Research (MESR)
Provides the French public research
community with computing
resources, services and
expertise
3 main mandates / activities:
High performance computing
Digital preservation
Hosting
Centre Informatique National de l’Enseignement
Supérieur
43. CDI Data Domain
EUDAT Data Domain modeled on the ANDS (1) Data Curation Continuum
(1) Australian National Data Service organization – www.ands.org.au
44. Store and exchange data with
colleagues and team members,
including research data not
finalized for publishing
share data with fine-grained
access controls
synchronize multiple versions of
data across different devices
An ideal solution for researchers and scientists to:
Features:
20 GB storage per user
Living objects, so no PIDs
Versioning and offline use
Desktop synchronisation
Sync and Share Research Data
B2DROP – personal cloud
b2drop.eudat.eu
45. store data safely at a trusted
and certified data centre
preserve data to guarantee
long-term persistence
control access and share
data with colleagues and the
world
A winning solution for researchers, scientists and
communities to:
Features:
Metadata management
Permanent PIDs
Open Access support
Store and Publish Research Data
B2SHARE - repository
b2share.eudat.eu
46. replicate research data into secure
data stores
archive and preserve research data
in the long-term
bring data close to powerful
compute resources
co-locate data with different
communities
benefit from economies of scale
The ideal solution for communities with no facility for
archival to:
Features:
Large-scale storage
Robust and highly available
Permanent PIDs
Replicate Research Data Safely
B2SAFE - preservation
eudat.eu/b2safe
47. move large amounts of data
between data stores and high-
performance compute resources
re-ingest computational results
back into EUDAT
deposit large data sets onto
EUDAT resources for long-term
preservation
Facilitating communities to:
Features:
High-speed transfer
Reliable and light-weight
Manages permanent PIDs
Get Data to Computation
B2STAGE - transfer
eudat.eu/b2stage
48. seek data objects and collections
using powerful metadata searches
catalogue community data by
means of selected metadata
browse through multi-disciplinary
data collections filtered by
content, provenance and
temporal keywords
A metadata catalogue service to:
Features:
Simple to use
Standards-based
Comprehensive catalogue
Find Research Data
B2FIND - catalogue
b2find.eudat.eu
49. Update the dataflow diagram with EUDAT services
you could use for preservation and metadata
publication.
Exercise phase 3
50. DFD with other EUDAT services
Simulations
PRACE
Storage
Output files
extractor
Input files
EUDAT B2SHARE
B2SHARE
storage
Web front end
Or API
Results
Traffic data
Data extractor
(uses API)
Publication
(uses API)
Actual traffic
Forecast traffic
Citizens,
researchers,
companies, ...
Search and retrieve data
EUDAT Site
EUDAT B2SAFE storage
EUDAT
B2SAFE
Data and metadata
EUDAT
B2FIND
Metadata
Metadata search
Replication
51. www.eudat.eu
Thanks – any questions?
Acknowledgements:
Thanks to Mark van de Sanden, Marjan Grootveld, Sarah Jones
and Giuseppe Fiameni for some of the slides