The Research Data Alliance in Europe, an update…
EXTREME SCALE SCIENTIFIC COMPUTING WORKSHOP
Moscow – 30 June & 1 July 2014
Fabrizio Gagliardi
BSC, Spain - ACM Europe Chair
Introduction
• Fabrizio Gagliardi, reborn at BSC, Spain
• After 30 years at CERN in Geneva
• Many EU projects
• And the last 8 years at Microsoft and Microsoft Research
• Long history of projects in Russia on Grid computing, Big Data, HPC and computer vision @ MSU, and MSR HPC summer schools 2009-2012
Big data, hype and HPC
“Big data” means different things to different people (consider Satoshi’s previous talk)
• corporate data are not so big or demanding when compared to scientific data
• social data are large, but access is easy and processing is trivially parallel
• scientific data in new research domains like genetics are a bigger challenge
• this is not true for all scientific data: CERN will produce 100 PB/year starting next year, but with easy access and simple processing models; still a very expensive game…
Horizon 2020: Research and Innovation
Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020), in addition to the private investment that this money will attract. It promises more breakthroughs, discoveries and world-firsts by taking great ideas from the lab to the market.
Research and Innovation
• Research AND Innovation, not Research OR Innovation
• Research activities with innovation in mind
• Innovation should have job creation in mind
• But how to take great ideas from the lab to the market?
• What can a research funder do?
• Which instruments do we have?
Job creation is important
Following slides adapted from Joe McKendrick/Forbes, September 2012
http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobs-created-by-big-data/682
7 new types of jobs created by Big Data
In today’s unforgiving global economy, those organizations
that compete on analytics stand the best chance of outsmarting
the competition. The only catch is, they need skilled
professionals who know how to manage, mine and draw
actionable insights from all the “Big Data” now streaming
across enterprises.
Job creation is important
1. Data scientists: this emerging role is taking the lead in
processing raw data and determining what types of analysis would
deliver the best results
2. Data architects: organizations managing Big Data need
professionals who will be able to build a data model, and plan out a
roadmap of how and when various data sources and analytical tools
will come online, and how they will all fit together
3. Data visualizers: organizations need professionals who can
“harness the data and put it in context, in layman’s language,
exploring what the data means and how it will impact the company”
Job creation is important
4. Data change agents: driving “changes in internal operations
and processes based on data analytics.” They need to be good
communicators who know how to apply statistics to improve quality
on a continuous basis
5. Data engineer/operators: people that make the Big Data
infrastructure hum on a day-to-day basis. “They develop the
architecture that helps analyse and supply data in the way the
business needs, and make sure systems are performing smoothly”
6. Data stewards: ensure that data sources are properly
accounted for
7. Data virtualization/cloud specialists: ability to build and
maintain a virtualized data service layer; organizations need
professionals that can also build and support these virtualized layers or
clouds
e-infrastructure building bridges
• network infrastructure, GÉANT
• HPC/distributed computing/software infrastructure
• scientific data infrastructure
Issues to be addressed (e-infrastructure)
The EC, in coordination with EU Member States, is looking after research data as an infrastructure.
As a valuable and strategic resource, research data raises at least three key issues to be addressed (*):
• How data can be networked
• How to envision and set up data governance on a global scale
• How the EU can play a leading role in helping start and steer this global trend
(*) Fred Friend, Jean-Claude Guédon, Herbert van Sompel, “Beyond Sharing and Re-using: Toward Global Data Networking”
Policy context
• A Reinforced European Research Area Partnership for Excellence and Growth, COM(2012) 392 – July 2012
• Towards better access to scientific information: boosting the benefits of public investments in research, COM(2012) 401 final – July 2012
• Commission Recommendation on access and preservation of scientific information, C(2012) 4890 final – July 2012
Horizon 2020
• Open Access to Scientific Publications
• Pilot on research data
Data Management Plan
Open Science
RESEARCH INFRASTRUCTURE (E-INFRASTRUCTURE HIGHLIGHTED)
Work Programme 2014-2015
CALL 1: DEVELOPING NEW WORLD CLASS INFRASTRUCTURES
• Design studies
• Support to the preparatory phase of ESFRI projects
• Support to the individual implementation and operation of ESFRI projects
• Support to the implementation of cross-cutting infrastructure services and solutions for clusters of ESFRI and other relevant research infrastructure initiatives in a given thematic area
CALL 2: INTEGRATING AND OPENING RESEARCH INFRASTRUCTURES OF PAN-EUROPEAN INTEREST
• Integrating and opening existing national and regional research infrastructures of pan-European interest
CALL 3: E-INFRASTRUCTURES
• Managing, preserving and computing with big research data
• E-infrastructures for Open Access
• Towards global data e-infrastructures: Research Data Alliance
• Pan-European High Performance Computing infrastructure and services
• Centres of Excellence for computing applications
• Network of HPC Competence Centres for SMEs
• Provision of core services across e-infrastructures
• Research and education networking – GÉANT
• E-infrastructures for virtual research environments (VRE)
CALL 4: SUPPORT TO INNOVATION, HUMAN RESOURCES, POLICY AND INTERNATIONAL COOPERATION FOR RESEARCH INFRASTRUCTURES
• Innovation support measures
• Innovative procurement pilot action in the field of scientific instrumentation
• Strengthening the human capital of research infrastructures
• New professions and skills for e-infrastructures
• Policy measures for research infrastructures
• International cooperation for research infrastructures
• E-infrastructure policy development and international cooperation
• Network of National Contact Points
Calls in 2014; deadlines Sept 2014 and Jan 2015
Initiatives starting in 2015, until 2018
Fran Berman
Research Data Driving Solutions to Complex Scientific and Societal Challenges
• Who is most at risk of contracting asthma?
• How can we increase wheat yields?
• How accurate is the Standard Model of Physics?
Image: Lucas Taylor
• How can we best address energy needs and sustain the environment?
Image: Ceinturion, Wikipedia
Data-Sharing Driving Innovation Across Sectors and Communities
World-wide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, Use
• Science, Humanities, Arts Communities
• E-Infrastructure professionals, data analysts, data center staff, …
• Data Scientists
• Libraries, Archives, Repositories, Museums
Many Infrastructure Building Blocks Needed to Accelerate Progress
• Institutional Data Sharing Practice
• Data Access and Distribution Policy
• Data Discovery Tools
• Common Metadata Standards
• Digital Object Identifiers
• Data Citation Standards
• Data Analytics Algorithms
• Data Preservation Practice
• Data Scientists and Expert Support
• Sustainable Economic Models
• Curation Practice and Policy
• Auditing, Certification and Reporting Practice

• Data Use and Re-use
• Data Discovery and Data Sharing
• Research Dissemination and Reproducibility
• Data Access (now) and Preservation (later)
Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide
RDA community efforts focus on building social, organizational and technical infrastructure to
• reduce barriers to data sharing and exchange
• accelerate the development of coordinated global data infrastructure
RDA and RDA/US are supported in part by the National Science Foundation.
RDA Approach: CREATE → ADOPT → USE
RDA Members come together as
• Working Groups – 12-18 month efforts to build, adopt, and use specific pieces of infrastructure
• Interest Groups – longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified.
Working Group efforts focus on the development and use of data sharing infrastructure
• Code, policy, infrastructure, standards, or best practices that are adopted and used by communities to enable data sharing
• “Harvestable” efforts for which 12-18 months of work can eliminate a roadblock
• Efforts that have substantive applicability to groups within the data community, but may not apply to everyone
• Efforts for which working scientists and researchers can start today
Precipitous Growth
• August 2012: First RDA organizational telecon
• October 2012: Global Data Planning Meeting
• March 2013: RDA Launch / First Plenary (Gothenburg, Sweden): First Working Groups and Interest Groups; 240 participants
• September 2013: RDA Second Plenary (Washington, DC): First “neutral space” community meeting (Data Citation Summit); First Org. Partner Meet-up; First BOFs; 380 participants from 22 countries
• March 2014: RDA Third Plenary (Dublin, Ireland): First Organizational Assembly; 6 co-located events; 14 BOF, 12 Working Groups, 22 Interest Groups; 497 participants
• September 2014: RDA Fourth Plenary (Amsterdam): First Working Group exchange meeting
The RDA Community Today: Over 1850 members from 80+ countries (as of 6/14)
[Map of membership by region; labels recoverable from the figure: Austral-Pacific 4%, Asia 4%, Africa 2%, South America 1%. Map courtesy traveltip.org]
RDA Interest (IG) and Working Groups (WG) by Focus (as of 6/14)
Domain Science - focused
• Toxicogenomics Interoperability IG
• Structural Biology IG
• Biodiversity Data Integration IG
• Agricultural Data Interoperability IG
• Wheat Data Interoperability WG
• Digital Practices in History and Ethnography IG
• Defining Urban Data Exchange for Science IG
• Geospatial IG
• Marine Data Harmonization IG
• RDA/CODATA Materials Data Infrastructure and Interoperability IG
• Research Data Needs of the Photon and Neutron Science Community IG
Data Stewardship - focused
• Research Data Provenance IG
• RDA/WDS Certification of Digital Repositories IG
• Preservation e-infrastructure IG
• Long-tail of Research Data IG
• RDA/WDS Publishing Data IG
• RDA/WDS Repository Audit and Certification Working Group
• Domain Repositories Interest Group
Reference and Sharing - focused
• Data Citation WG
• Standardization of Data Categories and Codes WG
• RDA/CODATA Legal Interoperability IG
• Data Description Registry Interoperability Working Group
Community Needs - focused
• Community Capability Model IG
• Engagement IG
• Development of Cloud Computing Capacity and Education in Developing World Research IG
• Ethics and Social Aspects of Data IG
Base Infrastructure - focused
• Data Foundation and Terminology WG
• Metadata Standards Directory WG
• Practical Policy WG
• PID Information Types WG
• Data Type Registries WG
• Data in Context IG
• Big Data Analytics IG
• Data Brokering IG
• Federated Identity Management IG
• Metadata IG
• PID Interest Group
• Service Management IG
RDA/US: Collaborate Globally, Contribute Locally
RDA/US Goals:
• Contribute to RDA “international” efforts and leadership
• Bring US efforts to broader RDA community
• Build the RDA community within the US
• Leverage and implement RDA deliverables in the US to amplify impact
• Collaborate closely with other RDA “regions” on key programs and initiatives
NSF-supported RDA/US initiatives:
• Outreach (RDA ↔ RDA/US)
• RDA Deliverables Amplification
• Student / Early Career Engagement
RDA/US Steering Committee
• Fran Berman, RPI
• Larry Lannom, CNRI
• Mark Parsons, RPI
• Beth Plale, IU
[Map: RDA US membership (yellow states)]
The European plug-in to RDA …
• RDA Europe Forum – strategic advice
• RDA Europe Science Workshops – interaction & feedback from target audience
• RDA Europe national & pan-European outreach – to engage new members & disseminate outputs
• RDA Europe policy report – to support European policy-makers & funders
RDA Europe, the European plug-in to the global RDA, supports RDA global and brings a European voice to the table
Europe as a Global Partner
• Societal challenges of our time transcend borders
• Data and computing intensive science is made of global collaborations
• Research data are global
• Research Data Alliance: enable data exchange at global scale
International
• Domain initiatives are very important
  • Marine data sharing – Southern Ocean Observing System
  • Genetic data sharing – Human Genome Project
  • Astronomy – SKA
  • CERN LHC
• But domain initiatives will not necessarily enable bridges to be constructed across disciplines, time, and industry
• So the EC, the USA, and Australia committed resources to forming the Research Data Alliance
Relation to HPC
• RDA has so far not gained enough traction with the HPC, big data and computer science communities
• This needs to be addressed urgently, since the HPC community dealing with Big Data will need close interaction with application user communities, support from policy makers at national and international level and, of course, adequate financial support from the relevant funding agencies
• Important therefore to work together…
• And to link with other relevant initiatives, such as NDS in the US (presented by Ed Seidel yesterday) and EUDAT in the EU
Why a Research Data Alliance? … So much to gain from collaboration …
“We are taking our work beyond Europe's borders, to reach global scale. To make the scientific resources of the world work together, interoperating and open to discovery. For example we are working with partners like the US and Australia in the Research Data Alliance to make scientific progress broader, deeper and more workable”.
Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, “Open Access to science and data = cash and economic bonanza”, 19 November 2013
Info: enquiries@rd-alliance.org
Acknowledgments
• Input to this presentation was kindly provided by Fran Berman and Hilary Hanahoe, and drawn from public presentations by EC officials
• The opinions expressed in this talk, and any mistakes or omissions, are entirely my own responsibility
Thanks for your attention!
Resources
First RDA Infrastructure Deliverables Coming this Fall
Data Type Registries WG
• Deliverables: System of data type registries, formal model for describing types, working model of a registry.
• Initial Adopters and Users: CNRI, International DOI Foundation, Deep Carbon Observatory
Practical Code Policies
• Deliverables: Survey of policies in production use, testbed of machine actionable policies, deployment of 5 policy sets, policy starter kits
• Initial Adopters and Users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
Persistent Identifier Information Types
• Deliverables: Minimal set of PID types, API
• Initial Adopters and Users: Data Conservancy, DKRZ
Language Codes
• Deliverables: Operationalization of ISO language categories for repositories.
• Initial Adopters and Users: Language Archive, Paradisec
Data Foundations and Terminology
• Deliverables: Common vocabulary for data terms, formal definitions and open registry for data terms
• Initial Adopters and Users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
Metadata Standards
• Deliverables: Use cases and prototype directory of current metadata standards starting from DCC directory
• Initial Adopters and Users: JISC, DataOne
Next Steps for the RDA
• Continuing pipeline of infrastructure deliverables adopted and used to accelerate data sharing
• Increasing coordination of infrastructure
• Increasing cross-boundary collaborations between domains, sectors, organizations
• International and regional programs focusing on workforce, outreach, expansion of infrastructure impact
• New partners in the Organizational Assembly
• Focused strategy to support development of industry infrastructure for data sharing
More Infrastructure · Focus on Industry · Synergistic Programs · Effective Community
RDA/US is supported in part by the National Science Foundation.
