Rdaeu russia_fg_1_july2014_final

Presentation to Russian and other computer scientists by Fabrizio Gagliardi (with many slides from H. Hanahoe and F. Berman)

Transcript

  • 1. The Research Data Alliance in Europe, an update…
      Extreme Scale Scientific Computing Workshop, Moscow, 30 June & 1 July 2014
      Fabrizio Gagliardi, BSC, Spain; ACM Europe Chair
  • 2. Introduction
      • Fabrizio Gagliardi, reborn at BSC, Spain
      • after 30 years at CERN in Geneva
      • many EU projects
      • and the last 8 years at Microsoft and Microsoft Research
      • a long history of projects in Russia on Grid computing, Big Data, HPC and computer vision at MSU, and the MSR HPC summer schools 2009-2012
  • 3. Big Data, hype and HPC
      “Big Data” means different things to different people (consider Satoshi’s previous talk):
      • corporate data are not so big or demanding when compared with scientific data
      • social data are large, but access is easy and processing is trivially parallel
      • scientific data in new research domains such as genetics pose a bigger challenge
      • this is not true for all scientific data: CERN will produce 100 PB/year starting next year, but with easy access and simple processing models; still a very expensive game…
  • 4. Horizon 2020: Research and Innovation
      Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020), in addition to the private investment that this money will attract. It promises more breakthroughs, discoveries and world-firsts by taking great ideas from the lab to the market.
  • 5. Research and Innovation
      • Research AND Innovation, not Research OR Innovation
      • Research activities with innovation in mind
      • Innovation should have job creation in mind
      • But how do we take great ideas from the lab to the market?
      • What can a research funder do?
      • Which instruments do we have?
  • 6. Job creation is important
      Following slides adapted from Joe McKendrick/Forbes, September 2012, “7 new types of jobs created by Big Data” (http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobs-created-by-big-data/682)
      In today’s unforgiving global economy, the organizations that compete on analytics stand the best chance of outsmarting the competition. The only catch is that they need skilled professionals who know how to manage, mine and draw actionable insights from all the “Big Data” now streaming across enterprises.
  • 7. Job creation is important
      1. Data scientists: this emerging role is taking the lead in processing raw data and determining what types of analysis would deliver the best results
      2. Data architects: organizations managing Big Data need professionals who will be able to build a data model, and plan out a roadmap of how and when various data sources and analytical tools will come online, and how they will all fit together
      3. Data visualizers: organizations need professionals who can “harness the data and put it in context, in layman’s language, exploring what the data means and how it will impact the company”
  • 8. Job creation is important
      4. Data change agents: driving “changes in internal operations and processes based on data analytics.” They need to be good communicators and know how to apply statistics to improve quality on a continuous basis
      5. Data engineers/operators: the people who make the Big Data infrastructure hum on a day-to-day basis. “They develop the architecture that helps analyse and supply data in the way the business needs, and make sure systems are performing smoothly”
      6. Data stewards: ensure that data sources are properly accounted for
      7. Data virtualization/cloud specialists: able to build and maintain a virtualized data service layer; organizations need professionals who can also build and support these virtualized layers or clouds
  • 9. e-Infrastructure: building bridges
      • network infrastructure (GÉANT)
      • HPC/distributed computing/software infrastructure
      • scientific data infrastructure
  • 10. Issues to be addressed (e-infrastructure)
      The EC, in coordination with EU Member States, is treating research data as an infrastructure. As a valuable and strategic resource, research data raise at least three key issues to be addressed(*):
      • how data can be networked
      • how to envision and set up data governance on a global scale
      • how the EU can play a leading role in helping start and steer this global trend
      (*) Fred Friend, Jean-Claude Guédon and Herbert Van de Sompel, “Beyond Sharing and Re-using: Toward Global Data Networking”
  • 11. Policy context
      • A Reinforced European Research Area Partnership for Excellence and Growth, COM(2012) 392, July 2012
      • Towards better access to scientific information: boosting the benefits of public investments in research, COM(2012) 401 final, July 2012
      • Commission Recommendation on access to and preservation of scientific information, C(2012) 4890 final, July 2012
      • Horizon 2020: Open Access to Scientific Publications; Pilot on research data (Data Management Plan)
      • Open Science
  • 12. Research infrastructures (e-infrastructures highlighted): Work Programme 2014-2015
      • Call 1: Developing new world-class infrastructures
          • Design studies
          • Support to the preparatory phase of ESFRI projects
          • Support to the individual implementation and operation of ESFRI projects
          • Support to the implementation of cross-cutting infrastructure services and solutions for clusters of ESFRI and other relevant research infrastructure initiatives in a given thematic area
      • Call 2: Integrating and opening research infrastructures of pan-European interest
          • Integrating and opening existing national and regional research infrastructures of pan-European interest
      • Call 3: e-Infrastructures
          • Managing, preserving and computing with big research data
          • e-Infrastructures for open access
          • Towards global data e-infrastructures: Research Data Alliance
          • Pan-European High Performance Computing infrastructure and services
          • Centres of Excellence for computing applications
          • Network of HPC Competence Centres for SMEs
          • Provision of core services across e-infrastructures
          • Research and education networking: GÉANT
          • e-Infrastructures for virtual research environments (VRE)
      • Call 4: Support to innovation, human resources, policy and international cooperation for research infrastructures
          • Innovation support measures
          • Innovative procurement pilot action in the field of scientific instrumentation
          • Strengthening the human capital of research infrastructures
          • New professions and skills for e-infrastructures
          • Policy measures for research infrastructures
          • International cooperation for research infrastructures
          • e-Infrastructure policy development and international cooperation
          • Network of National Contact Points
      Calls in 2014 (deadlines September 2014 and January 2015); initiatives starting in 2015 and running until 2018
  • 13. Research Data Driving Solutions to Complex Scientific and Societal Challenges (Fran Berman)
      • Who is most at risk of contracting asthma?
      • How can we increase wheat yields?
      • How accurate is the Standard Model of Physics? (image: Lucas Taylor)
      • How can we best address energy needs and sustain the environment? (image: Ceinturion, Wikipedia)
  • 14. Data-Sharing Driving Innovation Across Sectors and Communities (Fran Berman)
  • 15. Worldwide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access and Use (Fran Berman)
      • Science, humanities and arts communities
      • e-Infrastructure professionals, data analysts, data center staff, …
      • Data scientists
      • Libraries, archives, repositories, museums
  • 16. Many Infrastructure Building Blocks Needed to Accelerate Progress (Fran Berman)
      Building blocks: institutional data sharing practice; data access and distribution policy; data discovery tools; common metadata standards; digital object identifiers; data citation standards; data analytics algorithms; data preservation practice; data scientists and expert support; sustainable economic models; curation practice and policy; auditing, certification and reporting practice
      Areas of progress: data use and re-use; data discovery and data sharing; research dissemination and reproducibility; data access (now) and preservation (later)
  • 17. Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide (Fran Berman)
      RDA community efforts focus on building social, organizational and technical infrastructure to:
      • reduce barriers to data sharing and exchange
      • accelerate the development of coordinated global data infrastructure
      RDA and RDA/US are supported in part by the National Science Foundation.
  • 18. RDA Approach: Create → Adopt → Use (Fran Berman)
      RDA members come together as:
      • Working Groups: 12-18 month efforts to build, adopt and use specific pieces of infrastructure
      • Interest Groups: longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified
      Working Group efforts focus on the development and use of data sharing infrastructure:
      • code, policy, infrastructure, standards or best practices that are adopted and used by communities to enable data sharing
      • “harvestable” efforts for which 12-18 months of work can eliminate a roadblock
      • efforts that have substantive applicability to groups within the data community, but may not apply to everyone
      • efforts for which working scientists and researchers can start today
  • 19. Precipitous Growth (Fran Berman)
      • August 2012: first RDA organizational telecon
      • October 2012: Global Data Planning Meeting
      • March 2013, RDA Launch / First Plenary, Gothenburg, Sweden: first Working Groups and Interest Groups; 240 participants
      • September 2013, RDA Second Plenary, Washington, DC: first “neutral space” community meeting (Data Citation Summit); first organizational partner meet-up; first BOFs; 380 participants from 22 countries
      • March 2014, RDA Third Plenary, Dublin, Ireland: first Organizational Assembly; 6 co-located events; 14 BOFs, 12 Working Groups, 22 Interest Groups; 497 participants
      • September 2014, RDA Fourth Plenary, Amsterdam: first Working Group exchange meeting
  • 20. The RDA Community Today: over 1,850 members from 80+ countries (as of June 2014) (Fran Berman)
      World map of membership by region (map courtesy traveltip.org); recoverable shares: Asia 4%, Austral-Pacific 4%, Africa 2%, South America 1%
  • 21. RDA Interest Groups (IG) and Working Groups (WG) by Focus, as of June 2014 (Fran Berman)
      • Domain science-focused
          • Toxicogenomics Interoperability IG
          • Structural Biology IG
          • Biodiversity Data Integration IG
          • Agricultural Data Interoperability IG
          • Wheat Data Interoperability WG
          • Digital Practices in History and Ethnography IG
          • Defining Urban Data Exchange for Science IG
          • Geospatial IG
          • Marine Data Harmonization IG
          • RDA/CODATA Materials Data Infrastructure and Interoperability IG
          • Research Data Needs of the Photon and Neutron Science Community IG
      • Data stewardship-focused
          • Research Data Provenance IG
          • RDA/WDS Certification of Digital Repositories IG
          • Preservation e-Infrastructure IG
          • Long-tail of Research Data IG
          • RDA/WDS Publishing Data IG
          • RDA/WDS Repository Audit and Certification WG
          • Domain Repositories IG
      • Reference and sharing-focused
          • Data Citation WG
          • Standardization of Data Categories and Codes WG
          • RDA/CODATA Legal Interoperability IG
          • Data Description Registry Interoperability WG
      • Community needs-focused
          • Community Capability Model IG
          • Engagement IG
          • Development of Cloud Computing Capacity and Education in Developing World Research IG
          • Ethics and Social Aspects of Data IG
      • Base infrastructure-focused
          • Data Foundation and Terminology WG
          • Metadata Standards Directory WG
          • Practical Policy WG
          • PID Information Types WG
          • Data Type Registries WG
          • Data in Context IG
          • Big Data Analytics IG
          • Data Brokering IG
          • Federated Identity Management IG
          • Metadata IG
          • PID IG
          • Service Management IG
  • 22. RDA/US: Collaborate Globally, Contribute Locally (Fran Berman)
      RDA/US goals:
      • contribute to RDA “international” efforts and leadership
      • bring US efforts to the broader RDA community
      • build the RDA community within the US
      • leverage and implement RDA deliverables in the US to amplify impact
      • collaborate closely with other RDA “regions” on key programs and initiatives
      NSF-supported RDA/US initiatives:
      • outreach (RDA ↔ RDA/US)
      • RDA deliverables amplification
      • student / early-career engagement
      RDA/US Steering Committee: Fran Berman, RPI; Larry Lannom, CNRI; Mark Parsons, RPI; Beth Plale, IU
      Map: RDA US membership (yellow states)
  • 23. The European plug-in to RDA…
      • RDA Europe Forum: strategic advice
      • RDA Europe Science Workshops: interaction & feedback from the target audience
      • RDA Europe national & pan-European outreach: to engage new members & disseminate outputs
      • RDA Europe policy report: to support European policy-makers & funders
      RDA Europe, the European plug-in to the global RDA, supports the global RDA and brings a European voice to the table.
  • 24. Europe as a Global Partner
      • The societal challenges of our time transcend borders
      • Data- and computing-intensive science is built on global collaborations
      • Research data are global
      • Research Data Alliance: enabling data exchange at global scale
  • 25. Domain initiatives are very important:
          • marine data sharing: Southern Ocean Observing System
          • genetic data sharing: Human Genome Project
          • astronomy: SKA
          • CERN LHC
      • But domain initiatives will not necessarily enable bridges to be built across disciplines, time and industry
      • So the EC, the USA and Australia committed resources to forming the international Research Data Alliance
  • 26. Relation to HPC
      • RDA has not yet gained enough traction with the HPC, Big Data and computer science communities
      • This needs to be addressed urgently, since the HPC community dealing with Big Data will need close interaction with application user communities, support from policy makers at national and international level and, of course, adequate financial support from the relevant funding agencies
      • It is therefore important to work together…
      • …and to link with other relevant initiatives, such as NDS in the US (presented by Ed Seidel yesterday) and EUDAT in the EU
  • 27. Why a Research Data Alliance? … So much to gain from collaboration …
      “We are taking our work beyond Europe's borders, to reach global scale. To make the scientific resources of the world work together, interoperating and open to discovery. For example we are working with partners like the US and Australia in the Research Data Alliance to make scientific progress broader, deeper and more workable”.
      Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, “Open Access to science and data = cash and economic bonanza”, 19 November 2013
  • 28.
  • 29. Info: enquiries@rd-alliance.org (Fran Berman)
  • 30. Acknowledgments
      Input to this presentation was kindly provided by Fran Berman and Hilary Hanahoe, and drawn from public presentations by EC officials. The opinions expressed in this talk, like any mistakes or omissions, are entirely my own responsibility.
      Thanks for your attention!
  • 31. Resources
  • 32. First RDA Infrastructure Deliverables Coming this Fall
      • Data Type Registries WG
          Deliverables: system of data type registries, formal model for describing types, working model of a registry
          Initial adopters and users: CNRI, International DOI Foundation, Deep Carbon Observatory
      • Practical Code Policies
          Deliverables: survey of policies in production use, testbed of machine-actionable policies, deployment of 5 policy sets, policy starter kits
          Initial adopters and users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
      • Persistent Identifier Information Types
          Deliverables: minimal set of PID types, API
          Initial adopters and users: Data Conservancy, DKRZ
      • Language Codes
          Deliverables: operationalization of ISO language categories for repositories
          Initial adopters and users: Language Archive, PARADISEC
      • Data Foundations and Terminology
          Deliverables: common vocabulary for data terms, formal definitions and open registry for data terms
          Initial adopters and users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
      • Metadata Standards
          Deliverables: use cases and prototype directory of current metadata standards, starting from the DCC directory
          Initial adopters and users: JISC, DataONE
  • 33. Next Steps for the RDA (Fran Berman)
      • continuing pipeline of infrastructure deliverables adopted and used to accelerate data sharing
      • increasing coordination of infrastructure
      • increasing cross-boundary collaborations between domains, sectors and organizations
      • international and regional programs focusing on workforce, outreach and expansion of infrastructure impact
      • new partners in the Organizational Assembly
      • focused strategy to support development of industry infrastructure for data sharing
      Themes: more infrastructure; focus on industry; synergistic programs; effective community
