E-Infrastructure support for the life sciences:
Preparing for the data deluge
Rafael Jimenez
ELIXIR CTO
16 May, 2014
BioMedBridges
Summary Day 1
How does it affect data
sharing in life sciences?
Problems of big data
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
[Figure: diagram labelled Compute, Storage, Transfer; What, How, Where]
Knowledge exchange workshop
• Discussion of big data challenges in life sciences
• Focus on a few representative domains
• Looking 5 years ahead
• Jointly identify potential solutions to our problems
[Figure: diagram labelled Data; ICT e-infrastructures; LS life sciences; Physical facilities; Scientific information; Transfer, Computation, Storage]
Data challenges of different science communities
e-infrastructures
Open discussion
• Storage problem (cost vs. technology)
• Privacy influences storage/sharing
• Network protocols
Group sessions
Group discussions
Group discussion, session 1 (5 groups)
• How much data and what type of data?
• Who are the stakeholders?
• What factors can influence data availability?
Group discussion, session 2 (5 groups)
• What are the potential bottlenecks per stakeholder?
• What are the potential solutions to these bottlenecks?
Stakeholders
Researchers, Patients, Industry, Local users,
Clinical, Academic, Pressure groups, IT in hospitals,
Pharma, Agri, Structural Biologist, LS, RI Nodes,
Institutions, Algorithm developers, Genomics
researchers, Personalized medicine, Funders, TAX
payers, EuroBioimaging, Facilities, PDB, Institutes,
Commercial data provider, Data repositories
Types (Producer, Data resource, Consumer)
Production distribution, ranked from (+) to (-): Genomics, Clinical, Metabolomics, Proteomics MS, Proteomics ST, Imaging
Privacy
How much data and types
Raw data, Processed data, Metadata
Ranked from (+) to (-): Genomics, Clinical, Proteomics ST, Proteomics MS, Metabolomics, Imaging
Factors that can influence data
availability
• scientific (e.g. data reproducibility, uniqueness, value of processed and/or raw data)
• financial (cost of data storage, transfer, reproduction)
• technical (storage, network, computation…)
• political (drivers e.g. from funding bodies/large organisations/national interests)
• social (data sharing mentality of the community in question)
• legal/ethical/formal (requirements/constraints for data storage/transfer/access - e.g. need to store data on German citizens in Germany; requirements from journal publishers, data management plans, etc.)
Bottlenecks
• Storage
  • Data grows faster than data storage (G, P, M, I, C?)
  • Security restricts how some of the data can be stored/shared (G, C)
  • Keeping data close
  • Raw data is not always stored (Pst, I, Pms)
  • Missing repositories (I)
  • Repositories store only a small part of the data (Pms, M)
• Transfer
  • Data submissions (Pms, G)
    • Transferring to repositories is slower than producing the data
  • Downloading (G, P, I)
  • Just copying data to a hard disk (Pst)
    • Takes as long as producing the data
• Computation
  • Preprocessing is slower than producing the data (Pst)
Producer, Data resource, Consumer
(G)enomics, (C)linical, (P)roteomics (st),
(P)roteomics (ms), (M)etabolomics, (I)maging
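The transfer bottleneck above ("transferring to repositories is slower than producing the data") comes down to simple arithmetic. The following is a minimal Python sketch, not something presented at the workshop; the function names, the decimal unit convention (1 TB = 10^12 bytes), and any input values you plug in are illustrative assumptions.

```python
def transfer_days(data_tb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move `data_tb` terabytes over a link of `link_mbit_s` megabits/s.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Mbit = 1e6 bits).
    `efficiency` is a hypothetical factor (0-1] for protocol overhead and link sharing.
    """
    if link_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 86_400  # seconds per day


def backlog_grows(produced_tb_per_day: float, link_mbit_s: float,
                  efficiency: float = 1.0) -> bool:
    """True when one day's production takes more than one day to transfer."""
    return transfer_days(produced_tb_per_day, link_mbit_s, efficiency) > 1.0
```

For example, `backlog_grows(2, 100)` asks whether a facility producing a hypothetical 2 TB per day can keep up on a 100 Mbit/s link; when it returns True, the untransferred backlog grows every day regardless of repository capacity.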
Potential solutions
• Storage
  • Solve problems with technology (e.g. compression)
  • Evaluate data reproducibility
• Network
  • Faster protocols
  • Partitioning
  • Network upgrade
• Computation
  • Clouds
• General
  • Buy services instead of investing in infrastructure
Producer, Data resource, Consumer
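Two of the solutions listed above, compression and partitioning, can be sketched with the Python standard library. This is only an illustration under stated assumptions, not a method described in the workshop: the helper names `compress`, `partition` and `reassemble` are hypothetical, zlib stands in for whatever domain-specific compressor a community would actually use, and SHA-256 checksums are one possible way to verify parts that travel independently.

```python
from __future__ import annotations

import hashlib
import zlib


def compress(payload: bytes) -> bytes:
    """Losslessly compress a payload (zlib chosen here only for illustration)."""
    return zlib.compress(payload, level=9)


def partition(payload: bytes, chunk_size: int) -> list[tuple[int, str, bytes]]:
    """Split a payload into (index, sha256 hex digest, chunk) parts.

    Each part can then be sent separately, e.g. over parallel connections.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = []
    for index, start in enumerate(range(0, len(payload), chunk_size)):
        chunk = payload[start:start + chunk_size]
        parts.append((index, hashlib.sha256(chunk).hexdigest(), chunk))
    return parts


def reassemble(parts: list[tuple[int, str, bytes]]) -> bytes:
    """Verify every chunk's checksum and join the chunks in index order."""
    ordered = sorted(parts, key=lambda part: part[0])
    for index, digest, chunk in ordered:
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {index} failed its checksum")
    return b"".join(chunk for _, _, chunk in ordered)
```

A sender would call `partition(compress(raw), chunk_size)`, and the receiver would call `zlib.decompress(reassemble(parts))`; because each part carries its index and checksum, parts may arrive in any order and a corrupted part can be re-requested on its own.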
Summary of yesterday’s discussion
Solutions for big data in other science communities
Group sessions
Group sessions
Closing discussion

Challenges of big data. Summary day 1.
