ELIXIR
1. European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Meeting with Google Cloud Platform UK
Rafael C Jimenez
ELIXIR CTO
2. TOC
• ELIXIR
• Data deluge and the cloud in life sciences
• Cloud use cases
3. ELIXIR
• European life sciences research infrastructure for biological information, to facilitate research
• Safeguard data and build sustainable data services
• Involves more than 100 major bioinformatics service providers and is supported by 17 EU member states
• Creating a robust infrastructure for biological information is a bigger task than any individual organisation or nation can take on alone
4. Infrastructure for Life Sciences
• Data: sustain core data resources (integration, optimization, privacy, …)
• Tools: services & connectors to drive access and exploitation (access, search, analysis, …)
• Standards: integration and interoperability of data and services (formats, ontologies, guidelines, …)
• Compute: access, exchange & compute on sensitive data (storage, network & computing)
• Training: professional skills for managing and exploiting data (scientific & technical)
5. How does the data deluge affect data sharing in life sciences?
6. Large-scale data sharing in the life sciences
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
7. How does big data affect data sharing?
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
[Figure: compute, storage and transfer, arranged under What, How and Where]
11. Data generation vs. data transfer
Data produced in 24 hours and the network file transfer time for that volume:
Technology          Data/24 h   1 Gb/s     100 Mb/s   10 Mb/s
DNA sequencing      ~100 GB     ~30 min    ~5 hours   ~2 days
Mass spectrometry   ~4 TB       ~9 hours   ~4 days    ~5 weeks
Microscopy          ~4 TB       ~9 hours   ~4 days    ~5 weeks
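The transfer times above follow from simple arithmetic: bytes × 8 ÷ link speed. The sketch below reproduces that estimate, assuming decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits/s) and a fully used link with no protocol overhead, so its results are lower bounds rather than the figures actually used on the slide.

```python
# Minimal sketch: idealised transfer time for a data volume over a network link.
# Assumes decimal units and full line rate with no protocol overhead,
# so every result is a lower bound on the real transfer time.

BYTES = {"GB": 10**9, "TB": 10**12}
LINK_BPS = {"1 Gb/s": 10**9, "100 Mb/s": 10**8, "10 Mb/s": 10**7}


def transfer_seconds(n_bytes: float, bits_per_second: float) -> float:
    """Seconds needed to push n_bytes through a link at full line rate."""
    return n_bytes * 8 / bits_per_second


def human(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    for unit, size in (("weeks", 604800), ("days", 86400), ("hours", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} s"


if __name__ == "__main__":
    for label, volume in (("~100 GB", 100 * BYTES["GB"]), ("~4 TB", 4 * BYTES["TB"])):
        row = [human(transfer_seconds(volume, bps)) for bps in LINK_BPS.values()]
        print(label, row)
```

For ~4 TB this gives about 8.9 hours, 3.7 days and 5.3 weeks, in line with the table. For ~100 GB it gives about 13 minutes, 2.2 hours and 22 hours, roughly half the table's figures; presumably the slide allows for overhead or rounds up, but it does not say.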
12. Potential bottlenecks in life sciences
• Data production grows faster than storage capacity
• The cost of data production technologies declines faster than the cost of storage
• It takes longer to transfer data than to produce it
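The first bottleneck is a race between two exponential trends. The speaker notes state that compute power doubles every two years and that data production doubles faster. The hypothetical sketch below uses that two-year doubling as a stand-in for storage capacity and assumes, purely for illustration, a one-year doubling for data; it shows how quickly the gap compounds.

```python
# Hypothetical sketch: compare two exponential trends given their doubling times.
# The 2-year doubling is taken from the speaker notes (for compute) and reused
# here for capacity; the 1-year doubling for data is an illustrative assumption.

def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years` for a quantity with a fixed doubling time."""
    return 2 ** (years / doubling_time_years)


def gap(years: float, data_doubling: float, capacity_doubling: float) -> float:
    """How many times more data growth than capacity growth after `years`."""
    return growth_factor(years, data_doubling) / growth_factor(years, capacity_doubling)


if __name__ == "__main__":
    for years in (2, 4, 6, 10):
        print(years, round(gap(years, data_doubling=1.0, capacity_doubling=2.0), 1))
```

Under these assumed rates the gap doubles every two years and reaches a factor of 32 after a decade.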
13. Data growth: how to reduce the IT budget shortfall?
http://www.eweek.com/
14. Data growth: how to reduce the IT budget shortfall?
http://www.eweek.com/
Optimization: using technology more effectively and selecting relevant data
15. Potential solutions
• Storage
  • Data compression
  • Select what we store
  • Evaluate the reproducibility and value of data
• Network
  • Faster protocols
  • Partitioning
  • Network upgrade
• Computation
  • Clouds
  • Keep data close to computation
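As one concrete instance of data compression, DNA sequence stored as text spends 8 bits per base although four bases need only 2 bits. The sketch below packs A, C, G and T four to a byte. This is a generic illustration, not a method from the slides; it ignores N and other ambiguity codes, and production formats compress far more aggressively.

```python
# Illustrative sketch: 2-bit packing of DNA bases (A, C, G, T), four per byte.
# Generic technique, not taken from the slides; N and other symbols unsupported.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an ACGT string into bytes, four bases per byte (low bits first)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; padding in the last byte is dropped."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (2 * j)) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTACGTTGCA"
    packed = pack(seq)
    print(len(seq), "bytes as text ->", len(packed), "bytes packed")
    assert unpack(packed, len(seq)) == seq
```

The original length must be stored alongside the packed bytes, because a partially filled final byte is padded with zeros that would otherwise decode as extra "A" bases.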
18. How can the data deluge affect data production?
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
19. Centralization & specialization
[Diagram: data production and data centralization]
• Data is submitted to specialized, centralized repositories.
• This is the current situation.
20. Federation
• As data volumes grow, data might have to stay where it is produced.
• We might have to provide data producers with storage and computation.
• Data might be pulled into centralized repositories instead of being pushed to them.
[Diagram: data production and data centralization]
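The "pull instead of push" idea can be sketched as a central catalogue that stores only metadata and a pointer to the producing site, and fetches the data the first time someone asks for it. Everything below is hypothetical: the class names, the `fetch` callable standing in for a remote read, and the dataset and site identifiers are illustrative, not part of any ELIXIR system.

```python
# Hypothetical sketch of the "pull" model: the central repository keeps a
# catalogue entry (metadata plus where the data lives) and fetches content
# from the producing site on demand, instead of producers pushing full copies.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CatalogueEntry:
    dataset_id: str
    site: str                      # where the data is produced and stored
    size_bytes: int
    fetch: Callable[[], bytes]     # stand-in for a remote read at the site


@dataclass
class CentralRepository:
    entries: Dict[str, CatalogueEntry] = field(default_factory=dict)
    cache: Dict[str, bytes] = field(default_factory=dict)

    def register(self, entry: CatalogueEntry) -> None:
        """Producers register metadata only; no data is transferred here."""
        self.entries[entry.dataset_id] = entry

    def pull(self, dataset_id: str) -> bytes:
        """Fetch the data from its home site the first time it is requested."""
        if dataset_id not in self.cache:
            self.cache[dataset_id] = self.entries[dataset_id].fetch()
        return self.cache[dataset_id]


if __name__ == "__main__":
    repo = CentralRepository()
    repo.register(CatalogueEntry("run-001", "sequencing-centre-a", 4, lambda: b"ACGT"))
    print(len(repo.cache))       # nothing copied at registration time
    print(repo.pull("run-001"))  # data pulled on first request
```

A real federation would also need authentication, integrity checks and a policy for when cached copies are evicted; the sketch shows only where the transfer moves in the workflow.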
23. How can the data deluge affect data analysis?
http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552
24. Separation of data, tools and computation
[Diagram: three sites, each holding data and analysis tools, with computation shown separately and marked "?"]
25. Cross-site VM Operation: pilot
• Perform analysis via cloud infrastructures and VMs
• Transfer VMs between computing centers to allow researchers to perform analyses that they could not otherwise do locally
• Supported by 5 NRENs and in collaboration with
26. Cross-site VM Operation
[Diagram: VMs moved between CSC, EMBL-EBI and the University of Groningen over 1GB lightpaths on the Funet, Janet and SURFnet networks; each site holds data, analysis tools and computation. Resources shown: Chipster (200GB), NBIC Galaxy (50GB), GoNL (60TB), ENA (3.2PB).]
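The sizes on this slide explain why the pilot moves VMs rather than data: the analysis environments are gigabytes, the datasets terabytes to petabytes. The hypothetical sketch below makes that comparison explicit. It assumes that "1GB lightpath" means 1 Gb/s, that the Chipster and NBIC Galaxy figures are VM image sizes, and it pairs them with GoNL and ENA purely for illustration.

```python
# Hypothetical sketch: when a VM image and a dataset sit at different sites,
# estimate the transfer time of each over one link and move the smaller one.
# Assumptions (not from the slides): "1GB lightpath" means 1 Gb/s, decimal
# units, full line rate, and the VM/dataset pairings below.

LINK_BPS = 10**9  # assumed 1 Gb/s lightpath
GB, TB, PB = 10**9, 10**12, 10**15


def transfer_hours(n_bytes: float, bps: float = LINK_BPS) -> float:
    """Hours to move n_bytes at full line rate with no protocol overhead."""
    return n_bytes * 8 / bps / 3600


def what_to_move(vm_bytes: float, data_bytes: float) -> str:
    """Return 'vm' if shipping the VM to the data moves fewer bytes, else 'data'."""
    return "vm" if vm_bytes <= data_bytes else "data"


if __name__ == "__main__":
    cases = {
        "Chipster VM and GoNL data": (200 * GB, 60 * TB),
        "NBIC Galaxy VM and ENA data": (50 * GB, 3.2 * PB),
    }
    for name, (vm, data) in cases.items():
        print(f"{name}: move the {what_to_move(vm, data)} "
              f"(VM ~{transfer_hours(vm):.1f} h, data ~{transfer_hours(data):.0f} h)")
```

Under these assumptions the 200 GB image crosses the link in under half an hour, while moving GoNL would take about 133 hours and moving ENA several thousand hours.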
28. Use cases
Infrastructure as a Service (IaaS)
Provides on-demand access to compute and storage resources.
Platform as a Service (PaaS)
Provides a higher-level environment, beyond raw infrastructure, to support data analysis.
Virtual Machine Repository or Marketplace Portal
A means to distribute or consume software environments targeted at particular audiences.
29. Use cases
Virtual Clusters
Expand local cluster resources by connecting to cloud based virtual clusters
and storage resources.
Running Data Analysis Pipelines
Bring an analysis pipeline to a specified data set. The data set may be on a shared network file system or database instance visible to the pipeline.
Data Extraction
Allow authorised researchers to deploy a VMI that can return a subset of the stored data (e.g. data mining) or undertake local analysis.
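The "Virtual Clusters" use case amounts to overflow scheduling: jobs fill local cluster slots first and spill onto cloud-based virtual nodes. The sketch below is a hypothetical, deliberately simple first-come placement policy; the slot counts and job names are illustrative, and real schedulers also weigh data locality, cost and queue priorities.

```python
# Hypothetical sketch of the virtual-cluster use case: jobs fill local cluster
# slots first and overflow onto cloud-based virtual nodes. Slot counts and job
# names are illustrative, not from the slides.
from typing import Dict, List


def schedule(jobs: List[str], local_slots: int, cloud_slots: int) -> Dict[str, List[str]]:
    """Assign jobs to local slots, then cloud slots; anything left stays queued."""
    placement: Dict[str, List[str]] = {"local": [], "cloud": [], "queued": []}
    for job in jobs:
        if len(placement["local"]) < local_slots:
            placement["local"].append(job)
        elif len(placement["cloud"]) < cloud_slots:
            placement["cloud"].append(job)
        else:
            placement["queued"].append(job)
    return placement


if __name__ == "__main__":
    jobs = [f"align-{i}" for i in range(7)]
    print(schedule(jobs, local_slots=4, cloud_slots=2))
```

With four local and two cloud slots, the seventh job waits in the queue until a slot frees up.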
30. Use cases
Scalable Web Service Hosting
Run a single or multi-tier web service (e.g. front-end service and back-end
cluster) on a platform that can scale horizontally while managing the network
configuration (e.g. IP and firewall) and access control.
Shared Environment
Provide an environment for shared use and joint administration, which all participants can access and manage collaboratively through a common software environment.
Virtual Desktops for Immediate Use
Provide a working software environment for teaching, training or research. This could be a basic operating system or a full analysis environment (e.g. Biolinux).
31. Use cases
Software Development and Testing
For developing and testing software in different operating system
environments.
Appliance
Encapsulates one or more software products or an analysis environment (e.g. part of a pipeline) in a VMI that is verified to work and ready to run.
32. Potential collaboration
• Host processed data, as AWS does
• Provide a joint solution to large data producers
• App Engine, containers, compute and big data services for bioinformatics data analysis
• Facilitate deployment of ELIXIR VMs and containers
• Extension of existing ELIXIR cloud resources
• Replication of large data sets
• Discovery of data sets and tools
• Delegation of IT solutions
• Alliance in life science research
33. European Life Sciences Infrastructure for Biological Information
www.elixir-europe.org
Thank you
Editor's Notes
Data resource: Sustainability, availability and integration
Compute power doubles every two years; data production doubles faster.
Sequencing costs decline faster than Moore's law would predict.
Moore's law predicts an exponential decline in computing cost.
Compute power doubles every two years.
Storing data is more expensive than producing it.
Technology gets cheaper and faster.
~15,000 hospitals
~4,000 universities
~2,000 life sciences research institutes
How much data will we produce? How will we store it?
Decline of computing cost