e-Biothon
V. Breton (breton@clermont.in2p3.fr)
LPC Clermont-Ferrand, IdGC
CNRS-IN2P3
http://france-grilles.fr
Credit: N. Bard, A. Franc, JF Gibrat
Extreme Performance Computational Science workshop
Tokyo, April 15th 2014
Table of contents
• What are the computing challenges of life
sciences?
• France Grilles: a multidisciplinary distributed
e-infrastructure for science
• E-Biothon: an HPC platform for research in
life sciences
Generalities on sequencing
• Genome = DNA sequence (4 nucleotides:
A, C, G, T)
– Smallest non-viral genome:
Carsonella ruddii (0.16 Mbp)
– Largest genome: Polychaos dubium (670 Gbp)
Sanger technology: 500 bp sequences
454 technology: 10^5 reads of 450 to 600 bp
Illumina technology: 10^6 reads of 100 bp
Current projects (Tara): 10^7 reads of 100 to 400 bp
Explosion of data set size
Data analysis?
Algorithms?
Heuristics?
Tara @ http://oceans.taraexpeditions.org/
Evolution of sequencing
techniques
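The growth in data set size can be made concrete with a rough calculation: total bases = number of reads × read length. The sketch below is only illustrative. It assumes that the read counts printed on the slide as 105, 106 and 107 are powers of ten (10^5, 10^6, 10^7) whose superscripts were lost, and the dictionary and function names are hypothetical.

```python
# Rough data-volume estimate: total bases = reads x read length.
# ASSUMPTION: the slide's read counts are 10^5, 10^6 and 10^7
# (superscripts presumably lost in extraction).
DATASETS = {
    "454": (10**5, 450, 600),                     # reads, min bp, max bp
    "Illumina": (10**6, 100, 100),
    "Current projects (Tara)": (10**7, 100, 400),
}


def total_bases(reads, min_len, max_len):
    """Return (low, high) estimates of the total number of bases."""
    return reads * min_len, reads * max_len


def summarize(datasets):
    """Map each technology name to its (low, high) base-count estimate."""
    return {name: total_bases(*spec) for name, spec in datasets.items()}


for name, (low, high) in summarize(DATASETS).items():
    print(f"{name}: {low / 1e6:g} to {high / 1e6:g} Mbp")
```

Under these assumptions, each step in technology multiplies the volume to analyze by roughly an order of magnitude.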
Data production is distributed
2558 high-throughput "next generation" sequencing facilities in the world,
located in 920 centers (only 10 with more than 15 machines)
Source: omicspmaps.com
Data production
grows faster than Moore's law
Sequencing scenarios
• Interest in a new genome requires assembly
– The process of taking a large number of short DNA sequences and
putting them back together to create a representation of the
original
– Algorithms based on read overlapping benefit from large RAM (1
TB) -> HPC
• Working with a reference genome requires comparative
analysis
– Alignment algorithms (BLAST) find regions of local
similarity between sequences
– Phylogeny algorithms (PhyML) infer evolutionary relationships
between genomes
– Comparative analyses are easily parallelized at the data level -> HTC
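To illustrate what "algorithms based on read overlapping" means, here is a toy greedy assembler. It is a minimal sketch and not the method of any production assembler: the function names and the `min_len` threshold are hypothetical. It repeatedly merges the two reads with the longest suffix/prefix overlap.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads greedily by best overlap; return the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        # All-against-all comparison: quadratic in the number of reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: contigs stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"]))
```

The all-against-all overlap search is quadratic in the number of reads. Real assemblers therefore index reads in memory, which is one way to read the slide's remark about large RAM.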
Summary
• Life sciences have specific computational challenges
– Data production grows faster than Moore's law
– Permanent need to compare new data with existing data
• Life sciences needs can be suitably addressed on
multidisciplinary IT infrastructures (e-infrastructures)
– HPC resources are best suited to genome assembly
– Grid/cloud HTC resources are well suited to comparative analysis
• Life sciences are among the main users of the French
national grid/cloud production infrastructure
France Grilles
• Is a Scientific Interest Group…
– Created in 2010 by 8 partners: CEA, CNRS, CPU, INRA, INRIA,
INSERM, MESR, RENATER…
– To steer and coordinate the national strategy in the fields of
grids and clouds
• Vision:
– Build and operate a national distributed computing
infrastructure open to all sciences and to developing countries
France Grilles model
• France Grilles does not own the resources
– Resources are owned by user communities
• France Grilles provides a framework
– To share resources, expertise and know-how
– To promote innovation and initiatives
– To foster collaboration at national and international
levels
– To reach out to the long tail of users
France Grilles resources
France-Grilles backbone:
LCG-France
France-Grilles spine:
CC-IN2P3
EGI from 2010 to 2013
2010-2013: from 14 regional to 34 operations centres in 53 countries,
from 188,000 jobs/day with 80,000 cores on 250 Resource Centres
to 1,200,000 jobs/day with 430,000 cores on 337 Resource Centres
Technologies
• Grids
• Clouds
• Desktops
Talk by S. Newhouse, Madrid, Sept. 2013
France Grilles, a partner of EGI
Provide a common framework to all user communities
Provide an open environment for fruitful disciplinary and
multidisciplinary research
[Chart, logarithmic axis from 1 to 1000: data labels not recoverable]
Over 1500 scientific publications
June 2010 – April 2014
Web portal
Users
479 registered users in Nov 2013 (175 in France)
Most used robot certificate in EGI (http://go.egi.eu/wiki.robot.users)
Neuro-image analysis
Cancer therapy simulation
Prostate radiotherapy plan simulated
with GATE (L. Grevillot and D. Sarrut)
Image simulation
Echocardiography simulated with
FIELD-II (O. Bernard et al.)
Modeling and optimization of
distributed computing systems
Acceleration yielded by non-clairvoyant
task replication (R. Ferreira da Silva et al.)
Brain tissue segmentation
with FreeSurfer
Scientific applications
Infrastructure
Supported by EGI Infrastructure
Uses biomed VO (most used EGI VO for life sciences in 2013)
VIP accounts for ~25% of biomed's activity
VIP consumes ~50 CPU years every month
DIRAC
France-Grilles
Application as a service
File transfer to/from grid
Virtual Imaging Platform:
http://www.creatis.insa-lyon.fr/vip
Collaborations with dedicated life sciences infrastructures
• Institut Français de Bioinformatique (computing
and storage resources at IDRIS)
• France Génomique (computing and
storage resources at TGCC)
• France Life Imaging (infrastructure for
biomedical imaging)
• E-Biothon
• Telethon: every year, fundraising by
French media for the French
Muscular Dystrophy Association (AFM)
• From Telethon to Decrypthon
– Computing infrastructure (IBM)
– Research projects (CNRS)
– Human resources (AFM)
• From Decrypthon to E-Biothon
E-Biothon: history
e-Biothon: an HPC platform for
research in life sciences
User support
Blue Gene/P machines
Technical support
Blue Gene/P operation
Web access portal
E-Biothon: infrastructure
• 2 IBM Blue Gene/P racks
with 200 TB of storage
– 2 x 1024 4-core nodes
– up to 28 TFlops peak
performance
• SysFera-DS web access
to computing resources
• 2 modes:
– Standard (MPI)
– HTC (1024
independent tasks in
parallel)
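The peak performance figure can be cross-checked from the node counts given above. The clock frequency (850 MHz) and the 4 floating-point operations per cycle per core are not on the slide. They are assumptions taken from publicly documented Blue Gene/P specifications, and the function name is hypothetical.

```python
RACKS = 2               # from the slide
NODES_PER_RACK = 1024   # from the slide
CORES_PER_NODE = 4      # from the slide
CLOCK_GHZ = 0.85        # ASSUMPTION: Blue Gene/P core clock (850 MHz)
FLOPS_PER_CYCLE = 4     # ASSUMPTION: flops per cycle per core


def peak_tflops(racks, nodes_per_rack, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFlops = cores x clock (GHz) x flops/cycle / 1000."""
    cores = racks * nodes_per_rack * cores_per_node
    return cores * clock_ghz * flops_per_cycle / 1000.0


total_cores = RACKS * NODES_PER_RACK * CORES_PER_NODE
peak = peak_tflops(RACKS, NODES_PER_RACK, CORES_PER_NODE, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"{total_cores} cores, about {peak:.1f} TFlops peak")
```

Under these assumptions the two racks total 8192 cores and just under 28 TFlops, consistent with the "up to 28 TFlops" stated on the slide.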
E-Biothon vision is to offer a service to
the user communities in life sciences
• 2013-2014: first 3 projects
– Jean-François Gibrat et al. (MIGALE
platform, INRA Jouy-en-Josas)
– Olivier Gascuel, Stéphane Guindon and
Vincent Lefort (CNRS Montpellier)
– Yec’han Laizet, Philippe
Chaumeil, Jean-Marc
Frigerio, Stéphanie Mariette, Sophie
Gerber, Alain Franc (INRA BioGeCo –
Bordeaux)
• > 2014: open call for projects (IFB)
Studying synteny over a wide
range of microbial genomes
• Definition: similar blocks of genes in the same relative positions in
the genome
• Interest: the study of synteny can show how the genome is cut and pasted
in the course of evolution
• The MIGALE team at INRA designed an analysis pipeline to
compute synteny between 2 genomes and store it in a database
• E-Biothon impact: change in scale, with the capacity to
compute synteny between 2000 complete bacterial genomes (7
million comparisons)
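Pairwise genome comparisons are independent of each other, which is what makes them a natural fit for a mode that runs many independent tasks in parallel. The sketch below shows one possible way to spread all unordered genome pairs over a fixed number of task lists. The function names are hypothetical, and because the slide does not say how its 7 million comparisons are counted, the number of genomes is left as a parameter.

```python
from itertools import combinations


def genome_pairs(n_genomes):
    """Yield every unordered pair of genome indices (i, j) with i < j."""
    return combinations(range(n_genomes), 2)


def split_into_tasks(pairs, n_tasks):
    """Distribute pairs round-robin over n_tasks independent work lists."""
    tasks = [[] for _ in range(n_tasks)]
    for k, pair in enumerate(pairs):
        tasks[k % n_tasks].append(pair)
    return tasks


# Small illustrative run: 100 genomes spread over 1024 task slots.
tasks = split_into_tasks(genome_pairs(100), 1024)
sizes = [len(t) for t in tasks]
print(sum(sizes), "comparisons,", min(sizes), "to", max(sizes), "per task")
```

Round-robin distribution keeps the task lists within one comparison of each other in size. A real scheduler would also balance by genome length, which this sketch ignores.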
PhyML
Phylogenetics is the study of evolutionary relationships among groups of
organisms
PhyML is a software package that estimates maximum
likelihood phylogenies from alignments of nucleotide or
amino acid sequences
PhyML original publication in 2007 is the most cited in environment and
ecology (> 6000 citations).
E-Biothon impact: change in scale in the resources made available
to PhyML users
Characterizing biodiversity
According to botanical theory,
biodiversity is organized into
species, genera, families, orders:
is this confirmed by the distances
between sequences?
Study of biodiversity in French Guiana
16000 different tree species
in the Amazonian forest (≈ 300
in Europe)
More biodiversity in 10000
m² of forest in French
Guiana than in Europe
Decrypthon added value
Change in scale (from the local mesocenter in
Bordeaux)
Millions of reads
Exact distance computation
without heuristics (alignment scores)
Terabytes of data produced every week
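As an illustration of an exact, heuristic-free alignment score, the sketch below computes a Smith-Waterman local alignment score by dynamic programming. The slides do not say which alignment method or scoring scheme the project used. Smith-Waterman is a standard exact algorithm chosen here as an example, and the match, mismatch and gap values are hypothetical.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Exact dynamic programming in O(len(a) * len(b)) time, keeping only
    two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTAA", "GGACGTCC"))
```

The cost is quadratic in sequence length for every pair of reads. With millions of reads, an exact all-against-all computation is far more expensive than a heuristic search, which is where the change in scale matters.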
Conclusion
• Both HPC and HTC resources are increasingly needed to
address life sciences data and computing challenges:
– As sequencing technologies keep evolving, data production
grows faster than Moore's law and is increasingly distributed
– Biological data need to be constantly compared with
each other (phylogenetics, comparative genomics analysis)
• France is developing complementary HPC and HTC
infrastructures for life sciences
– Institut Français de Bioinformatique, France Génomique
– E-Biothon: an HPC platform for research in life sciences
– France Grilles: a multidisciplinary grid/cloud production
infrastructure
2558 next-generation sequencers in the world
Are life sciences
specific w.r.t. computing?
What is specific to life sciences:
- As sequencing technologies keep evolving, data production grows faster than
Moore's law
- Biological data need to be constantly compared with each other (phylogenetics,
comparative genomics analysis)
What is not specific?
- Data production is distributed
- Multiscale modeling

E biothon workshop 2014 04 15 v1

  • 1. e-Biothon V. Breton (breton@clermont.in2p3.fr) LPC Clermont-Ferrand, IdGC CNRS-IN2P3 http://france-grilles.fr Credit: N. Bard, A. Franc, JF Gibrat Extreme Performance Computational Science workshop Tokyo, April 15th 2014
  • 2. Table of contents • What are the computing challenges of life sciences? • France Grilles: a multidisciplinary distributed e-infrastructure for science • E-Biothon: an HPC platform for research in life sciences
  • 3. Generalities on sequencing • Genome = DNA sequence (4 nucleotides: A, C, G, T) – Smallest non-viral genome: Carsonella ruddii (0.16 Mbp) – Largest genome: Polychaos dubium (670 Gbp)
  • 4. Evolution of sequencing techniques • Sanger technology: 500 bp sequences • 454 technology: 10^5 reads of 450 to 600 bp • Illumina technology: 10^6 reads of 100 bp • Current projects (Tara): 10^7 reads of 100 to 400 bp • Explosion of data set size: Data analysis? Algorithms? Heuristics? • Tara @ http://oceans.taraexpeditions.org/
  • 5. Data production is distributed • 2558 high-throughput "next generation" sequencing facilities in the world, located in 920 centers (only 10 with more than 15 machines) • Source: omicspmaps.com
  • 7. Sequencing scenarios • Interest in a new genome requires assembly – The process of taking a large number of short DNA sequences and putting them back together to create a representation of the original – Algorithms based on read overlapping benefit from large RAM (1 TB) -> HPC • Working with a reference genome requires comparative analysis – Alignment algorithms (BLAST) find regions of local similarity between sequences – Phylogeny algorithms (PhyML) build evolutionary relationships between genomes – Comparative analyses are easily parallelized at the data level -> HTC
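The data-level parallelism of comparative analysis mentioned on the slide above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used by any of the projects: the function names (`split_pairs`, `run_task`) are hypothetical, and a k-mer Jaccard similarity stands in for a real comparison tool such as BLAST.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Toy similarity: shared k-mers divided by all k-mers (assumption,
    used here only as a stand-in for a real alignment tool)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def split_pairs(n_items, n_tasks):
    """Distribute all unordered index pairs round-robin over n_tasks chunks.

    Each chunk can be processed without any communication with the others,
    which is what makes the workload suitable for HTC resources.
    """
    pairs = list(combinations(range(n_items), 2))
    return [pairs[t::n_tasks] for t in range(n_tasks)]


def run_task(seqs, pairs):
    """Process one independent chunk: compare every pair it contains."""
    return {(i, j): jaccard(seqs[i], seqs[j]) for i, j in pairs}
```

Each chunk returned by `split_pairs` would be submitted as a separate job, and the per-chunk dictionaries are merged afterwards. Because no chunk depends on another, the work scales with the number of available workers.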
  • 8. Summary • Life sciences have specific computational challenges – Data production grows faster than Moore's law – Permanent need to compare new data with existing data • The needs of life sciences can be addressed effectively on multidisciplinary IT infrastructures (e-infrastructures) – HPC resources are best suited to genome assembly – Grid/cloud HTC resources are well suited to comparative analysis • Life sciences are among the main users of the French national grid/cloud production infrastructure
  • 9. France Grilles • Is a Scientific Interest Group… – Created in 2010 by 8 partners: CEA, CNRS, CPU, INRA, INRIA, INSERM, MESR, RENATER… – To steer and coordinate the national strategy in the fields of grids and clouds • Vision: – Build and operate a national distributed computing infrastructure open to all sciences and to developing countries
  • 10. France Grilles model • France Grilles does not own the resources – Resources are owned by user communities • France Grilles provides a framework – To share resources, expertise and know-how – To promote innovation and initiatives – To foster collaboration at national and international levels – To reach out to the long tail of users
  • 12. France Grilles, a partner of EGI • EGI from 2010 to 2013: from 14 regional to 34 operations centres in 53 countries, from 188,000 jobs/day with 80,000 cores on 250 Resource Centres to 1,200,000 jobs/day with 430,000 cores on 337 Resource Centres • Technologies: grids, clouds, desktops • Source: talk by S. Newhouse, Madrid, Sept. 2013
  • 13. Provide a common framework to all user communities
  • 14. Provide an open environment for fruitful disciplinary and multidisciplinary research • [Chart: over 1500 scientific publications, June 2010 – April 2014]
  • 15. Virtual Imaging Platform (VIP): http://www.creatis.insa-lyon.fr/vip • Web portal • Users – 479 registered users in Nov 2013 (175 in France) – Most used robot certificate in EGI (http://go.egi.eu/wiki.robot.users) • Scientific applications – Neuro-image analysis: brain tissue segmentation with Freesurfer – Cancer therapy simulation: prostate radiotherapy plan simulated with GATE (L. Grevillot and D. Sarrut) – Image simulation: echocardiography simulated with FIELD-II (O. Bernard et al.) – Modeling and optimization of distributed computing systems: acceleration yielded by non-clairvoyant task replication (R. Ferreira da Silva et al.) • Infrastructure – Supported by EGI – Uses the biomed VO (most used EGI VO for life sciences in 2013) – VIP accounts for ~25% of biomed's activity – VIP consumes ~50 CPU years every month – DIRAC France-Grilles – Application as a service – File transfer to/from grid
  • 16. Collaborations with dedicated life sciences infrastructures • Institut Français de Bioinformatique (computing and storage resources at IDRIS) • France Génomique (computing and storage resources at TGCC) • France Life Imaging (infrastructure for biomedical imaging) • E-Biothon
  • 17. E-Biothon: history • Telethon: every year, fundraising by French media for the French Muscular Dystrophy Association (AFM) • From Telethon to Decrypthon – Computing infrastructure (IBM) – Research projects (CNRS) – Human resources (AFM) • From Decrypthon to E-Biothon
  • 18. e-Biothon: an HPC platform for research in life sciences • [Diagram labels: user support; Blue Gene/P machines; technical support; Blue Gene/P operation; web access portal]
  • 19. E-Biothon: infrastructure • 2 IBM Blue Gene/P racks with 200 TB of storage – 2 x 1024 4-core nodes – Up to 28 TFlops peak performance • SysFera-DS web access to computing resources • 2 modes: – Standard (MPI) – HTC (1024 independent tasks in parallel)
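As a sanity check on the peak figure in the slide above, the arithmetic can be reproduced in a few lines. The rack, node and core counts come from the slide. The 850 MHz clock and the 4 floating-point operations per cycle per core are the commonly published Blue Gene/P figures and are an assumption here, since the slide does not state them.

```python
# Figures taken from the slide.
RACKS = 2
NODES_PER_RACK = 1024
CORES_PER_NODE = 4

# Assumptions (not stated on the slide): published Blue Gene/P core figures.
CLOCK_HZ = 850e6        # 850 MHz PowerPC 450 core
FLOPS_PER_CYCLE = 4     # dual floating-point unit with fused multiply-add


def total_cores():
    """Total number of cores across both racks."""
    return RACKS * NODES_PER_RACK * CORES_PER_NODE


def peak_tflops():
    """Theoretical peak in TFlops: cores x clock x flops per cycle."""
    return total_cores() * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
```

Under these assumptions the machine has 8192 cores and a theoretical peak just under 28 TFlops, consistent with the "up to 28 TFlops" on the slide.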
  • 20. The E-Biothon vision is to offer a service to user communities in life sciences • 2013-2014: first 3 projects – Jean-François Gibrat et al. (MIGALE platform, INRA Jouy-en-Josas) – Olivier Gascuel, Stéphane Guindon and Vincent Lefort (CNRS Montpellier) – Yec’han Laizet, Philippe Chaumeil, Jean-Marc Frigerio, Stéphanie Mariette, Sophie Gerber, Alain Franc (INRA BioGeCo – Bordeaux) • After 2014: open call for projects (IFB)
  • 21. Studying synteny over a wide range of microbial genomes • Definition: similar blocks of genes in the same relative positions in the genome • Interest: the study of synteny can show how the genome is cut and pasted in the course of evolution • The MIGALE team at INRA designed an analysis pipeline to compute synteny between 2 genomes and store it in a database • E-Biothon impact: change in scale, with the capacity to compute synteny between 2000 complete bacterial genomes (7 million comparisons)
  • 22. PhyML • Phylogenetics is the study of evolutionary relationships among groups of organisms • PhyML is a software package that estimates maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences • The PhyML original publication in 2007 is the most cited in environment and ecology (> 6000 citations) • E-Biothon impact: change in scale in the resources made available to PhyML users
  • 24. According to botanical theory, biodiversity is organized in species, genera, families and orders: is this confirmed by the distances between sequences?
  • 25. Study of biodiversity in French Guiana • 16000 different tree species in the Amazonian forest (≈ 300 in Europe) • More biodiversity in 10000 m2 of forest in French Guiana than in Europe • Decrypthon added value – Change in scale (from the local mesocenter in Bordeaux) – Millions of reads – Exact distance computation without heuristics (alignment scores) – Terabytes of data produced every week
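To make "exact distance computation without heuristics (alignment scores)" concrete, here is a minimal sketch of an exact global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The slides do not say which algorithm or scoring scheme the project used, so the choice of algorithm, the function name and the `match`, `mismatch` and `gap` values are all assumptions made for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Exact global alignment score of sequences a and b.

    Fills the dynamic programming table row by row and keeps only the
    previous row, so memory is O(len(b)) and time is O(len(a) * len(b)).
    Scoring values are illustrative assumptions, not the project's settings.
    """
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Every cell of the table is evaluated, so the result is the optimal score rather than an approximation. The quadratic cost per pair, multiplied by millions of reads, is what motivates the change in scale described on the slide.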
  • 26. Conclusion • Both HPC and HTC resources are increasingly needed to address life sciences data and computing challenges: – As sequencing technologies keep evolving, data production grows faster than Moore's law and is increasingly distributed – Biological data need to be constantly compared to each other (phylogenetics, comparative genomic analysis) • France is developing complementary HPC and HTC infrastructures for life sciences – Institut Français de Bioinformatique, France Génomique – E-Biothon: an HPC platform for research in life sciences – France Grilles: a multidisciplinary grid/cloud production infrastructure
  • 30. Are life sciences specific with respect to computing? What is specific to life sciences: - As sequencing technologies keep evolving, data production grows faster than Moore's law - Biological data need to be constantly compared to each other (phylogenetics, comparative genomic analysis) What is not specific? - Data production is distributed - Multiscale modeling