SlideShare a Scribd company logo
1 of 40
Download to read offline
Gestion des données scientifiques en imagerie in vivo
https://piv2017.sciencesconf.org/
Christophe CERIN, Université Paris13
christophe.cerin@lipn.univ-paris13.fr
Plateformes de resources numériques
Expérience USPC
Paris, December 7, 2017
Outline
2
• Motivations related to large scale platforms
• Background:
• Criteria for observing platforms
• High Performance Computing (HPC)
• Apache suite for Big Data
• Some projects
• Contribution: USPC platform
• Conclusion and perspective
Motivations related to large
scale platforms
In experimental Sciences
we need tips for using
them appropriately / best
practices
3
4
Definition of a large scale system
• System with unprecedented amounts of hardware (>1M cores,
>100 PB storage, >100Gbits/s interconnect);
• Ultra large scale system: hardware + lines of source code +
numbers of users + volumes of data;
• New problems:
• built across multiple organizations;
• often with conflicting purposes and needs;
• constructed from heterogeneous parts with complex
dependencies and emergent properties;
• continuously evolving;
• software, hardware and human failures will be the norm,
not the exception.
Since in our lives (Cloud Computing)
5
Software as a Service
SaaS
Plateform as a Service
PaaS

Infrastructure as a 

Service

IaaS
• Don’t believe them;
• Variety in the eco-system is the norm.
6
Some speeches are ready for you
In Cloud, innovation drives technology/science
Containers and no more Virtual Machines
7
Hardware
Operating System
App App App
Conventional Stack
Hardware
OS
App App App
Hypervisor
OS OS
Virtualized Stack
▪Abstraction on resources in the past: VM technologies to hide
physical properties, interactions between system layer,
application layer and the end-users
VM versus Container
8
9
In Sciences (not in business)
• Experimental approach: modeling,…,
experiments,…
• What is an experimental plan for large
scale systems and reproducible research?
• Answers depend on the system used:
• Cluster (dedicated; managed)
• Cloud (almost non supervised… or by
you)
Background
10
▪Cluster ➔Performance ➔Flops - Top500
▪Cluster ➔Power ➔Flops per watt - Green500
▪Cloud ➔Price ➔for computing, for storing…
▪Cloud: the economic model may be strange (Amazon
Spot Instances)
▪Scientific issues: how to compare cluster and cloud?
(be fair)
11
Performance criteria for observing
large scale system
12
Performance (Top500)
13
Performance (Green500)
14
Productivity
▪Productivity traditionally refers to the ratio between the quantity
of software produced and the cost spent for it.
▪Factors: Programming language used, Program size, The experience
of programmers and design personnel, The novelty of requirements,
The complexity of the program and its data, The use of structured
programming methods, Program class or the distribution method,
Program type of the application area, Tools and environmental
conditions, Enhancing existing programs or systems, Maintaining
existing programs or systems, Reusing existing modules and standard
designs, Program generators, Fourth-generation languages,
Geographic separation of development locations, Defect potentials
and removal methods, (Existing) Documentation, Prototyping before
main development begins, Project teams and organization
structures, Morale and compensation of staff
▪Parallel (http://press3.mcs.anl.gov/atpesc/files/2015/03/ESCTP-snir_215.pdf):
▪Shared memory (OpenMP):
▪CPU+vector
▪CPU+GPU
▪CPU+accelerators (Google Tensor processing unit…)
▪Message passing (MPI)
▪Sequential with the call to « parallel data structure » (Scala)
▪Map-reduce: Hadoop, Spark (In addition to Map and Reduce operations, it
supports SQL queries, streaming data, machine learning and graph data
processing.)
15
Programming model
▪Even more complicated: Cray has added "ARM Option" (i.e. CPU blade option,
using the ThunderX2) to their XC50 supercomputers
▪We need consolidations:
▪in hardware
▪in software
▪in algorithms
▪in applications
▪in experimental methods
▪hpc facilities
▪training
16
Hardware landscape
17
Apache suite for Big Data
1- Bases de Données
1.1 Apache Hbase
1.2 Apache Cassandra
1.3 CouchDB
1.4 MongoDB
2 Accès aux données/ requêtage
2.1 Pig
2.2 Hive
2.3 Livy
3 Data Intelligence
3.1 Apache Drill
3.2 Apache Mahout
3.3 H2O
4 Data Serialisation
4.1 Apache Thrift
4.2 Apache Avro
5 Data integration
5.1 Apache Sqoop
5.2 Apache Flume
5.3 Apache Chuckwa
6 Requetage
6.1 Presto
6.2 Impala
6.3 Dremel
7 Sécurité des données
7.1 Apache Metron
7.2 Sqrrl
8.2 MapReduce
8.3 Spark
9.1 Elasticsearch
10.1 Apache Hadoop
10.6 Apache Mesos
10.10 Apache Flink
10.12 Apache Zookeeper
10.15 Apache Storm
10.20 Apache Kafka
Example of a project mixing
HPC and Big Data considerations
Where theory comes before data:
physics
chemistry
Materials
...
Where data comes before theory:
Medicine
business insights
autonomy (smart car, building, city)
18
Harp: Hadoop and Collective
Communication for Iterative Computation
19
• Project http://salsaproj.indiana.edu/harp/
• Motivation:
• Communication patterns are not abstracted and
defined in Hadoop, Pregel/Giraph, Spark
• In contrary, MPI has very fine grain based
(collective) communication primitives (based on
arrays and buffers)
• Harp provides data abstractions and communication
abstractions on top of them. It can be plugged into
Hadoop runtime to enable efficient in-memory
communication to avoid HDFS read/write.
20
Harp
• The data
abstraction
has 3
categories
horizontally
and 3 levels
vertically
• Harp works as a plugin in Hadoop. The goal is to
make Hadoop cluster can schedule original
MapReduce jobs and Map-Collective jobs at the
same time.
• Collective communication is defined as
movement of partitions within tables.
• Collective communication requires
synchronization. Hadoop scheduler is modified
to schedule tasks in BSP style.
• Fault tolerance with checkpointing.
21
Harp
• For the data-scientist of nowadays;
• Hub to facilite access to Amazon cloud
resources, to share data, to collaborate
between many people through notebooks;
• RosettaHUB: https://www.rosettahub.com
• Available through Amazon Educate program
(https://s3.amazonaws.com/awseducate-
list/AWS_Educate_Institutions.pdf)
22
RosettaHub
23
24
25
26
SPC and the deployment of
virtual machines
IDV experience
(Imageries du vivant)
27
28
http://cirrus.uspc.fr
29
30
31
Technologie OpenNebula
32
33
34
35
36
• Plateforme d'imagerie ePad
http://pf-01.lab.parisdescartes.fr:1243/dcm4chee-web3/
• Plateforme PLM dans le cadre du projet DRIVE-SPC
http://pf-01.lab.parisdescartes.fr:2345/tc/webclient
• Plateforme Sis4web de SisNCom
http://pf-01.lab.parisdescartes.fr:1323/sis4web/login.php
• Benchsys
http://pf-01.lab.parisdescartes.fr:9001/
• Owncloud
http://pf-01.lab.parisdescartes.fr:1253/owncloud/index.php/login
• …
37
Examples IDV services
Conclusion and Future work
Many, many opportunities
38
• Computer hardware will continue to evolve; Trend:
accelerators instead of GPUs;
• Platforms will continue to evolve to facilitate the
management of data, the computation on data;
• Need for convergence/resonance between HPC and Big-
data eco-systems;
• SPC: needs for training, using, transforming the scientific
approaches… with research engineers for accompany the
change;
• SPC: needs for projects… and to share betweenez
communities.
39
Conclusion and perspective
Thanks to Leila Abidi, Olivier
Waldek, Roland Chervet
40

More Related Content

What's hot

Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Saliya Ekanayake
 

What's hot (20)

51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystem
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
 
Hadoop
Hadoop Hadoop
Hadoop
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013
 
Hadoop
HadoopHadoop
Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 

Similar to C cerin piv2017_c

Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
inside-BigData.com
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 

Similar to C cerin piv2017_c (20)

Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesmyHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
FC Brochure & Insert
FC Brochure & InsertFC Brochure & Insert
FC Brochure & Insert
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 

More from Bertrand Tavitian

More from Bertrand Tavitian (7)

14 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-201714 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-2017
 
9 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_20179 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_2017
 
M allanic piv2017_c
M allanic piv2017_cM allanic piv2017_c
M allanic piv2017_c
 
M joliot biomist-piv_2017_c
M joliot biomist-piv_2017_cM joliot biomist-piv_2017_c
M joliot biomist-piv_2017_c
 
14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c
 
M kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_cM kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_c
 
Mc jacquemot piv2017_c
Mc jacquemot piv2017_cMc jacquemot piv2017_c
Mc jacquemot piv2017_c
 

Recently uploaded

CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
ocean4396
 

Recently uploaded (20)

Premium ℂall Girls In Mumbai👉 Dail ℂALL ME: 📞9833325238 📲 ℂall Richa VIP ℂall...
Premium ℂall Girls In Mumbai👉 Dail ℂALL ME: 📞9833325238 📲 ℂall Richa VIP ℂall...Premium ℂall Girls In Mumbai👉 Dail ℂALL ME: 📞9833325238 📲 ℂall Richa VIP ℂall...
Premium ℂall Girls In Mumbai👉 Dail ℂALL ME: 📞9833325238 📲 ℂall Richa VIP ℂall...
 
Signs It’s Time for Physiotherapy Sessions Prioritizing Wellness
Signs It’s Time for Physiotherapy Sessions Prioritizing WellnessSigns It’s Time for Physiotherapy Sessions Prioritizing Wellness
Signs It’s Time for Physiotherapy Sessions Prioritizing Wellness
 
Get the best psychology treatment in Indore at Gokuldas Hospital
Get the best psychology treatment in Indore at Gokuldas HospitalGet the best psychology treatment in Indore at Gokuldas Hospital
Get the best psychology treatment in Indore at Gokuldas Hospital
 
Denture base resins materials and its mechanism of action
Denture base resins materials and its mechanism of actionDenture base resins materials and its mechanism of action
Denture base resins materials and its mechanism of action
 
Let's Talk About It: Ovarian Cancer (The Emotional Toll of Treatment Decision...
Let's Talk About It: Ovarian Cancer (The Emotional Toll of Treatment Decision...Let's Talk About It: Ovarian Cancer (The Emotional Toll of Treatment Decision...
Let's Talk About It: Ovarian Cancer (The Emotional Toll of Treatment Decision...
 
The Clean Living Project Episode 24 - Subconscious
The Clean Living Project Episode 24 - SubconsciousThe Clean Living Project Episode 24 - Subconscious
The Clean Living Project Episode 24 - Subconscious
 
Unveiling Alcohol Withdrawal Syndrome: exploring it's hidden depths
Unveiling Alcohol Withdrawal Syndrome: exploring it's hidden depthsUnveiling Alcohol Withdrawal Syndrome: exploring it's hidden depths
Unveiling Alcohol Withdrawal Syndrome: exploring it's hidden depths
 
Unlocking Holistic Wellness: Addressing Depression, Mental Well-Being, and St...
Unlocking Holistic Wellness: Addressing Depression, Mental Well-Being, and St...Unlocking Holistic Wellness: Addressing Depression, Mental Well-Being, and St...
Unlocking Holistic Wellness: Addressing Depression, Mental Well-Being, and St...
 
Renal Replacement Therapy in Acute Kidney Injury -time modality -Dr Ayman Se...
Renal Replacement Therapy in Acute Kidney Injury -time  modality -Dr Ayman Se...Renal Replacement Therapy in Acute Kidney Injury -time  modality -Dr Ayman Se...
Renal Replacement Therapy in Acute Kidney Injury -time modality -Dr Ayman Se...
 
PYODERMA, IMPETIGO, FOLLICULITIS, FURUNCLES, CARBUNCLES.pdf
PYODERMA, IMPETIGO, FOLLICULITIS, FURUNCLES, CARBUNCLES.pdfPYODERMA, IMPETIGO, FOLLICULITIS, FURUNCLES, CARBUNCLES.pdf
PYODERMA, IMPETIGO, FOLLICULITIS, FURUNCLES, CARBUNCLES.pdf
 
The Orbit & its contents by Dr. Rabia I. Gandapore.pptx
The Orbit & its contents by Dr. Rabia I. Gandapore.pptxThe Orbit & its contents by Dr. Rabia I. Gandapore.pptx
The Orbit & its contents by Dr. Rabia I. Gandapore.pptx
 
Integrated Neuromuscular Inhibition Technique (INIT)
Integrated Neuromuscular Inhibition Technique (INIT)Integrated Neuromuscular Inhibition Technique (INIT)
Integrated Neuromuscular Inhibition Technique (INIT)
 
CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
CAS 110-63-4 BDO Liquid 1,4-Butanediol 1 4 BDO Warehouse Supply For Excellent...
 
Varicose Veins Treatment Aftercare Tips by Gokuldas Hospital
Varicose Veins Treatment Aftercare Tips by Gokuldas HospitalVaricose Veins Treatment Aftercare Tips by Gokuldas Hospital
Varicose Veins Treatment Aftercare Tips by Gokuldas Hospital
 
VIP Pune 7877925207 WhatsApp: Me All Time Serviℂe Available Day and Night
VIP Pune 7877925207 WhatsApp: Me All Time Serviℂe Available Day and NightVIP Pune 7877925207 WhatsApp: Me All Time Serviℂe Available Day and Night
VIP Pune 7877925207 WhatsApp: Me All Time Serviℂe Available Day and Night
 
TEST BANK For Lewis's Medical Surgical Nursing in Canada, 4th Edition by Jane...
TEST BANK For Lewis's Medical Surgical Nursing in Canada, 4th Edition by Jane...TEST BANK For Lewis's Medical Surgical Nursing in Canada, 4th Edition by Jane...
TEST BANK For Lewis's Medical Surgical Nursing in Canada, 4th Edition by Jane...
 
DR. Neha Mehta Best Psychologist.in India
DR. Neha Mehta Best Psychologist.in IndiaDR. Neha Mehta Best Psychologist.in India
DR. Neha Mehta Best Psychologist.in India
 
HIFI* ℂall Girls In Thane West Phone 🔝 9920874524 🔝 💃 Me All Time Serviℂe Ava...
HIFI* ℂall Girls In Thane West Phone 🔝 9920874524 🔝 💃 Me All Time Serviℂe Ava...HIFI* ℂall Girls In Thane West Phone 🔝 9920874524 🔝 💃 Me All Time Serviℂe Ava...
HIFI* ℂall Girls In Thane West Phone 🔝 9920874524 🔝 💃 Me All Time Serviℂe Ava...
 
Case presentation on Antibody screening- how to solve 3 cell and 11 cell panel?
Case presentation on Antibody screening- how to solve 3 cell and 11 cell panel?Case presentation on Antibody screening- how to solve 3 cell and 11 cell panel?
Case presentation on Antibody screening- how to solve 3 cell and 11 cell panel?
 
Tips to Choose the Best Psychiatrists in Indore
Tips to Choose the Best Psychiatrists in IndoreTips to Choose the Best Psychiatrists in Indore
Tips to Choose the Best Psychiatrists in Indore
 

C cerin piv2017_c

  • 1. Gestion des données scientifiques en imagerie in vivo https://piv2017.sciencesconf.org/ Christophe CERIN, Université Paris13 christophe.cerin@lipn.univ-paris13.fr Plateformes de resources numériques Expérience USPC Paris, December 7, 2017
  • 2. Outline 2 • Motivations related to large scale platforms • Background: • Criteria for observing platforms • High Performance Computing (HPC) • Apache suite for Big Data • Some projects • Contribution: USPC platform • Conclusion and perspective
  • 3. Motivations related to large scale platforms In experimental Sciences we need tips for using them appropriately / best practices 3
  • 4. 4 Definition of a large scale system • System with unprecedented amounts of hardware (>1M cores, >100 PB storage, >100Gbits/s interconnect); • Ultra large scale system: hardware + lines of source code + numbers of users + volumes of data; • New problems: • built across multiple organizations; • often with conflicting purposes and needs; • constructed from heterogeneous parts with complex dependencies and emergent properties; • continuously evolving; • software, hardware and human failures will be the norm, not the exception.
  • 5. Since in our lives (Cloud Computing) 5 Software as a Service SaaS Plateform as a Service PaaS
 Infrastructure as a 
 Service
 IaaS
  • 6. • Don’t believe them; • Variety in the eco-system is the norm. 6 Some speeches are ready for you
  • 7. In Cloud, innovation drives technology/science Containers and no more Virtual Machines 7 Hardware Operating System App App App Conventional Stack Hardware OS App App App Hypervisor OS OS Virtualized Stack ▪Abstraction on resources in the past: VM technologies to hide physical properties, interactions between system layer, application layer and the end-users
  • 9. 9 In Sciences (not in business) • Experimental approach: modeling,…, experiments,… • What is an experimental plan for large scale systems and reproducible research? • Answers depend on the system used: • Cluster (dedicated; managed) • Cloud (almost non supervised… or by you)
  • 11. ▪Cluster ➔Performance ➔Flops - Top500 ▪Cluster ➔Power ➔Flops per watt - Green500 ▪Cloud ➔Price ➔for computing, for storing… ▪Cloud: the economic model may be strange (Amazon Spot Instances) ▪Scientific issues: how to compare cluster and cloud? (be fair) 11 Performance criteria for observing large scale system
  • 14. 14 Productivity ▪Productivity traditionally refers to the ratio between the quantity of software produced and the cost spent for it. ▪Factors: Programming language used, Program size, The experience of programmers and design personnel, The novelty of requirements, The complexity of the program and its data, The use of structured programming methods, Program class or the distribution method, Program type of the application area, Tools and environmental conditions, Enhancing existing programs or systems, Maintaining existing programs or systems, Reusing existing modules and standard designs, Program generators, Fourth-generation languages, Geographic separation of development locations, Defect potentials and removal methods, (Existing) Documentation, Prototyping before main development begins, Project teams and organization structures, Morale and compensation of staff
  • 15. ▪Parallel (http://press3.mcs.anl.gov/atpesc/files/2015/03/ESCTP-snir_215.pdf): ▪Shared memory (OpenMP): ▪CPU+vector ▪CPU+GPU ▪CPU+accelerators (Google Tensor processing unit…) ▪Message passing (MPI) ▪Sequential with the call to « parallel data structure » (Scala) ▪Map-reduce: Hadoop, Spark (In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing.) 15 Programming model
  • 16. ▪Even more complicated: Cray has added "ARM Option" (i.e. CPU blade option, using the ThunderX2) to their XC50 supercomputers ▪We need consolidations: ▪in hardware ▪in software ▪in algorithms ▪in applications ▪in experimental methods ▪hpc facilities ▪training 16 Hardware landscape
  • 17. 17 Apache suite for Big Data 1- Bases de Données 1.1 Apache Hbase 1.2 Apache Cassandra 1.3 CouchDB 1.4 MongoDB 2 Accès aux données/ requêtage 2.1 Pig 2.2 Hive 2.3 Livy 3 Data Intelligence 3.1 Apache Drill 3.2 Apache Mahout 3.3 H2O 4 Data Serialisation 4.1 Apache Thrift 4.2 Apache Avro 5 Data integration 5.1 Apache Sqoop 5.2 Apache Flume 5.3 Apache Chuckwa 6 Requetage 6.1 Presto 6.2 Impala 6.3 Dremel 7 Sécurité des données 7.1 Apache Metron 7.2 Sqrrl 8.2 MapReduce 8.3 Spark 9.1 Elasticsearch 10.1 Apache Hadoop 10.6 Apache Mesos 10.10 Apache Flink 10.12 Apache Zookeeper 10.15 Apache Storm 10.20 Apache Kafka
  • 18. Example of a project mixing HPC and Big Data considerations Where theory comes before data: physics chemistry Materials ... Where data comes before theory: Medicine business insights autonomy (smart car, building, city) 18
  • 19. Harp: Hadoop and Collective Communication for Iterative Computation 19 • Project http://salsaproj.indiana.edu/harp/ • Motivation: • Communication patterns are not abstracted and defined in Hadoop, Pregel/Giraph, Spark • In contrary, MPI has very fine grain based (collective) communication primitives (based on arrays and buffers) • Harp provides data abstractions and communication abstractions on top of them. It can be plugged into Hadoop runtime to enable efficient in-memory communication to avoid HDFS read/write.
  • 20. 20 Harp • The data abstraction has 3 categories horizontally and 3 levels vertically
  • 21. • Harp works as a plugin in Hadoop. The goal is to make Hadoop cluster can schedule original MapReduce jobs and Map-Collective jobs at the same time. • Collective communication is defined as movement of partitions within tables. • Collective communication requires synchronization. Hadoop scheduler is modified to schedule tasks in BSP style. • Fault tolerance with checkpointing. 21 Harp
  • 22. • For the data-scientist of nowadays; • Hub to facilite access to Amazon cloud resources, to share data, to collaborate between many people through notebooks; • RosettaHUB: https://www.rosettahub.com • Available through Amazon Educate program (https://s3.amazonaws.com/awseducate- list/AWS_Educate_Institutions.pdf) 22 RosettaHub
  • 23. 23
  • 24. 24
  • 25. 25
  • 26. 26
  • 27. SPC and the deployment of virtual machines IDV experience (Imageries du vivant) 27
  • 29. 29
  • 30. 30
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. 36
  • 37. • Plateforme d'imagerie ePad http://pf-01.lab.parisdescartes.fr:1243/dcm4chee-web3/ • Plateforme PLM dans le cadre du projet DRIVE-SPC http://pf-01.lab.parisdescartes.fr:2345/tc/webclient • Plateforme Sis4web de SisNCom http://pf-01.lab.parisdescartes.fr:1323/sis4web/login.php • Benchsys http://pf-01.lab.parisdescartes.fr:9001/ • Owncloud http://pf-01.lab.parisdescartes.fr:1253/owncloud/index.php/login • … 37 Examples IDV services
  • 38. Conclusion and Future work Many, many opportunities 38
  • 39. • Computer hardware will continue to evolve; Trend: accelerators instead of GPUs; • Platforms will continue to evolve to facilitate the management of data, the computation on data; • Need for convergence/resonance between HPC and Big- data eco-systems; • SPC: needs for training, using, transforming the scientific approaches… with research engineers for accompany the change; • SPC: needs for projects… and to share betweenez communities. 39 Conclusion and perspective
  • 40. Thanks to Leila Abidi, Olivier Waldek, Roland Chervet 40