SlideShare a Scribd company logo
Gestion des données scientifiques en imagerie in vivo
https://piv2017.sciencesconf.org/
Christophe CERIN, Université Paris13
christophe.cerin@lipn.univ-paris13.fr
Plateformes de resources numériques
Expérience USPC
Paris, December 7, 2017
Outline
2
• Motivations related to large scale platforms
• Background:
• Criteria for observing platforms
• High Performance Computing (HPC)
• Apache suite for Big Data
• Some projects
• Contribution: USPC platform
• Conclusion and perspective
Motivations related to large
scale platforms
In experimental Sciences
we need tips for using
them appropriately / best
practices
3
4
Definition of a large scale system
• System with unprecedented amounts of hardware (>1M cores,
>100 PB storage, >100Gbits/s interconnect);
• Ultra large scale system: hardware + lines of source code +
numbers of users + volumes of data;
• New problems:
• built across multiple organizations;
• often with conflicting purposes and needs;
• constructed from heterogeneous parts with complex
dependencies and emergent properties;
• continuously evolving;
• software, hardware and human failures will be the norm,
not the exception.
Since in our lives (Cloud Computing)
5
Software as a Service
SaaS
Plateform as a Service
PaaS

Infrastructure as a 

Service

IaaS
• Don’t believe them;
• Variety in the eco-system is the norm.
6
Some speeches are ready for you
In Cloud, innovation drives technology/science
Containers and no more Virtual Machines
7
Hardware
Operating System
App App App
Conventional Stack
Hardware
OS
App App App
Hypervisor
OS OS
Virtualized Stack
▪Abstraction on resources in the past: VM technologies to hide
physical properties, interactions between system layer,
application layer and the end-users
VM versus Container
8
9
In Sciences (not in business)
• Experimental approach: modeling,…,
experiments,…
• What is an experimental plan for large
scale systems and reproducible research?
• Answers depend on the system used:
• Cluster (dedicated; managed)
• Cloud (almost non supervised… or by
you)
Background
10
▪Cluster ➔Performance ➔Flops - Top500
▪Cluster ➔Power ➔Flops per watt - Green500
▪Cloud ➔Price ➔for computing, for storing…
▪Cloud: the economic model may be strange (Amazon
Spot Instances)
▪Scientific issues: how to compare cluster and cloud?
(be fair)
11
Performance criteria for observing
large scale system
12
Performance (Top500)
13
Performance (Green500)
14
Productivity
▪Productivity traditionally refers to the ratio between the quantity
of software produced and the cost spent for it.
▪Factors: Programming language used, Program size, The experience
of programmers and design personnel, The novelty of requirements,
The complexity of the program and its data, The use of structured
programming methods, Program class or the distribution method,
Program type of the application area, Tools and environmental
conditions, Enhancing existing programs or systems, Maintaining
existing programs or systems, Reusing existing modules and standard
designs, Program generators, Fourth-generation languages,
Geographic separation of development locations, Defect potentials
and removal methods, (Existing) Documentation, Prototyping before
main development begins, Project teams and organization
structures, Morale and compensation of staff
▪Parallel (http://press3.mcs.anl.gov/atpesc/files/2015/03/ESCTP-snir_215.pdf):
▪Shared memory (OpenMP):
▪CPU+vector
▪CPU+GPU
▪CPU+accelerators (Google Tensor processing unit…)
▪Message passing (MPI)
▪Sequential with the call to « parallel data structure » (Scala)
▪Map-reduce: Hadoop, Spark (In addition to Map and Reduce operations, it
supports SQL queries, streaming data, machine learning and graph data
processing.)
15
Programming model
▪Even more complicated: Cray has added "ARM Option" (i.e. CPU blade option,
using the ThunderX2) to their XC50 supercomputers
▪We need consolidations:
▪in hardware
▪in software
▪in algorithms
▪in applications
▪in experimental methods
▪hpc facilities
▪training
16
Hardware landscape
17
Apache suite for Big Data
1- Bases de Données
1.1 Apache Hbase
1.2 Apache Cassandra
1.3 CouchDB
1.4 MongoDB
2 Accès aux données/ requêtage
2.1 Pig
2.2 Hive
2.3 Livy
3 Data Intelligence
3.1 Apache Drill
3.2 Apache Mahout
3.3 H2O
4 Data Serialisation
4.1 Apache Thrift
4.2 Apache Avro
5 Data integration
5.1 Apache Sqoop
5.2 Apache Flume
5.3 Apache Chuckwa
6 Requetage
6.1 Presto
6.2 Impala
6.3 Dremel
7 Sécurité des données
7.1 Apache Metron
7.2 Sqrrl
8.2 MapReduce
8.3 Spark
9.1 Elasticsearch
10.1 Apache Hadoop
10.6 Apache Mesos
10.10 Apache Flink
10.12 Apache Zookeeper
10.15 Apache Storm
10.20 Apache Kafka
Example of a project mixing
HPC and Big Data considerations
Where theory comes before data:
physics
chemistry
Materials
...
Where data comes before theory:
Medicine
business insights
autonomy (smart car, building, city)
18
Harp: Hadoop and Collective
Communication for Iterative Computation
19
• Project http://salsaproj.indiana.edu/harp/
• Motivation:
• Communication patterns are not abstracted and
defined in Hadoop, Pregel/Giraph, Spark
• In contrary, MPI has very fine grain based
(collective) communication primitives (based on
arrays and buffers)
• Harp provides data abstractions and communication
abstractions on top of them. It can be plugged into
Hadoop runtime to enable efficient in-memory
communication to avoid HDFS read/write.
20
Harp
• The data
abstraction
has 3
categories
horizontally
and 3 levels
vertically
• Harp works as a plugin in Hadoop. The goal is to
make Hadoop cluster can schedule original
MapReduce jobs and Map-Collective jobs at the
same time.
• Collective communication is defined as
movement of partitions within tables.
• Collective communication requires
synchronization. Hadoop scheduler is modified
to schedule tasks in BSP style.
• Fault tolerance with checkpointing.
21
Harp
• For the data-scientist of nowadays;
• Hub to facilite access to Amazon cloud
resources, to share data, to collaborate
between many people through notebooks;
• RosettaHUB: https://www.rosettahub.com
• Available through Amazon Educate program
(https://s3.amazonaws.com/awseducate-
list/AWS_Educate_Institutions.pdf)
22
RosettaHub
23
24
25
26
SPC and the deployment of
virtual machines
IDV experience
(Imageries du vivant)
27
28
http://cirrus.uspc.fr
29
30
31
Technologie OpenNebula
32
33
34
35
36
• Plateforme d'imagerie ePad
http://pf-01.lab.parisdescartes.fr:1243/dcm4chee-web3/
• Plateforme PLM dans le cadre du projet DRIVE-SPC
http://pf-01.lab.parisdescartes.fr:2345/tc/webclient
• Plateforme Sis4web de SisNCom
http://pf-01.lab.parisdescartes.fr:1323/sis4web/login.php
• Benchsys
http://pf-01.lab.parisdescartes.fr:9001/
• Owncloud
http://pf-01.lab.parisdescartes.fr:1253/owncloud/index.php/login
• …
37
Examples IDV services
Conclusion and Future work
Many, many opportunities
38
• Computer hardware will continue to evolve; Trend:
accelerators instead of GPUs;
• Platforms will continue to evolve to facilitate the
management of data, the computation on data;
• Need for convergence/resonance between HPC and Big-
data eco-systems;
• SPC: needs for training, using, transforming the scientific
approaches… with research engineers for accompany the
change;
• SPC: needs for projects… and to share betweenez
communities.
39
Conclusion and perspective
Thanks to Leila Abidi, Olivier
Waldek, Roland Chervet
40

More Related Content

What's hot

51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
Geoffrey Fox
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
Geoffrey Fox
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
nallagangus
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystem
Geert Van Landeghem
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Saliya Ekanayake
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
Geoffrey Fox
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Geoffrey Fox
 
Hadoop
Hadoop Hadoop
Hadoop
Manuel Vargas
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
inside-BigData.com
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
Harikrishnan K
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
Mishika Bharadwaj
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Rahul Singh
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Geoffrey Fox
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
Geoffrey Fox
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Chanchal Tripathi
 
Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013
Vijay Srinivas Agneeswaran, Ph.D
 
Hadoop
HadoopHadoop
Hadoop
RittikaBaksi
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
Edureka!
 

What's hot (20)

51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Introducing the hadoop ecosystem
Introducing the hadoop ecosystemIntroducing the hadoop ecosystem
Introducing the hadoop ecosystem
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes...
 
Hadoop
Hadoop Hadoop
Hadoop
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013Big dataanalyticsbeyondhadoop public_20_june_2013
Big dataanalyticsbeyondhadoop public_20_june_2013
 
Hadoop
HadoopHadoop
Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 

Similar to C cerin piv2017_c

Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
MLconf
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
Sathish24111
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
Aamir Ameen
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
inside-BigData.com
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Rio Info
 
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesmyHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
Sriram Krishnan
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
Geoffrey Fox
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 
FC Brochure & Insert
FC Brochure & InsertFC Brochure & Insert
FC Brochure & Insert
Michael Atkins
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
Jinseob Kim
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
Raminder Singh
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
inside-BigData.com
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Renato Bonomini
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
IJSRED
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
Rebekah Rodriguez
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Herman Wu
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Philip Filleul
 

Similar to C cerin piv2017_c (20)

Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
Designing Convergent HPC and Big Data Software Stacks: An Overview of the HiB...
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesmyHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
FC Brochure & Insert
FC Brochure & InsertFC Brochure & Insert
FC Brochure & Insert
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
Designing Software Libraries and Middleware for Exascale Systems: Opportuniti...
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
 
Rack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC SupercomputerRack Cluster Deployment for SDSC Supercomputer
Rack Cluster Deployment for SDSC Supercomputer
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 

More from Bertrand Tavitian

14 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-201714 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-2017
Bertrand Tavitian
 
9 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_20179 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_2017
Bertrand Tavitian
 
M allanic piv2017_c
M allanic piv2017_cM allanic piv2017_c
M allanic piv2017_c
Bertrand Tavitian
 
M joliot biomist-piv_2017_c
M joliot biomist-piv_2017_cM joliot biomist-piv_2017_c
M joliot biomist-piv_2017_c
Bertrand Tavitian
 
14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c
Bertrand Tavitian
 
M kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_cM kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_c
Bertrand Tavitian
 
Mc jacquemot piv2017_c
Mc jacquemot piv2017_cMc jacquemot piv2017_c
Mc jacquemot piv2017_c
Bertrand Tavitian
 

More from Bertrand Tavitian (7)

14 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-201714 30-mjoliot biomist-piv-2017
14 30-mjoliot biomist-piv-2017
 
9 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_20179 30 fandre-dist_cnrs_piv_2017
9 30 fandre-dist_cnrs_piv_2017
 
M allanic piv2017_c
M allanic piv2017_cM allanic piv2017_c
M allanic piv2017_c
 
M joliot biomist-piv_2017_c
M joliot biomist-piv_2017_cM joliot biomist-piv_2017_c
M joliot biomist-piv_2017_c
 
14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c14 00-20171207 rance-piv_c
14 00-20171207 rance-piv_c
 
M kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_cM kain fli-iam_piv_2017_c
M kain fli-iam_piv_2017_c
 
Mc jacquemot piv2017_c
Mc jacquemot piv2017_cMc jacquemot piv2017_c
Mc jacquemot piv2017_c
 

Recently uploaded

Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdfMedical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
Jim Jacob Roy
 
Foundation of Yoga, YCB Level-3, Unit-1
Foundation of Yoga, YCB Level-3, Unit-1 Foundation of Yoga, YCB Level-3, Unit-1
Foundation of Yoga, YCB Level-3, Unit-1
Jyoti Bhaghasra
 
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
Université de Montréal
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
Dr. Dhwani kawedia
 
PARASITIC INFECTIONS IN CHILDREN peads.pptx
PARASITIC INFECTIONS IN CHILDREN peads.pptxPARASITIC INFECTIONS IN CHILDREN peads.pptx
PARASITIC INFECTIONS IN CHILDREN peads.pptx
MwambaChikonde1
 
What are the different types of Dental implants.
What are the different types of Dental implants.What are the different types of Dental implants.
What are the different types of Dental implants.
Gokuldas Hospital
 
Travel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International TravelersTravel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International Travelers
NX Healthcare
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
FFragrant
 
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticalsacne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
MuskanShingari
 
pharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdfpharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdf
KerlynIgnacio
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Mobile Problem
 
SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.
KULDEEP VYAS
 
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
shruti jagirdar
 
Tele Optometry (kunj'sppt) / Basics of tele optometry.
Tele Optometry (kunj'sppt) / Basics of tele optometry.Tele Optometry (kunj'sppt) / Basics of tele optometry.
Tele Optometry (kunj'sppt) / Basics of tele optometry.
Kunj Vihari
 
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
MuskanShingari
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
suvadeepdas911
 
Cervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptxCervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptx
LEFLOT Jean-Louis
 
How to Control Your Asthma Tips by gokuldas hospital.
How to Control Your Asthma Tips by gokuldas hospital.How to Control Your Asthma Tips by gokuldas hospital.
How to Control Your Asthma Tips by gokuldas hospital.
Gokuldas Hospital
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
FFragrant
 
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USENARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
Dr. Ahana Haroon
 

Recently uploaded (20)

Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdfMedical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
 
Foundation of Yoga, YCB Level-3, Unit-1
Foundation of Yoga, YCB Level-3, Unit-1 Foundation of Yoga, YCB Level-3, Unit-1
Foundation of Yoga, YCB Level-3, Unit-1
 
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
 
PARASITIC INFECTIONS IN CHILDREN peads.pptx
PARASITIC INFECTIONS IN CHILDREN peads.pptxPARASITIC INFECTIONS IN CHILDREN peads.pptx
PARASITIC INFECTIONS IN CHILDREN peads.pptx
 
What are the different types of Dental implants.
What are the different types of Dental implants.What are the different types of Dental implants.
What are the different types of Dental implants.
 
Travel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International TravelersTravel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International Travelers
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
 
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticalsacne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
acne vulgaris -Mpharm (2nd semester) Cosmetics and cosmeceuticals
 
pharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdfpharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdf
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
 
SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.
 
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
STUDIES IN SUPPORT OF SPECIAL POPULATIONS: GERIATRICS E7
 
Tele Optometry (kunj'sppt) / Basics of tele optometry.
Tele Optometry (kunj'sppt) / Basics of tele optometry.Tele Optometry (kunj'sppt) / Basics of tele optometry.
Tele Optometry (kunj'sppt) / Basics of tele optometry.
 
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
 
Cervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptxCervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptx
 
How to Control Your Asthma Tips by gokuldas hospital.
How to Control Your Asthma Tips by gokuldas hospital.How to Control Your Asthma Tips by gokuldas hospital.
How to Control Your Asthma Tips by gokuldas hospital.
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
 
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USENARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
 

C cerin piv2017_c

  • 1. Gestion des données scientifiques en imagerie in vivo https://piv2017.sciencesconf.org/ Christophe CERIN, Université Paris13 christophe.cerin@lipn.univ-paris13.fr Plateformes de resources numériques Expérience USPC Paris, December 7, 2017
  • 2. Outline 2 • Motivations related to large scale platforms • Background: • Criteria for observing platforms • High Performance Computing (HPC) • Apache suite for Big Data • Some projects • Contribution: USPC platform • Conclusion and perspective
  • 3. Motivations related to large scale platforms In experimental Sciences we need tips for using them appropriately / best practices 3
  • 4. 4 Definition of a large scale system • System with unprecedented amounts of hardware (>1M cores, >100 PB storage, >100Gbits/s interconnect); • Ultra large scale system: hardware + lines of source code + numbers of users + volumes of data; • New problems: • built across multiple organizations; • often with conflicting purposes and needs; • constructed from heterogeneous parts with complex dependencies and emergent properties; • continuously evolving; • software, hardware and human failures will be the norm, not the exception.
  • 5. Since in our lives (Cloud Computing) 5 Software as a Service SaaS Plateform as a Service PaaS
 Infrastructure as a 
 Service
 IaaS
  • 6. • Don’t believe them; • Variety in the eco-system is the norm. 6 Some speeches are ready for you
  • 7. In Cloud, innovation drives technology/science Containers and no more Virtual Machines 7 Hardware Operating System App App App Conventional Stack Hardware OS App App App Hypervisor OS OS Virtualized Stack ▪Abstraction on resources in the past: VM technologies to hide physical properties, interactions between system layer, application layer and the end-users
  • 9. 9 In Sciences (not in business) • Experimental approach: modeling,…, experiments,… • What is an experimental plan for large scale systems and reproducible research? • Answers depend on the system used: • Cluster (dedicated; managed) • Cloud (almost non supervised… or by you)
  • 11. ▪Cluster ➔Performance ➔Flops - Top500 ▪Cluster ➔Power ➔Flops per watt - Green500 ▪Cloud ➔Price ➔for computing, for storing… ▪Cloud: the economic model may be strange (Amazon Spot Instances) ▪Scientific issues: how to compare cluster and cloud? (be fair) 11 Performance criteria for observing large scale system
  • 14. 14 Productivity ▪Productivity traditionally refers to the ratio between the quantity of software produced and the cost spent for it. ▪Factors: Programming language used, Program size, The experience of programmers and design personnel, The novelty of requirements, The complexity of the program and its data, The use of structured programming methods, Program class or the distribution method, Program type of the application area, Tools and environmental conditions, Enhancing existing programs or systems, Maintaining existing programs or systems, Reusing existing modules and standard designs, Program generators, Fourth-generation languages, Geographic separation of development locations, Defect potentials and removal methods, (Existing) Documentation, Prototyping before main development begins, Project teams and organization structures, Morale and compensation of staff
  • 15. ▪Parallel (http://press3.mcs.anl.gov/atpesc/files/2015/03/ESCTP-snir_215.pdf): ▪Shared memory (OpenMP): ▪CPU+vector ▪CPU+GPU ▪CPU+accelerators (Google Tensor processing unit…) ▪Message passing (MPI) ▪Sequential with the call to « parallel data structure » (Scala) ▪Map-reduce: Hadoop, Spark (In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing.) 15 Programming model
  • 16. ▪Even more complicated: Cray has added "ARM Option" (i.e. CPU blade option, using the ThunderX2) to their XC50 supercomputers ▪We need consolidations: ▪in hardware ▪in software ▪in algorithms ▪in applications ▪in experimental methods ▪hpc facilities ▪training 16 Hardware landscape
  • 17. 17 Apache suite for Big Data 1- Bases de Données 1.1 Apache Hbase 1.2 Apache Cassandra 1.3 CouchDB 1.4 MongoDB 2 Accès aux données/ requêtage 2.1 Pig 2.2 Hive 2.3 Livy 3 Data Intelligence 3.1 Apache Drill 3.2 Apache Mahout 3.3 H2O 4 Data Serialisation 4.1 Apache Thrift 4.2 Apache Avro 5 Data integration 5.1 Apache Sqoop 5.2 Apache Flume 5.3 Apache Chuckwa 6 Requetage 6.1 Presto 6.2 Impala 6.3 Dremel 7 Sécurité des données 7.1 Apache Metron 7.2 Sqrrl 8.2 MapReduce 8.3 Spark 9.1 Elasticsearch 10.1 Apache Hadoop 10.6 Apache Mesos 10.10 Apache Flink 10.12 Apache Zookeeper 10.15 Apache Storm 10.20 Apache Kafka
  • 18. Example of a project mixing HPC and Big Data considerations Where theory comes before data: physics chemistry Materials ... Where data comes before theory: Medicine business insights autonomy (smart car, building, city) 18
  • 19. Harp: Hadoop and Collective Communication for Iterative Computation 19 • Project http://salsaproj.indiana.edu/harp/ • Motivation: • Communication patterns are not abstracted and defined in Hadoop, Pregel/Giraph, Spark • In contrary, MPI has very fine grain based (collective) communication primitives (based on arrays and buffers) • Harp provides data abstractions and communication abstractions on top of them. It can be plugged into Hadoop runtime to enable efficient in-memory communication to avoid HDFS read/write.
  • 20. 20 Harp • The data abstraction has 3 categories horizontally and 3 levels vertically
  • 21. • Harp works as a plugin in Hadoop. The goal is to make Hadoop cluster can schedule original MapReduce jobs and Map-Collective jobs at the same time. • Collective communication is defined as movement of partitions within tables. • Collective communication requires synchronization. Hadoop scheduler is modified to schedule tasks in BSP style. • Fault tolerance with checkpointing. 21 Harp
  • 22. • For the data-scientist of nowadays; • Hub to facilite access to Amazon cloud resources, to share data, to collaborate between many people through notebooks; • RosettaHUB: https://www.rosettahub.com • Available through Amazon Educate program (https://s3.amazonaws.com/awseducate- list/AWS_Educate_Institutions.pdf) 22 RosettaHub
  • 23. 23
  • 24. 24
  • 25. 25
  • 26. 26
  • 27. SPC and the deployment of virtual machines IDV experience (Imageries du vivant) 27
  • 29. 29
  • 30. 30
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. 36
  • 37. • Plateforme d'imagerie ePad http://pf-01.lab.parisdescartes.fr:1243/dcm4chee-web3/ • Plateforme PLM dans le cadre du projet DRIVE-SPC http://pf-01.lab.parisdescartes.fr:2345/tc/webclient • Plateforme Sis4web de SisNCom http://pf-01.lab.parisdescartes.fr:1323/sis4web/login.php • Benchsys http://pf-01.lab.parisdescartes.fr:9001/ • Owncloud http://pf-01.lab.parisdescartes.fr:1253/owncloud/index.php/login • … 37 Examples IDV services
  • 38. Conclusion and Future work Many, many opportunities 38
  • 39. • Computer hardware will continue to evolve; Trend: accelerators instead of GPUs; • Platforms will continue to evolve to facilitate the management of data, the computation on data; • Need for convergence/resonance between HPC and Big- data eco-systems; • SPC: needs for training, using, transforming the scientific approaches… with research engineers for accompany the change; • SPC: needs for projects… and to share betweenez communities. 39 Conclusion and perspective
  • 40. Thanks to Leila Abidi, Olivier Waldek, Roland Chervet 40