CloudMC: A cloud computing
map-reduce implementation
     for radiotherapy
 Rubén Jiménez Marrufo
 Héctor Miras del Río
 Carlos Miras del Río
 Carles Gomà Estadella
                                      Big Data Spain
                         http://www.bigdataspain.org
                         Madrid, November 16th, 2012
Contents
Introduction
Radiotherapy
Monte Carlo simulations for radiation transport
Monte Carlo parallelization
Clustering vs. Cloud Computing
Cloud Computing for clinical radiation transport
CloudMC
    DEMO START
    Architecture
    Map Reduce
    Elasticity
    How did Radarc help us?
    Results
    Is it reinventing the wheel?
    Roadmap
    DEMO RESULTS
Questions & Answers
Introduction

Héctor Miras del Río
Department of Medical Physics,
Virgen Macarena Hospital,
Seville, Spain

Rubén Jiménez Marrufo
R&D Division,
Icinetic TIC S.L.,
Seville, Spain

Carlos Miras del Río
R&D Division,
Wedoit Innovacion Tecnologica,
Seville, Spain

Carles Gomà
Centre for Proton Therapy,
Paul Scherrer Institute,
Villigen PSI, Switzerland
Introduction

[Figure: Monte Carlo Simulations, Cloud Computing and Radiotherapy]
Radiotherapy


Radiotherapy is the medical use of ionizing radiation, generally as
part of cancer treatment, to control or kill malignant cells.

Radiotherapy treatment planning is the process of calculating,
prior to radiotherapy, the radiation dose to be absorbed by the
object to be irradiated.
Monte Carlo simulations for radiation transport


Monte Carlo simulations:

+ Gold-standard algorithms for radiation calculations.
- Extremely computationally intensive and very time-consuming.
Monte Carlo parallelization



Parallelization: execute one simulation simultaneously
on several nodes and merge the results.

Monte Carlo simulations are
highly parallelizable since
the primary events are
independent.
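Because primary events are independent, a simulation can be split into fragments that each run a share of the histories with their own random seed, and the tallies can simply be summed afterwards. The sketch below illustrates this with a toy problem (photon transmission through a homogeneous slab, with hypothetical attenuation coefficient `mu` and `thickness`), not a radiotherapy dose calculation; the consecutive-seed scheme is an illustrative assumption, whereas production MC codes use carefully designed independent random streams.

```python
import math
import random


def simulate_fragment(histories: int, seed: int, mu: float, thickness: float) -> int:
    """Simulate `histories` photons hitting a slab; return how many pass through.

    Each fragment owns its own generator, seeded separately, so fragments can
    run on different nodes in any order.
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(histories):
        # Distance to the first interaction is exponential: s = -ln(U) / mu, U in (0, 1].
        s = -math.log(1.0 - rng.random()) / mu
        if s > thickness:
            transmitted += 1
    return transmitted


def split_histories(total: int, n: int) -> list[int]:
    """Divide `total` histories as evenly as possible among `n` nodes."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def parallel_run(total: int, n: int, mu: float, thickness: float, base_seed: int = 12345) -> float:
    """Split the histories into n fragments, run them, and merge the tallies."""
    counts = split_histories(total, n)
    seeds = [base_seed + i for i in range(n)]  # one distinct seed per fragment
    # In a real deployment each call below would run on a different node.
    tallies = [simulate_fragment(h, s, mu, thickness) for h, s in zip(counts, seeds)]
    # Primary events are independent, so merging is just summing the tallies.
    return sum(tallies) / total


if __name__ == "__main__":
    p = parallel_run(total=200_000, n=8, mu=1.0, thickness=1.0)
    print(f"transmission ~ {p:.4f} (analytic {math.exp(-1.0):.4f})")
```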
Parallelization: Clustering vs. Cloud Computing
Cloud Computing for clinical radiation calculations

tCPU = 100 h
Number of instances: n = 100 (extra-small, 0.0142 € / h)
T(n) = 1.44 h
Cost / plan: 2 €
1000 patients / year

100-core cluster ≈ 20 000 € ≈ 160 years of computing time in an extra-small instance
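The figures on this slide can be checked with a few lines of arithmetic. This sketch assumes that each of the n instances is billed for the whole clock time T(n) in fractional hours (which is what the slide's ≈2 € per plan implies) and a 365-day year; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year, no leap days


def cost_per_plan(n_instances: int, clock_time_h: float, price_per_instance_h: float) -> float:
    """Cost of one plan when n instances each run for the whole clock time T(n)."""
    return n_instances * clock_time_h * price_per_instance_h


def cluster_equivalent_years(cluster_cost: float, price_per_instance_h: float) -> float:
    """Years of single-instance computing time that the price of a cluster would buy."""
    return cluster_cost / price_per_instance_h / HOURS_PER_YEAR


plan = cost_per_plan(n_instances=100, clock_time_h=1.44, price_per_instance_h=0.0142)
years = cluster_equivalent_years(cluster_cost=20_000, price_per_instance_h=0.0142)
annual_cost = plan * 1000  # 1000 patients / year

print(f"cost per plan ~ {plan:.2f} EUR")                      # 2.04
print(f"cluster ~ {years:.1f} years of instance time")        # 160.8
print(f"1000 plans per year ~ {annual_cost:.0f} EUR")         # 2045
```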
CloudMC


CloudMC offers an implementation of map/reduce on the Windows Azure
cloud computing platform for parallelizing MC simulations of
radiotherapy dose distributions.

  Non-intrusive

  Multi-application:
    PENELOPE
    Geant4
    EGSnrc

  Elasticity:
     Resources are not reserved
     1 hour simulation costs 1 hour
CloudMC: DEMO
CloudMC Architecture

[Figure: CloudMC architecture: UI, Services, Entities, Repositories, Provisioning, MapReduce Factory and Worker Roles on Cloud Hosted Services; Service Management; SQL Azure (users & simulations); Cloud Storage (message queues, simulation files).]
CloudMC: MapReduce


Sequence of actions when carrying out an MC simulation on n instances:

1. New simulation
   - Simulation metadata is saved on SQL Azure.
   - Simulation files are uploaded to the storage.
   - Provisioning of n-1 worker roles.
2. Map
   - Mapper: modification of the config files to divide the histories by n.
   - Generation of n independent seeds.
   - The "fragmented" simulation is uploaded to the storage.
   - Sending of n "start" messages.
3. Parallel execution
   Every worker role:
   - Reads a simulation message and downloads the simulation files.
   - Executes the simulation.
   - Uploads the results to Azure Storage.
   - Sends an "end of simulation" message.
4. Reduce
   - When the web role reads the n "end of simulation" messages, the n results are downloaded and the simulation files are merged.
5. End of simulation
   - Finished simulation metadata is saved on SQL Azure.
   - Mail notifying the user of the end of the simulation.
   - Worker roles are scaled down.
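The message flow of steps 2 to 4 can be imitated in a single process. In this minimal sketch, assuming in-memory `queue.Queue` objects in place of Azure message queues and threads in place of worker roles, the web role posts one "start" message per fragment, each worker posts an "end of simulation" message with its result, and the reduce step runs only after all n end messages have arrived. All names (`run_mapreduce`, `execute`, `reducer`, the message fields) are hypothetical, and fault tolerance is left out.

```python
import queue
import threading
from typing import Callable


def run_mapreduce(n_workers: int,
                  fragments: list[dict],
                  execute: Callable[[dict], float],
                  reducer: Callable[[list[float]], float]) -> float:
    start_q = queue.Queue()   # stands in for the "start" message queue
    end_q = queue.Queue()     # stands in for the "end of simulation" queue

    def worker_role() -> None:
        while True:
            msg = start_q.get()
            if msg is None:       # stop signal: the role is being scaled down
                return
            # In CloudMC this would download files, run the MC code and upload results.
            result = execute(msg)
            end_q.put((msg["fragment"], result))

    workers = [threading.Thread(target=worker_role) for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Map: one "start" message per fragment.
    for frag in fragments:
        start_q.put(frag)

    # Reduce only once every "end of simulation" message has been read.
    results = dict(end_q.get() for _ in fragments)

    # End of simulation: stop (scale down) the worker roles.
    for _ in workers:
        start_q.put(None)
    for w in workers:
        w.join()
    return reducer([results[i] for i in sorted(results)])


if __name__ == "__main__":
    frags = [{"fragment": i, "histories": 1000, "seed": 100 + i} for i in range(4)]
    print(run_mapreduce(4, frags, lambda m: m["histories"] * 0.5, sum))
```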
CloudMC: Map


    Most MC applications for radiation transport simulation read their
    configuration from text files.

Input A: configuration files
•   Simulation parameters
•   Histories count
•   Geometry & materials files
•   …
•   MapReduce parameters

Mapper: parametrized mapper that sets the number of histories and the seeds in the input files.

[Figure: Input A and the executable are mapped into several executables, each with its own mapped input (Input B) and histories count.]
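A parametrized mapper of this kind can be sketched as a text rewrite. Real codes (PENELOPE/penEasy, Geant4, EGSnrc) each have their own input formats, so this sketch assumes a hypothetical `KEY = value` format; `MapperParams`, the key names and the seed numbering scheme are all illustrative, standing in for the application-specific mapper parameters.

```python
import re
from dataclasses import dataclass


@dataclass
class MapperParams:
    """Hypothetical mapper parameters: which config entries hold histories and seeds."""
    histories_key: str
    seed_keys: tuple[str, ...]


def _entry(key: str) -> re.Pattern:
    # Matches a "KEY = value" line, capturing the prefix and the value separately.
    return re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(\S+)", re.MULTILINE)


def map_config(config_text: str, params: MapperParams, n: int, base_seed: int) -> list[str]:
    """Return n copies of config_text, each with its share of histories and its own seeds."""
    hist_pat = _entry(params.histories_key)
    match = hist_pat.search(config_text)
    if match is None:
        raise KeyError(params.histories_key)
    raw = match.group(2)
    total = int(raw) if raw.isdigit() else int(float(raw))  # also accepts "3e9"
    base, extra = divmod(total, n)
    fragments = []
    for i in range(n):
        share = base + (1 if i < extra else 0)
        text = hist_pat.sub(lambda m: m.group(1) + str(share), config_text, count=1)
        for j, key in enumerate(params.seed_keys):
            # Distinct seed for every (fragment, seed entry) pair.
            seed = base_seed + i * len(params.seed_keys) + j
            text = _entry(key).sub(lambda m: m.group(1) + str(seed), text, count=1)
        fragments.append(text)
    return fragments
```

For example, `map_config("NHISTORIES = 3e9\nSEED1 = 1\nSEED2 = 2\n", MapperParams("NHISTORIES", ("SEED1", "SEED2")), 64, 1000)` would produce 64 inputs of 46 875 000 histories each.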
CloudMC: Reduce


  The results of MC applications for radiation transport simulation are
  dose, energy or other magnitude distribution files formatted in columns.



[Figure: the dose distribution files produced by the mapped executables are combined into a single output.]

Reducer: parametrized reducer to
combine columns depending on the
column type:
- Magnitude column
- Uncertainty column
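The slide names the column types but not the combination rules. This minimal sketch assumes that each fragment reports, per bin, a mean value and its standard uncertainty, estimated from N_i histories; magnitudes are then combined as a history-weighted mean and uncertainties in quadrature with the same weights, σ = √(Σ (N_i σ_i)²) / Σ N_i. The extra "copy" column type (for bin coordinates) and all names are hypothetical.

```python
import math


def reduce_columns(fragments: list[tuple[int, list[list[float]]]],
                   column_types: list[str]) -> list[list[float]]:
    """Merge per-fragment tables given as (histories, rows) pairs."""
    total = sum(h for h, _ in fragments)
    n_rows = len(fragments[0][1])
    merged = []
    for r in range(n_rows):
        row = []
        for c, kind in enumerate(column_types):
            values = [(h, rows[r][c]) for h, rows in fragments]
            if kind == "copy":            # e.g. bin coordinates, identical in every fragment
                row.append(values[0][1])
            elif kind == "magnitude":     # history-weighted mean
                row.append(sum(h * v for h, v in values) / total)
            elif kind == "uncertainty":   # independent estimates combined in quadrature
                row.append(math.sqrt(sum((h * v) ** 2 for h, v in values)) / total)
            else:
                raise ValueError(f"unknown column type: {kind}")
        merged.append(row)
    return merged
```

Weighting by histories lets fragments of unequal size (for example when the histories do not divide evenly by n) be merged without bias.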
CloudMC: MapReduce DSL


CloudMC uses a MapReduce DSL to read the parameters that adapt the Mapper
and Reducer to specific MC applications.

[Figure: examples of Mapper parameters and Reducer parameters]
CloudMC: Elasticity


Users choose the number of instances to use for each simulation.

CloudMC scales up the worker role to run a simulation and scales it down
when the simulation finishes.

Windows Azure Service Management allows role scaling:

  + REST API
  + Based on XML config files

  - Minimum of 1 instance
  - Impossible to scale down
    specific instances (multi-tenant)
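In classic Windows Azure cloud services, the number of instances of each role is set by the `<Instances count="…"/>` element of the service configuration (.cscfg) XML file, and a new configuration can be submitted through the Service Management REST API. The sketch below covers only the XML side: it assumes a minimal hypothetical configuration with a role named `WorkerRole` and does not call the REST API. It also shows why specific instances cannot be chosen for removal: the file only holds a count.

```python
import xml.etree.ElementTree as ET

# Namespace of the classic Windows Azure service configuration (.cscfg) schema.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"
ET.register_namespace("", NS)

# Hypothetical, minimal configuration with a single worker role.
SAMPLE_CSCFG = f"""<ServiceConfiguration serviceName="CloudMC" xmlns="{NS}">
  <Role name="WorkerRole">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>"""


def set_instance_count(cscfg_xml: str, role_name: str, count: int) -> str:
    """Return a copy of the configuration with the role's instance count changed."""
    if count < 1:
        # A role must keep at least one instance.
        raise ValueError("a role needs at least 1 instance")
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            # Only the count can be set; which instances go away is not selectable.
            role.find(f"{{{NS}}}Instances").set("count", str(count))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(role_name)


scaled_up = set_instance_count(SAMPLE_CSCFG, "WorkerRole", 64)   # before the simulation
scaled_down = set_instance_count(scaled_up, "WorkerRole", 1)     # after it finishes
```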
CloudMC: How did Radarc help us?

Formula Azure
≃ 50% generated code:
 •   ASP.Net MVC 3 UI
 •   C# App Services
 •   C# POCO Entities
 •   EF CodeFirst
 •   SQL Azure DB

Focus on domain core: map/reduce, provisioning, fault tolerance, etc.

[Figure: CloudMC architecture diagram, including user accounts stored in SQL Azure.]
CloudMC: Results


Case study:
    Simulation: ¹²⁵I seed in an ophthalmic
    applicator.
    Number of histories: 3·10⁹
    MC code: PENELOPE, main program
    penEasy.

Results:
   Worker instance size: extra-small
   Clock time on 1 instance: 30 h
   Clock time on 64 instances: 48 min
     (speed-up = 37x)
CloudMC: Results


Study of clock time vs. number of instances

T(n): clock time for 1 simulation on n instances.

tCPU: overall time spent only on simulating the histories.

Δt0: non-parallelizable time for 1 instance.

a: coefficient of the non-parallelizable time that grows in proportion to n.
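The plot and formula of this slide did not survive extraction. The definitions above suggest a model of the form T(n) = tCPU/n + Δt0 + a·n; this sketch assumes that form (it is not confirmed by the slide text). Under it, the clock time is minimized at n* = √(tCPU/a), beyond which adding instances costs more overhead than it saves. The parameter values in the tests are made up for illustration; only the speed-up line uses the case-study figures (30 h on 1 instance, 48 min on 64 instances).

```python
import math


def clock_time(n: int, t_cpu: float, dt0: float, a: float) -> float:
    """Assumed model: parallel part t_cpu/n plus fixed and n-proportional overheads."""
    return t_cpu / n + dt0 + a * n


def optimal_instances(t_cpu: float, a: float) -> float:
    """Minimizer of clock_time: d/dn (t_cpu/n + a*n) = 0 gives n* = sqrt(t_cpu / a)."""
    return math.sqrt(t_cpu / a)


def speed_up(t1: float, tn: float) -> float:
    """Ratio of the clock time on 1 instance to the clock time on n instances."""
    return t1 / tn


# Case study: 30 h (1800 min) on 1 instance vs. 48 min on 64 instances.
s = speed_up(30 * 60, 48)
print(f"speed-up = {s:.1f}x, efficiency = {s / 64:.0%}")  # speed-up = 37.5x, efficiency = 59%
```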
CloudMC: Is it reinventing the wheel?


Why not use Amazon Elastic MapReduce?
(http://aws.amazon.com/es/elasticmapreduce)

 •   Our mapper and reducer were written for .NET

http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net


Why not use Hadoop On Azure?
(http://www.hadooponazure.com)

 • First preview released in 2012.
 • The cluster size must be reserved.
Roadmap



Testing with more MC applications: Geant4, EGSnrc, etc.

Support packages with specific MapReduce implementations
 • Application to different domains
 • Use of MEF to provide Mappers and Reducers in simulation
    packages

SDK to develop specific MapReduce implementation packages.
 • Visual Studio Templates could facilitate the development of
    CloudMC packages

Enable multi-tenant environments
 • Concurrent simulations require scaling down specific
    instances, which is not possible on Windows Azure.
Questions
Thank you for your attention …


     CloudMC soon available at:

https://cloudmontecarlo.cloudapp.net

      hector.miras@gmail.com
              @hmiras
       rjimenez@icinetic.com
             @rjimenez

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN JIMENEZ & HECTOR MIRAS at Big Data Spain 2012

  • 1. CloudMC: A cloud computing map-reduce implementation for radiotherapy Rubén Jiménez Marrufo Héctor Miras del Río Carlos Miras del Río Carles Gomà Estadella Big Data Spain http://www.bigdataspain.org Madrid, November 16th, 2012
  • 2. Contents Introduction Radiotherapy Monte Carlo simulations for radiation transport Monte Carlo parallelization Clustering vs. Cloud Computing Cloud Computing for clinical radiation transport CloudMC DEMO START Architecture Map Reduce Elasticity How did Radarc help us? Results Is it reinventing the wheel? Roadmap DEMO RESULTS Questions & Answers
  • 3. Introduction Héctor Miras del Río Department of Medical Physics, Virgen Macarena Hospital, Seville, Spain Rubén Jiménez Marrufo R&D Division, Icinetic TIC S.L., Seville, Spain Carlos Miras del Río R&D Division, Wedoit Innovacion Tecnologica, Seville, Spain Carles Gomà Centre for Proton Therapy, Paul Scherrer Institute, Villigen PSI, Switzerland
  • 4. Introduction Monte Carlo Simulations Cloud Computing Radiotherapy
  • 5. Radiotherapy Radiotherapy is the medical use of ionizing radiation, generally as part of cancer treatment, to control or kill malignant cells. Radiotherapy treatment planning is the process of calculating, prior to radiotherapy, the radiation dose to be absorbed by the object to be irradiated.
  • 6. Monte Carlo simulations for radiation transport
  • 7. Monte Carlo simulations for radiation transport
  • 8. Monte Carlo simulation for radiation transport Monte Carlo simulations: + Gold standard algorithms for radiation calculations. - Extremely computationally intensive and very time-consuming.
  • 9. Monte Carlo parallelization Parallelization: execute one simulation simultaneously on several nodes and merge the results. Monte Carlo simulations are highly parallelizable because the primary events (histories) are independent.
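Because histories are independent, a simulation can be split into chunks that each run with their own random seed, and the partial tallies simply add up. The following is a minimal sketch of that idea using a toy problem (fraction of photons interacting inside a slab, which has the analytic answer 1 - exp(-mu·L)); the attenuation coefficient, thickness, seeds and function names are hypothetical and not part of CloudMC. Threads stand in for separate nodes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

MU = 0.2         # hypothetical attenuation coefficient (1/cm)
THICKNESS = 5.0  # hypothetical slab thickness (cm)


def run_chunk(n_histories: int, seed: int) -> int:
    """Simulate n independent histories; count photons that interact inside the slab."""
    rng = random.Random(seed)  # each chunk owns its generator and seed
    hits = 0
    for _ in range(n_histories):
        depth = rng.expovariate(MU)  # sampled free path length (mean 1/MU)
        if depth < THICKNESS:
            hits += 1
    return hits


def parallel_mc(total_histories: int, n_workers: int, base_seed: int = 12345) -> float:
    # Split the histories as evenly as possible; give each chunk a distinct seed
    base, extra = divmod(total_histories, n_workers)
    chunks = [base + (1 if i < extra else 0) for i in range(n_workers)]
    seeds = [base_seed + i for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = list(pool.map(run_chunk, chunks, seeds))
    # Merge: independent histories, so the partial counts simply add up
    return sum(hits) / total_histories


if __name__ == "__main__":
    estimate = parallel_mc(200_000, n_workers=4)
    exact = 1.0 - math.exp(-MU * THICKNESS)
    print(f"MC estimate: {estimate:.4f}  analytic: {exact:.4f}")
```

Distinct integer seeds are used here for simplicity; production MC codes usually need seeds that guarantee non-overlapping random streams.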
  • 11. Cloud Computing for clinical radiation calculations A 100-core cluster costs ≈ 20 000 €, the equivalent of 160 years of computing time on an extra-small instance (0.0142 €/h). A plan with tCPU = 100 h run on n = 100 instances takes T(n) = 1.44 h, so the cost per plan is ≈ 2 €. Reference workload: 1000 patients/year.
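The figures on this slide can be checked with simple arithmetic. The sketch below assumes the hourly price is applied to the exact wall-clock time of every instance (which reproduces the ≈ 2 € per plan) and a 8760-hour year; the annual total is a multiplication added here for illustration, not a number from the slide.

```python
# Figures from the slide
PRICE_PER_HOUR_EUR = 0.0142   # extra-small instance
N_INSTANCES = 100
WALL_CLOCK_HOURS = 1.44       # T(n) for tCPU = 100 h on n = 100 instances
CLUSTER_COST_EUR = 20_000     # 100-core cluster
PATIENTS_PER_YEAR = 1000

HOURS_PER_YEAR = 24 * 365     # assumption: non-leap year


def cost_per_plan(n: int, wall_hours: float, price: float) -> float:
    # Assumption: each of the n instances is billed for the whole wall-clock time
    return n * wall_hours * price


def cluster_equivalent_years(cluster_cost: float, price: float) -> float:
    # How many years one instance could run for the price of the cluster
    return cluster_cost / price / HOURS_PER_YEAR


plan_cost = cost_per_plan(N_INSTANCES, WALL_CLOCK_HOURS, PRICE_PER_HOUR_EUR)
years = cluster_equivalent_years(CLUSTER_COST_EUR, PRICE_PER_HOUR_EUR)
annual = plan_cost * PATIENTS_PER_YEAR

print(f"cost per plan: {plan_cost:.2f} EUR")
print(f"cluster price in instance time: {years:.1f} years")
print(f"{PATIENTS_PER_YEAR} plans per year: {annual:.0f} EUR")
```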
  • 12. CloudMC CloudMC offers an implementation of map/reduce on the Windows Azure cloud computing platform for parallelizing MC simulations of radiotherapy dose distributions. Non-intrusive. Multi-application: PENELOPE, Geant4, EGSnrc. Elasticity: resources are not reserved; a 1-hour simulation costs 1 hour.
  • 14. CloudMC Architecture [Architecture diagram. Components: UI, Services, Provisioning, MapReduce, Entities, Factory, Repositories, Worker Roles, Service Management, Cloud Hosted Services. Storage: SQL Azure (users and simulations) and Cloud Storage (message queues, simulation files).]
  • 15. CloudMC: MapReduce Sequence of actions when carrying out an MC simulation on n instances:
      1. New simulation: simulation metadata is saved on SQL Azure; simulation files are uploaded to Azure Storage.
      2. Map: generation of n independent simulation seeds; modification of the simulation config files by the Mapper to divide the histories by n; provisioning of n worker roles; sending of n "start" messages.
      3. Parallel execution: every worker role reads a message from the queue, downloads the "fragmented" simulation, executes it, sends the results to the storage and sends an "end of simulation" message.
      4. Reduce: when the web role has read the n "end of simulation" messages, the n results are downloaded and merged.
      5. End of simulation: finished simulation metadata is saved on SQL Azure, a mail notifies the user of the end of the simulation, and the n-1 extra worker roles are scaled down.
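The message-driven sequence above can be imitated in-process to see how the pieces fit. This is a minimal sketch, not CloudMC code: Python `queue.Queue` objects stand in for the Azure message queues, a dict stands in for Cloud Storage, threads stand in for worker roles, and `toy_executable` is a hypothetical placeholder for the real MC program.

```python
import queue
import random
import threading


def map_simulation(total_histories, n, base_seed=2012):
    """Map step: n fragments, each with its own seed and share of the histories."""
    base, extra = divmod(total_histories, n)
    return [
        {"fragment": i, "histories": base + (1 if i < extra else 0), "seed": base_seed + i}
        for i in range(n)
    ]


def toy_executable(histories, seed):
    """Hypothetical stand-in for the MC code: one uniform score per history."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(histories))


def worker_role(start_q, end_q, storage, lock):
    while True:
        msg = start_q.get()                 # read a "start" message from the queue
        if msg is None:                     # sentinel: this worker is being released
            break
        score = toy_executable(msg["histories"], msg["seed"])
        with lock:                          # "upload" the partial result to storage
            storage[msg["fragment"]] = (msg["histories"], score)
        end_q.put({"fragment": msg["fragment"], "status": "end of simulation"})


def run_simulation(total_histories, n):
    start_q, end_q = queue.Queue(), queue.Queue()
    storage, lock = {}, threading.Lock()
    workers = [
        threading.Thread(target=worker_role, args=(start_q, end_q, storage, lock))
        for _ in range(n)
    ]
    for w in workers:                       # provisioning of n workers
        w.start()
    for msg in map_simulation(total_histories, n):
        start_q.put(msg)                    # n "start" messages
    for _ in range(n):                      # web role waits for n "end of simulation" messages
        end_q.get()
    # Reduce: merge the n partial results (histories are independent)
    histories = sum(h for h, _ in storage.values())
    mean_score = sum(s for _, s in storage.values()) / histories
    for _ in workers:                       # end of simulation: release the workers
        start_q.put(None)
    for w in workers:
        w.join()
    return histories, mean_score


if __name__ == "__main__":
    print(run_simulation(100_000, n=4))
```

Decoupling the web role and the workers through queues means no worker needs to know about the others, which is what allows the number of instances to change per simulation.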
  • 16. CloudMC: Map Most MC applications for radiation transport simulation read their configuration from text files. Input A: configuration files (simulation parameters, histories count, geometry & materials files, …), Histories: 1015. Input B: MapReduce parameters. Mapper: a parametrized mapper sets the number of histories and the seeds in the input files of each executable (Mapped histories: 215).
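Since the configuration is plain text, a parametrized mapper can work by rewriting a few key/value lines per executable. The sketch below assumes a hypothetical `KEY = value` input format with keys `NHIST`, `SEED1` and `SEED2`; real MC codes each have their own input layout, so the key names and the regular expression would have to come from the per-application parameters.

```python
import re

# Hypothetical text input file; key names and layout are assumptions.
TEMPLATE = """\
# simulation parameters
NHIST = 1000000
SEED1 = 1
SEED2 = 1
GEOMETRY = applicator.geo
"""


def set_key(text, key, value):
    """Replace the value on a 'KEY = value' line, leaving the rest of the file intact."""
    pattern = rf"^(\s*{re.escape(key)}\s*=\s*)\S+"
    new_text, count = re.subn(
        pattern, lambda m: m.group(1) + str(value), text, flags=re.MULTILINE
    )
    if count == 0:
        raise KeyError(key)
    return new_text


def map_config(template, total_histories, seeds,
               histories_key="NHIST", seed_keys=("SEED1", "SEED2")):
    """Produce one input file per executable; len(seeds) sets the number of executables."""
    n = len(seeds)
    base, extra = divmod(total_histories, n)
    configs = []
    for i, seed_tuple in enumerate(seeds):
        # Share of the histories for this executable (the first `extra` get one more)
        text = set_key(template, histories_key, base + (1 if i < extra else 0))
        for key, seed in zip(seed_keys, seed_tuple):
            text = set_key(text, key, seed)
        configs.append(text)
    return configs


if __name__ == "__main__":
    for cfg in map_config(TEMPLATE, 10, [(1, 2), (3, 4), (5, 6)]):
        print(cfg)
```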
  • 17. CloudMC: Reduce The results of MC applications for radiation transport simulation are dose, energy or other magnitude distribution files formatted in columns. Reducer: a parametrized reducer combines the dose distribution files produced by the mapped executables column by column, depending on the column type (magnitude column or uncertainty column), into a single output.
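One common way to combine independent partial results, assumed here rather than taken from the slides, is to weight each file by its share of the histories, w_i = N_i / N: magnitude columns become the weighted mean Σ w_i x_i, and absolute (1σ) uncertainty columns are added in quadrature, sqrt(Σ (w_i σ_i)²). The column-type names (`index`, `magnitude`, `uncertainty`) are hypothetical.

```python
import math


def parse_table(text):
    """Read a whitespace-separated column file, skipping blank and '#' comment lines."""
    return [
        [float(x) for x in line.split()]
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def reduce_columns(fragments, histories, column_types):
    """Combine per-executable tables row by row according to each column's type.

    fragments: list of tables (list of rows of floats), one per executable
    histories: number of histories simulated by each executable
    column_types: 'index', 'magnitude' or 'uncertainty' for each column
    """
    total = sum(histories)
    weights = [h / total for h in histories]
    combined = []
    for r in range(len(fragments[0])):
        row = []
        for c, ctype in enumerate(column_types):
            values = [frag[r][c] for frag in fragments]
            if ctype == "index":          # bin coordinates: identical in every file
                row.append(values[0])
            elif ctype == "magnitude":    # history-weighted mean
                row.append(sum(w * v for w, v in zip(weights, values)))
            elif ctype == "uncertainty":  # independent errors add in quadrature
                row.append(math.sqrt(sum((w * v) ** 2 for w, v in zip(weights, values))))
            else:
                raise ValueError(f"unknown column type: {ctype}")
        combined.append(row)
    return combined


if __name__ == "__main__":
    a = parse_table("# bin dose unc\n1 2.0 0.2\n2 1.0 0.1\n")
    b = parse_table("# bin dose unc\n1 4.0 0.2\n2 1.2 0.1\n")
    print(reduce_columns([a, b], [500, 500], ["index", "magnitude", "uncertainty"]))
```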
  • 18. CloudMC: MapReduce DSL CloudMC uses a MapReduce DSL to read the parameters that adapt the Mapper and the Reducer to a specific MC application. [Figure: Mapper parameters; Reducer parameters.]
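The slides do not show the DSL syntax, so the following is only a hypothetical illustration of the idea: a small XML document (element and attribute names invented here) holding the mapper parameters (input file, histories key, seed keys) and the reducer parameters (column types), parsed into plain data objects that a generic mapper and reducer could consume.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical DSL document; the real CloudMC syntax is not shown on the slide.
DSL_EXAMPLE = """
<mapreduce>
  <mapper input="simulation.in" histories-key="NHIST" seed-keys="SEED1 SEED2"/>
  <reducer output="dose.dat">
    <column index="0" type="index"/>
    <column index="2" type="uncertainty"/>
    <column index="1" type="magnitude"/>
  </reducer>
</mapreduce>
"""


@dataclass
class MapperParams:
    input_file: str
    histories_key: str
    seed_keys: list


@dataclass
class ReducerParams:
    output_file: str
    column_types: list


def parse_dsl(text):
    """Turn the hypothetical XML parameters into mapper and reducer settings."""
    root = ET.fromstring(text.strip())
    m = root.find("mapper")
    r = root.find("reducer")
    mapper = MapperParams(m.get("input"), m.get("histories-key"), m.get("seed-keys").split())
    # Order column types by their declared index, whatever order they appear in
    cols = sorted(r.findall("column"), key=lambda c: int(c.get("index")))
    reducer = ReducerParams(r.get("output"), [c.get("type") for c in cols])
    return mapper, reducer


if __name__ == "__main__":
    print(parse_dsl(DSL_EXAMPLE))
```

Keeping these parameters outside the code is what makes the mapper and reducer reusable across MC applications without recompiling.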
  • 19. CloudMC: Elasticity Users choose the number of instances to use for each simulation. CloudMC scales up the worker roles to run the simulation and scales them down when it finishes. Windows Azure Service Management allows role scaling: REST API; based on XML config files; minimum of 1 instance; impossible to scale down specific instances (multi-tenant).
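Because scaling is driven by XML configuration, changing the number of worker instances amounts to editing the `count` attribute of a role's `Instances` element in the service configuration (.cscfg) file. This is a minimal sketch that only edits the XML; submitting the result through the Service Management REST API (for example its Change Deployment Configuration operation) is not shown. The service name `CloudMC` and role name `SimulationWorker` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"
ET.register_namespace("", NS)  # keep the default namespace when serializing

# Hypothetical service configuration with a single worker role
EXAMPLE = f"""<ServiceConfiguration serviceName="CloudMC" xmlns="{NS}">
  <Role name="SimulationWorker">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>"""


def set_instance_count(cscfg_xml: str, role_name: str, count: int) -> str:
    """Return a copy of the configuration with the role's instance count changed."""
    if count < 1:
        raise ValueError("a role needs at least 1 instance")
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            role.find(f"{{{NS}}}Instances").set("count", str(count))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(role_name)


if __name__ == "__main__":
    scaled_up = set_instance_count(EXAMPLE, "SimulationWorker", 64)  # before the simulation
    scaled_down = set_instance_count(scaled_up, "SimulationWorker", 1)  # after it ends
    print(scaled_down)
```

Note that the configuration only states how many instances a role should have; it cannot name which instances to remove, which matches the slide's remark about scaling down specific instances.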
  • 20. CloudMC: How did Radarc help us? With the Radarc Azure formula, ≃ 50% of the code was generated: ASP.Net MVC 3 UI, C# app services, C# POCO entities, EF CodeFirst, SQL Azure DB (users and user accounts). This let us focus on the domain core: map/reduce, provisioning, fault tolerance, etc. [Architecture diagram: Service Management, UI, Services, Provisioning, MapReduce, Entities, Worker Roles, Factory, Repositories, Cloud Hosted Services; SQL Azure and Cloud Storage (message queues, simulation files).]
  • 21. CloudMC: Results Case study: Simulation: ¹²⁵I seed in an ophthalmic applicator. Number of histories: 3·10⁹. MC code: PENELOPE, main program PenEasy. Results: worker instance size: extra-small. Clock time on 1 instance: 30 h. Clock time on 64 instances: 48 min (speed-up = 37x).
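The reported speed-up follows directly from the two clock times: 30 h = 1800 min divided by 48 min. The parallel efficiency (speed-up divided by the number of instances) is a standard derived metric computed here for illustration; it is not stated on the slide.

```python
def speedup(t1_minutes: float, tn_minutes: float) -> float:
    """Ratio of single-instance clock time to n-instance clock time."""
    return t1_minutes / tn_minutes


def efficiency(t1_minutes: float, tn_minutes: float, n: int) -> float:
    """Fraction of ideal linear speed-up actually achieved on n instances."""
    return speedup(t1_minutes, tn_minutes) / n


S = speedup(30 * 60, 48)         # 30 h on 1 instance vs 48 min on 64 instances
E = efficiency(30 * 60, 48, 64)

print(f"speed-up = {S:.1f}x, efficiency = {E:.0%}")
```

An efficiency well below 100% is consistent with the non-parallelizable overheads described on the next slide.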
  • 22. CloudMC: Results Time vs. number of instances study. T(n): clock time for one simulation on n instances. tcpu: overall time spent only on simulating the histories. Δt0: non-parallelizable time for 1 instance. a: non-parallelizable part of the time that is proportional to n.
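The slide's formula did not survive extraction. A model consistent with the definitions above, and our reading rather than the authors' stated equation, is T(n) = tcpu/n + Δt0 + a·n: the simulation time is divided among the instances, while Δt0 is a fixed overhead and a·n grows with the number of instances. Setting dT/dn = -tcpu/n² + a = 0 gives an optimum n* = sqrt(tcpu/a). In the sketch, tcpu = 100 h is taken from the cost slide; the split of the remaining 0.44 h into Δt0 = 0.2 h and a = 0.0024 h per instance is hypothetical, chosen only so that T(100) = 1.44 h as on that slide.

```python
import math


def wall_clock_time(n: float, t_cpu: float, dt0: float, a: float) -> float:
    """Assumed model: T(n) = t_cpu / n + dt0 + a * n."""
    return t_cpu / n + dt0 + a * n


def optimal_instances(t_cpu: float, a: float) -> float:
    # dT/dn = -t_cpu / n**2 + a = 0  ->  n* = sqrt(t_cpu / a)
    return math.sqrt(t_cpu / a)


T_CPU = 100.0  # h, from the cost slide
DT0 = 0.2      # h, hypothetical
A = 0.0024     # h per instance, hypothetical

if __name__ == "__main__":
    for n in (1, 10, 100, 400):
        print(n, round(wall_clock_time(n, T_CPU, DT0, A), 3))
    print("n* =", round(optimal_instances(T_CPU, A)))
```

Under this model, adding instances beyond n* makes the simulation slower, because the overhead proportional to n outweighs the shrinking per-instance share of tcpu.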
  • 23. CloudMC: Is it reinventing the wheel? Why not use Amazon Elastic MapReduce? (http://aws.amazon.com/es/elasticmapreduce) • Our mapper and reducer were written for .Net http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net Why not use Hadoop On Azure? (http://www.hadooponazure.com) • First preview released in 2012. • The cluster size must be reserved.
  • 24. Roadmap Testing with more MC applications: Geant4, EGSnrc, etc. Support for packages with specific MapReduce implementations • Application to different domains • Use of MEF to provide Mappers and Reducers in simulation packages. SDK to develop specific MapReduce implementation packages • Visual Studio templates could facilitate the development of CloudMC packages. Enable multi-tenant environments • Concurrent simulations require scaling down specific instances, which is not possible on Windows Azure.
  • 26. Thank you for your attention … CloudMC soon available at: https://cloudmontecarlo.cloudapp.net hector.miras@gmail.com @hmiras rjimenez@icinetic.com @rjimenez