CloudMC: A cloud computing
map-reduce implementation
     for radiotherapy
 Rubén Jiménez Marrufo
 Héctor Miras del Río
 Carlos Miras del Río
 Carles Gomà Estadella
                                      Big Data Spain
                         http://www.bigdataspain.org
                         Madrid, November 16th, 2012
Contents
Introduction
Radiotherapy
Monte Carlo simulations for radiation transport
Monte Carlo parallelization
Clustering vs. Cloud Computing
Cloud Computing for clinical radiation transport
CloudMC
    DEMO START
    Architecture
    Map Reduce
    Elasticity
    How did Radarc help us?
    Results
    Is it reinventing the wheel?
    Roadmap
    DEMO RESULTS
Questions & Answers
Introduction

Héctor Miras del Río
Department of Medical Physics,
Virgen Macarena Hospital,
Seville, Spain

Rubén Jiménez Marrufo
R&D Division,
Icinetic TIC S.L.,
Seville, Spain

Carlos Miras del Río
R&D Division,
Wedoit Innovacion Tecnologica,
Seville, Spain
Carles Gomà
Centre for Proton Therapy,
Paul Scherrer Institute,
Villigen PSI, Switzerland
Introduction

[Diagram: CloudMC at the intersection of Monte Carlo simulations, cloud computing, and radiotherapy]
Radiotherapy


Radiotherapy: the medical use of ionizing radiation, generally as
part of cancer treatment, to control or kill malignant cells.

Radiotherapy treatment planning: the process of calculating the
radiation dose that will be absorbed by the object to be irradiated,
carried out prior to radiotherapy.
Monte Carlo simulations for
       radiation transport


 Monte Carlo simulations:

+ Gold-standard algorithms for
radiation calculations

- Extremely computationally
intensive and very time-consuming
Monte Carlo parallelization



Parallelization: run one
simulation simultaneously on
several nodes and merge the
results.

Monte Carlo simulations are
highly parallelizable because
the primary events are
independent.
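
The idea can be sketched in a few lines of Python. This is a toy illustration, not CloudMC code: `run_batch`, `merge`, and `parallel_estimate` are hypothetical names, the exponential "deposit" merely stands in for the dose scored by one history, and the batches run one after another here where a cluster would run each on its own node. Distinct integer seeds are assumed to be good enough for a toy; production MC codes use dedicated schemes for independent random streams.

```python
import random


def run_batch(seed, histories):
    """Simulate `histories` independent primary events on one node.

    Returns partial tallies (sum, sum of squares, count) that can be merged later.
    """
    rng = random.Random(seed)  # each node gets its own seed
    total = 0.0
    total_sq = 0.0
    for _ in range(histories):
        deposit = rng.expovariate(1.0)  # toy stand-in for one history's score
        total += deposit
        total_sq += deposit * deposit
    return total, total_sq, histories


def merge(batches):
    """Combine partial tallies into a mean and its standard error."""
    total = sum(b[0] for b in batches)
    total_sq = sum(b[1] for b in batches)
    count = sum(b[2] for b in batches)
    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, (variance / count) ** 0.5


def parallel_estimate(total_histories, nodes, base_seed=12345):
    """Split the histories evenly across `nodes` batches and merge the results."""
    per_node = total_histories // nodes
    batches = [run_batch(base_seed + i, per_node) for i in range(nodes)]
    return merge(batches)


mean, std_error = parallel_estimate(200_000, 8)
print(f"mean = {mean:.4f} +/- {std_error:.4f}")
```

Because the histories are independent, only the small tallies need to be exchanged, which is why the merge step is cheap compared with the simulation itself.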
Parallelization: Clustering vs.
             Cloud Computing
Cloud Computing for clinical radiation calculations

  tCPU = 100 h
  Number of instances: n = 100
  T(n) = 1.44 h
  Instance size: extra-small, 0.0142 € / h
  Cost / plan: 2 €
  1000 patients / year

  100-core cluster ≈ 20 000 €
  160 years of computing time in an extra-small instance
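
The figures on this slide can be checked with a short calculation. The sketch below assumes that billing is strictly proportional to instance-hours (no rounding up to whole hours, no storage or bandwidth charges) and that the "160 years" figure is what the cluster price would buy on a single extra-small instance; both are interpretations of the slide, and the variable names are illustrative.

```python
# Figures taken from the slide.
PRICE_PER_INSTANCE_HOUR = 0.0142  # EUR, extra-small instance
INSTANCES = 100                   # n
WALL_CLOCK_HOURS = 1.44           # T(n) for n = 100
PLANS_PER_YEAR = 1000             # patients per year
CLUSTER_PRICE = 20_000.0          # EUR, 100-core cluster

HOURS_PER_YEAR = 24 * 365

# Every instance is billed for the whole wall-clock time of the simulation.
cost_per_plan = INSTANCES * WALL_CLOCK_HOURS * PRICE_PER_INSTANCE_HOUR
cost_per_year = cost_per_plan * PLANS_PER_YEAR

# Computing time the cluster price would buy on one extra-small instance.
equivalent_years = CLUSTER_PRICE / PRICE_PER_INSTANCE_HOUR / HOURS_PER_YEAR

print(f"cost per plan: {cost_per_plan:.2f} EUR")
print(f"cost per year: {cost_per_year:.0f} EUR")
print(f"cluster price in extra-small instance time: {equivalent_years:.1f} years")
```

Under these assumptions a plan costs about 2 €, and the cluster price corresponds to roughly 160 years of extra-small instance time.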
CloudMC


CloudMC offers an implementation of map/reduce on the Windows Azure
cloud computing platform for parallelizing MC simulations of
radiation therapy dose distributions.

  Non-intrusive

  Multi-application:
    Penelope
    Geant4
    EGSnrc

  Elasticity:
     Resources are not reserved
     A 1-hour simulation costs 1 hour of instance time
CloudMC: DEMO
CloudMC Architecture

[Architecture diagram]

  Cloud Hosted Services: UI, Services, Entities, Repositories,
  Provisioning, MapReduce Factory, Service Management, Worker Roles

  SQL Azure: Users & Simulation

  Cloud Storage: Message Queues, Simulation files
CloudMC: MapReduce


Sequence of actions when carrying out an MC simulation on n instances:

  1. New simulation
     - Simulation metadata is saved on SQL Azure.
     - Simulation files are uploaded to the Azure Storage.

  2. Map
     1. Generation of n independent initial seeds.
     2. Mapper: modification of the config to divide the histories by n.
     3. Provisioning of n-1 worker roles.
     4. Sending of n "start" messages.

  3. Parallel execution
     Every worker role:
     - Reads a message from the queue and downloads the simulation files.
     - Executes the "fragmented" simulation.
     - Sends the results to the storage.
     - Sends an "end of simulation" message.

  4. Reduce
     - When the web role reads the n "end of simulation" messages, it
       downloads the n results.
     - Resolver merges the n results.

  5. End of simulation
     - Finished simulation metadata is saved on SQL Azure.
     - Mail notifies the user of the end of the simulation.
     - The n-1 worker roles are scaled down.
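
The message flow of steps 2 to 4 can be mimicked in a single process. The sketch below is only an analogy under stated assumptions: threads stand in for worker roles, `queue.Queue` stands in for the Azure message queues, the message fields (`fragment`, `seed`, `histories`) are hypothetical, and the "simulation" is a toy sum of random numbers. It does not use any Azure API.

```python
import queue
import random
import threading


def worker(start_queue, done_queue):
    """Stand-in for one worker role: read a "start" message, run, report back."""
    message = start_queue.get()
    rng = random.Random(message["seed"])
    # Toy "fragmented" simulation: one random score per history.
    tally = sum(rng.random() for _ in range(message["histories"]))
    # Equivalent of the "end of simulation" message, carrying the partial result.
    done_queue.put({"fragment": message["fragment"],
                    "tally": tally,
                    "histories": message["histories"]})


def run_simulation(total_histories, n, base_seed=2012):
    start_queue = queue.Queue()
    done_queue = queue.Queue()

    # Map: n seeds, histories divided by n, n "start" messages.
    for i in range(n):
        start_queue.put({"fragment": i,
                         "seed": base_seed + i,
                         "histories": total_histories // n})

    # Parallel execution: one thread per worker role.
    threads = [threading.Thread(target=worker, args=(start_queue, done_queue))
               for _ in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Reduce: collect the n results and merge them in a fixed order.
    results = sorted((done_queue.get() for _ in range(n)),
                     key=lambda r: r["fragment"])
    histories = sum(r["histories"] for r in results)
    mean = sum(r["tally"] for r in results) / histories
    return mean, histories


mean_score, simulated = run_simulation(80_000, 8)
print(f"{simulated} histories, mean score {mean_score:.4f}")
```

The queue decouples the component that creates the fragments from the workers that execute them, so the number of workers can change from one simulation to the next without changing the worker code.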
CloudMC: Map


    Most MC applications for radiation transport simulation read their
    configuration from text files.

    Input A: configuration files (Histories: 1015) + executable
•   Simulation parameters
•   Histories count
•   Geometry & materials files
•   …
•   MapReduce parameters

    Mapper: parametrized mapper that sets the number of histories and
    the seeds in the input files

    Input B: mapped configuration files (Histories: 215) + executable,
    repeated for each instance
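
A parametrized mapper of this kind could look like the sketch below. It assumes a hypothetical `KEY = value` text format with `HISTORIES` and `SEED` keys; real input files for PENELOPE, Geant4, or EGSnrc are laid out differently, which is exactly why the key names are parameters. `map_config` is an illustrative name, not a CloudMC function.

```python
import re


def _set_value(text, key, value):
    """Replace the line `key = ...` with `key = value`, leaving other lines alone."""
    pattern = rf"^{re.escape(key)}[ \t]*=.*$"
    return re.sub(pattern, lambda _m: f"{key} = {value}", text, flags=re.MULTILINE)


def map_config(config_text, n, base_seed=1,
               histories_key="HISTORIES", seed_key="SEED"):
    """Return n copies of the config, each with its share of histories and its own seed."""
    match = re.search(rf"^{re.escape(histories_key)}[ \t]*=[ \t]*(\d+)[ \t]*$",
                      config_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{histories_key}' entry found in the configuration")
    share, remainder = divmod(int(match.group(1)), n)

    fragments = []
    for i in range(n):
        # Spread any remainder over the first fragments so no history is lost.
        histories = share + (1 if i < remainder else 0)
        text = _set_value(config_text, histories_key, histories)
        text = _set_value(text, seed_key, base_seed + i)
        fragments.append(text)
    return fragments


config = "HISTORIES = 1000\nSEED = 0\nGEOMETRY = phantom.geo\n"
fragments = map_config(config, 3)
print(fragments[0])
```

Only the histories and seed lines are rewritten; geometry and material entries pass through untouched, so the MC application itself needs no modification.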
CloudMC: Reduce


  The results of MC applications for radiation transport simulation are
  dose, energy, or other magnitude distribution files formatted in columns.

  [Diagram: each mapped executable produces dose distribution files;
  the reducer combines them into a single output]

Reducer: parametrized reducer that
combines columns depending on the
column type:
- Magnitude column
- Uncertainty column
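
One way such a column-type-driven reducer might work is sketched below. The combination rules are assumptions, not taken from the slides: every fragment is taken to simulate the same number of histories and to report per-history averages, so magnitude columns are averaged and uncertainty columns (treated as absolute standard deviations) are combined in quadrature and divided by the number of fragments. `reduce_columns` is a hypothetical name.

```python
import math


def reduce_columns(files, column_types):
    """Merge equally shaped result tables column by column.

    files:        list of tables, each a list of rows of floats
    column_types: "magnitude" or "uncertainty" for every column
    """
    n = len(files)
    merged = []
    for rows in zip(*files):  # the same row taken from every fragment
        out = []
        for col, kind in enumerate(column_types):
            values = [row[col] for row in rows]
            if kind == "magnitude":
                # Equal-weight average of the per-history estimates.
                out.append(sum(values) / n)
            elif kind == "uncertainty":
                # Standard deviation of that average: quadrature sum over n.
                out.append(math.sqrt(sum(v * v for v in values)) / n)
            else:
                raise ValueError(f"unknown column type: {kind}")
        merged.append(out)
    return merged


fragment_a = [[1.0, 0.3]]  # one voxel: dose, uncertainty
fragment_b = [[3.0, 0.4]]
merged = reduce_columns([fragment_a, fragment_b], ["magnitude", "uncertainty"])
print(merged)
```

If fragments carried different numbers of histories, the average would need to be weighted by the history count of each fragment.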
CloudMC: MapReduce DSL


CloudMC uses a MapReduce DSL whose parameters adapt the Mapper
and Reducer to each specific MC application.

 Mapper parameters                         Reducer parameters
CloudMC: Elasticity


Users choose the number of instances to use for each simulation.

CloudMC scales up the worker role to run a simulation and scales it down
when the simulation finishes.

Windows Azure Service Management allows role scaling:

  + REST API
  + Based on XML config files

  - Minimum of 1 instance
  - Impossible to scale down
    specific instances (Multi-tenant)
CloudMC: How did Radarc help us?

Formula Azure

≃ 50% generated code:
 •   ASP.Net MVC 3 UI
 •   C# App Services
 •   C# POCO Entities
 •   EF CodeFirst
 •   SQL Azure DB

Focus on domain core: map/reduce, provisioning, fault tolerance, etc.

[Architecture diagram: UI, Services, Entities, Repositories, Provisioning,
MapReduce Factory, Service Management, and Worker Roles (Cloud Hosted
Services); user accounts and simulations (SQL Azure); message queues and
simulation files (Cloud Storage)]
CloudMC: Results


Case Study:
    Simulation: ¹²⁵I seed in an ophthalmic
    applicator.
    Number of histories: 3·10⁹
    MC code: PENELOPE, main program
    penEasy.

Results:
   Worker instances size: extra-small
   Clock time in 1 instance: 30 h
   Clock time in 64 instances: 48 min
     (speed up = 37x)
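
The speed-up follows directly from the two clock times, with 48 min written as 0.8 h. The symbols S and E (speed-up and parallel efficiency) are notation introduced here for illustration; the efficiency figure is derived from the slide's numbers and is not stated on it.

```latex
S(64) = \frac{T(1)}{T(64)} = \frac{30\ \mathrm{h}}{0.8\ \mathrm{h}} = 37.5,
\qquad
E(64) = \frac{S(64)}{64} \approx 0.59
```

The slide reports this as 37x, so on 64 extra-small instances a little under 60% of the ideal linear speed-up is reached.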
CloudMC: Results


Time vs number of instances study



  T(n): Clock time for 1 simulation in n instances.

  tcpu: Overall time used only in the simulation of n histories.

  Δt0: Non-parallelizable time for 1 instance.

  a: Non-parallelizable part of time proportional to n.
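
The formula itself did not survive in this transcript. One form consistent with the four definitions above is T(n) = tcpu / n + Δt0 + a·n, and the sketch below explores it. Both the formula and the parameter values are assumptions: the values of `dt0` and `a` were chosen only so that T(100) reproduces the 1.44 h quoted earlier for tcpu = 100 h, and they are not the fitted values from the study.

```python
import math


def clock_time(n, t_cpu, dt0, a):
    """Assumed model: parallel part + fixed overhead + overhead growing with n."""
    return t_cpu / n + dt0 + a * n


def best_instance_count(t_cpu, a):
    """n that minimises the assumed model: d/dn (t_cpu/n + a*n) = 0."""
    return math.sqrt(t_cpu / a)


T_CPU = 100.0  # hours, from the earlier slide
DT0 = 0.2      # hours, hypothetical
A = 0.0024     # hours per instance, hypothetical

print(f"T(100) = {clock_time(100, T_CPU, DT0, A):.2f} h")
print(f"clock time is minimal near n = {best_instance_count(T_CPU, A):.0f}")
```

Under this model, adding instances stops paying off once the overhead term a·n outweighs the shrinking tcpu / n term, which is one way to read a time-versus-instances curve that flattens or turns upward.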
CloudMC: Is it reinventing the wheel?


Why not use Amazon Elastic MapReduce?
(http://aws.amazon.com/es/elasticmapreduce)

 •   Our mapper and reducer were written for .Net

http://stackoverflow.com/questions/1190520/is-it-possible-to-write-map-reduce-jobs-for-amazon-elastic-mapreduce-using-net


Why not use Hadoop on Azure?
(http://www.hadooponazure.com)

 • First preview released in 2012.
 • The cluster size must be reserved.
Roadmap



Testing with more MC applications: Geant4, EGSnrc, etc.

Support packages with specific MapReduce implementations
 • Application to different domains
 • Use of MEF to provide Mappers and Reducers in simulation
    packages

SDK to develop specific MapReduce implementation packages.
 • Visual Studio Templates could facilitate the development of
    CloudMC packages

Enable multi-tenant environments
 • Concurrent simulations require scaling down specific
    instances, which is not possible on Windows Azure.
Questions
Thank you for your attention …


     CloudMC soon available at:

https://cloudmontecarlo.cloudapp.net

      hector.miras@gmail.com
              @hmiras
       rjimenez@icinetic.com
             @rjimenez

CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN JIMENEZ & HECTOR MIRAS at Big Data Spain 2012
