CLOUD COMPUTING: AN ALTERNATIVE PLATFORM FOR SCIENTIFIC COMPUTING

   Presented by: DAVID RAMIREZ
 COMP5003 GRADUATE SEMINAR AND PROJECT RESEARCH

              Instructor: Dr. A. Lodgher

       PRAIRIE VIEW A&M UNIVERSITY OF TEXAS
                     May, 2009
Big scientific challenges…

 Bioengineering, bioinformatics
 Climate models
 Astrophysics model – exploding star
 Aerospace
 (Images: Oak Ridge National Laboratory, Business Week, Argonne Labs, University of Texas)
The traditional approach…
Supercomputers, clusters

 “Ranger” cluster at UT Austin, TX
 Ca. 4000 nodes (Linux based)
 580 Tflops
 31 TB local memory

 IBM Blue Gene
 Argonne National Laboratory (Illinois)
 US Department of Energy
 1 PFLOP
 Other installations in progress
 (Germany) will reach 4 PFLOP by 2011
 64K Nodes and more

 “Hydra” at Texas A&M
 52 nodes, 832 IBM processors (AIX based)
 6.3 Tflops
 1.6 TB memory, 20 TB storage
 (Image: Texas A&M University)
Cray XT5 JAGUAR

 Cray – Oak Ridge National Laboratory
 1.4 PFLOP
 181,000 processing cores
 (AMD Opteron, 2 or 2 core)
 Linux-based
 16 to 32 GB memory per node
 (Image: Oak Ridge National Laboratory)

Necessity for high performance visualization:
STALLION visualization center at TACC (Texas Advanced Computing Center),
University of Texas, Austin
High Demands of Computing Power in Science … some examples

 720x720x1620 point grid
 1620 processors
 20 days
 20 terabytes of data output
 Large-Eddy Simulation of Rayleigh-Taylor instability
 (Lawrence Livermore National Lab. – VisIt Gallery)

 11 million cells
 512 processors of the FROST supercomputer at Lawrence Livermore National Lab.
 36 hours
 2 terabytes of data output
 (Lawrence Livermore National Lab. – VisIt Gallery)
Scientific computing: Some History…
• Scientific computing has always been a driving force for hardware development.
• The “mainframe” was the first platform.
• FORTRAN programming language became the (still dominant) standard.




                                    IBM




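C     EXAMPLE OF FORTRAN CODE
C     READS 1000 NUMBERS FROM STANDARD INPUT AND PRINTS THEIR SUM
      PROGRAM SUMNUM
      REAL SUM, NUM
      INTEGER CNTR
      SUM = 0.0
      DO 10 CNTR = 1, 1000
        READ(*,*) NUM
        SUM = SUM + NUM
   10 CONTINUE
      PRINT *, 'SUM = ', SUM
      END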
The next steps in hardware evolution…

 Minicomputer: DEC PDP-8
 Desktop minicomputer: Hewlett-Packard
 The IBM PC
 (Images: www.Xconomy.com, Hewlett-Packard, IBM)
The network connected computers… and
the Internet was born (1980s–90s)
The next logical step:
Aggregate the power of networked computers
towards the solution of highly demanding computing
tasks. Parallelize solution of problems.




                                                     David Ramirez
                           THE GRID CONCEPT WAS BORN !
The concepts behind grid computing….

Before… SERIAL COMPUTING

 Single computer – single CPU.
 A problem is broken into a discrete series of instructions.
 Instructions are executed one after another.
 Only one instruction may execute at any moment in time.

 SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
 LAWRENCE LIVERMORE NATIONAL LABORATORY
Now … PARALLEL COMPUTING

 Software designed to be run using multiple CPUs.
 A problem is broken into discrete parts that can be solved concurrently.
 Each part is further broken down to a series of instructions.
 Instructions from each part execute simultaneously on different CPUs.

 SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
 LAWRENCE LIVERMORE NATIONAL LABORATORY
PARALLEL COMPUTING: DEFINITIONS

 Simultaneous use of multiple computing resources to solve
 a computational problem.

 Run using multiple CPUs

 A problem is broken into discrete parts that can be solved
 concurrently

 Each part is further broken down to a series of instructions

 Instructions from each part execute simultaneously on
 different CPUs
PARALLEL COMPUTER CLASSIFICATION: FLYNN’S TAXONOMY (1966)

 SISD   Single Instruction, Single Data       Ol’ serial computer
 SIMD   Single Instruction, Multiple Data     Graphics processors
 MISD   Multiple Instruction, Single Data     Rare – Space Shuttle flight computer
 MIMD   Multiple Instruction, Multiple Data   Most modern parallel computers

 SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
 LAWRENCE LIVERMORE NATIONAL LABORATORY
COMPUTATIONAL PROBLEMS IN PARALLEL COMPUTING

Embarrassingly parallel calculations: perfect for loose grids (delays not so important)
• Each sub-calculation is independent of all the other
  calculations. Subtasks rarely or never communicate
  with each other. Best for high-throughput computing.

Fine-grained calculations: more suitable for supercomputers
• Each sub-calculation is dependent on the result of
  another sub-calculation. Subtasks communicate many
  times per second. Best for high-performance computing.

Coarse-grained calculations: more suitable for supercomputers
• Subtasks communicate with each other less frequently
  (just several times per second).
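
As an illustration of the embarrassingly parallel case, here is a minimal Python sketch, not taken from the slides: a Monte Carlo estimate of pi split into subtasks that never communicate. The function names, seeds, and sample counts are hypothetical, and the thread pool only stands in for the nodes of a grid (a real run would place each subtask on a separate processor or machine).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_samples):
    """One independent sub-calculation: count random points inside the quarter circle."""
    rng = random.Random(seed)  # private generator, so subtasks share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks, n_samples, max_workers=4):
    """Run the subtasks concurrently; results are combined only at the very end."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        hits = list(pool.map(count_hits, range(n_tasks), [n_samples] * n_tasks))
    return 4.0 * sum(hits) / (n_tasks * n_samples)


if __name__ == "__main__":
    print(estimate_pi(n_tasks=8, n_samples=20000))
```

Because each subtask depends only on its own seed, the order and timing of execution do not change the result, which is why network delays matter so little for this class of problem.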
SIMPLE EXAMPLE – Heat modeling

 The entire array is partitioned and distributed as subarrays to all tasks.
 Each task owns a portion of the total array.

 Master process sends initial info to workers, checks for convergence
 and collects results.

 Worker process calculates solution, communicating as necessary
 with neighbor processes.

 Source: Lawrence Livermore National Laboratory
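
The partitioning scheme can be sketched in plain Python. This is a simplified, assumed version (one dimension, fixed end temperatures, workers simulated one after another in a single process), not the code behind the slide; all function names are hypothetical. In a real implementation the neighbor exchange would be a message between processes, for example with MPI.

```python
def step_serial(u, alpha):
    """One explicit finite-difference step of the 1D heat equation (end values fixed)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split(u, n_workers):
    """Master: partition the array into contiguous subarrays, one per worker."""
    size, extra = divmod(len(u), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        end = start + size + (1 if w < extra else 0)
        chunks.append(u[start:end])
        start = end
    return chunks


def step_partitioned(chunks, alpha):
    """Workers: each updates its own subarray using edge values from its neighbors."""
    new_chunks = []
    for w, chunk in enumerate(chunks):
        # "Communication": fetch the adjacent cell from each neighboring subarray.
        left = [chunks[w - 1][-1]] if w > 0 else []
        right = [chunks[w + 1][0]] if w < len(chunks) - 1 else []
        updated = step_serial(left + chunk + right, alpha)
        # Drop the borrowed neighbor cells; keep only the cells this worker owns.
        new_chunks.append(updated[len(left):len(updated) - len(right)])
    return new_chunks


def solve(u, alpha, n_workers, tol=1e-6, max_steps=10000):
    """Master: distribute the array, iterate, check convergence, collect results."""
    chunks = split(u, n_workers)
    for _ in range(max_steps):
        new_chunks = step_partitioned(chunks, alpha)
        change = max(abs(a - b)
                     for old, new in zip(chunks, new_chunks)
                     for a, b in zip(old, new))
        chunks = new_chunks
        if change < tol:
            break
    return [x for chunk in chunks for x in chunk]
```

Each worker needs only one value from each neighbor per step, so this is a coarse- to fine-grained problem depending on how large the subarrays are relative to the communication cost.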
DIFFERENT MODELS FOR PARALLEL COMPUTING:
PROCESSING AND DATA DISTRIBUTION
HIGH-THROUGHPUT PROBLEMS

Problems divided into many independent tasks.

    Computing grids are used to schedule these tasks, dealing them out
    to the different processors in the grid.

        As soon as a processor finishes one task, the next task arrives.

Example: Large Hadron Collider Computer Grid (CERN / Geneva)
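
The dealing-out of tasks can be illustrated with a small Python simulation. It is a sketch under stated assumptions (task durations known in advance, no transfer delays, hypothetical function name), not a description of how any real grid scheduler works: each task goes to whichever processor becomes free first.

```python
import heapq


def schedule(task_durations, n_processors):
    """Deal tasks out in order; each goes to the processor that frees up first."""
    # Heap of (time_when_free, processor_id), so the earliest-free one pops first.
    free_at = [(0.0, p) for p in range(n_processors)]
    heapq.heapify(free_at)
    assignment = []
    for task_id, duration in enumerate(task_durations):
        start, proc = heapq.heappop(free_at)
        assignment.append((task_id, proc, start))
        heapq.heappush(free_at, (start + duration, proc))
    makespan = max(t for t, _ in free_at)  # time at which the last processor finishes
    return assignment, makespan


if __name__ == "__main__":
    plan, total = schedule([3, 1, 2, 2], n_processors=2)
    for task_id, proc, start in plan:
        print(f"task {task_id} -> processor {proc} at t={start}")
    print("all tasks done at t =", total)
```

With four tasks of length 3, 1, 2 and 2 on two processors, no processor sits idle while tasks remain, and the whole batch finishes at t = 5 instead of t = 8 on a single processor.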
APPROACHES FOR PARALLEL COMPUTING IMPLEMENTATION

CLUSTER
 Processors are close together
 High-speed network, low latency
 When big: “Supercomputer”
 Ideal for fine-grained, high-performance computation

GRID
 Dispersed, even over wide distances
 The most distributed form of parallel computing
 Internet as main transport
 Loose connectivity
 High latency
 Ideal for embarrassingly parallel, high-throughput computation
 Mostly commodity hardware in nodes

Or a mix
GRIDS ALL OVER THE WORLD …

 CERN LHC Computing Grid: 200K processors, 11 clusters worldwide
 Source: http://www.accessgrid.org

The CERN Large Hadron Collider Computing Grid is currently the most important and most powerful scientific grid.
Pioneering Grid Vendors

 Scientific applications
 External or internal grids
 Pioneer client software

 General market: commerce, industry, science
 Computing on demand model
 Sun Grid Engine (middleware)
 Lustre distributed filesystem
Now…The Cloud meets the Grid…




Grid resources become abstractions (“black boxes”)
New players on the scene…

Many more joining in…
CASE STUDY: AMAZON WEB SERVICES

 EC2                 Elastic Compute Cloud
 S3                  Simple Storage Service
 SimpleDB            (unstructured database)
 SQS                 Simple Queue Service
 Elastic MapReduce   Enables parallelization
PROOF-OF-CONCEPT PROPOSAL:
Use AWS cloud infrastructure as a
platform for high-performance / high-throughput
parallel computing oriented toward
scientific applications.
AWS HOW-TO FOR DOING THIS (“self-service”)

1     • Have an embarrassingly parallel problem at hand.
      • Code the problem solution using parallel techniques such as MPI
        (Message Passing Interface). Use Fortran, C, C++, Python.

2     • Create a running environment snapshot (complete with OS and software)
        and store it in S3.
      • MAP & REDUCE using the AWS HADOOP* middleware
        implementation. Balance loads.

3     • Feed separate tasks to n EC2 nodes. Start nodes on demand using
        the S3-stored images. Deploy & coordinate with HADOOP.
      • Collect partial results, assemble into final product (master node).

* HADOOP is an open-source product of the Apache Software Foundation written in Java™
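
The map and reduce steps above can be sketched in plain Python without Hadoop or AWS. This is only an illustration of the programming model, assuming a hypothetical input of (station, temperature) records; none of the function names belong to the Hadoop API, and everything here runs in one process, whereas Hadoop would spread the map and reduce calls over the EC2 nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    station, temperature = record
    yield station, temperature


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a partial result (here, the mean)."""
    return key, sum(values) / len(values)


def map_reduce(records):
    """Master: run map on every record, shuffle, reduce, and assemble the final product."""
    pairs = [pair for record in records for pair in map_phase(record)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    readings = [("A", 10.0), ("B", 20.0), ("A", 30.0)]
    print(map_reduce(readings))
```

Each map call touches a single record and each reduce call touches a single key, so both phases can be distributed over as many nodes as there are records or keys.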
Source: http://hadoop.apache.org/core/
MAP & REDUCE ARCHITECTURE




   Price list for AWS (as of Spring, 2009)
   http://aws.amazon.com/elasticmapreduce/#pricing

   Service        Cost (using maximum capacity)
   EC2            $0.80/hr
   MapReduce      $0.12/hr
   S3             $0.15/GB per month
                  $0.10/GB data transfer
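
To see what these rates mean for a job, here is a rough Python estimate. It assumes the EC2 and MapReduce rates are both charged per node per hour and simply add up, which the slide does not state; the job size in the usage example is hypothetical, and the function name is illustrative only.

```python
# Rates from the Spring 2009 price list above (US dollars).
EC2_PER_HOUR = 0.80
MAPREDUCE_PER_HOUR = 0.12
S3_PER_GB_MONTH = 0.15
TRANSFER_PER_GB = 0.10


def estimate_cost(nodes, hours, storage_gb, months, transfer_gb):
    """Rough job cost: compute time plus S3 storage plus data transfer."""
    # Assumption: every node pays the EC2 rate and the MapReduce rate for each hour.
    compute = nodes * hours * (EC2_PER_HOUR + MAPREDUCE_PER_HOUR)
    storage = storage_gb * months * S3_PER_GB_MONTH
    transfer = transfer_gb * TRANSFER_PER_GB
    return round(compute + storage + transfer, 2)


if __name__ == "__main__":
    # Hypothetical job: 10 nodes for 5 hours, 100 GB stored for a month, 20 GB moved.
    print(estimate_cost(nodes=10, hours=5, storage_gb=100, months=1, transfer_gb=20))
```

Under these assumptions the hypothetical job costs 46 dollars of compute, 15 of storage and 2 of transfer, 63 dollars in total.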
OTHERS DOING SIMILAR WORK…
SOME CURRENT ACADEMIC PROJECTS & WORKING IMPLEMENTATIONS
OF CLOUD COMPUTING-BASED SCIENTIFIC GRIDS
(Nimbus Framework)

•University of Chicago (NIMBUS)
•University of Florida (STRATUS)
•Purdue University (WISPY)
•Masaryk University (KUPA) (Czech Republic)
Source: http://workspace.globus.org/clouds
CONCLUSION
By integrating networking, computation and information,
the Grid provides a practical, virtual platform for computing
suitable for scientific research.

AWS cloud services make it easy and affordable to
implement a sufficiently powerful, scalable, and practical
grid computing platform.

   •Can be self-serviced.
   •On-demand model.
   •Very economical.
   •Suitable for fast turnkey solutions without the expense of costly
   infrastructure or computer time.
   •Ideal in an academic environment, to foster hands-on research
   with complex models.
FUTURE WORK

GLOBALLY : Grid computing, now more widely enabled by cloud-
computing (Infrastructure-as-a-Service) platforms and the
sponsorship of governments, industries and the scientific
community, is a fundamental component for the future of
computing.

LOCALLY: The goal of this research paper is to provide a basis for a
near-future, practical, proof-of-concept implementation of a
Cloud-based Grid that puts Prairie View A&M University on the list
of universities with access to and use of such infrastructure, for
the benefit of its students, academic staff, and the community in
general.
QUESTIONS ?

More Related Content

What's hot

Top500 List June 2012
Top500 List June 2012Top500 List June 2012
Top500 List June 2012top500
 
Scheduled Scientific Data Releases Using .backup Volumes
Scheduled Scientific Data Releases Using .backup VolumesScheduled Scientific Data Releases Using .backup Volumes
Scheduled Scientific Data Releases Using .backup Volumesretsamedoc
 
ACM HPDC 2010参加報告
ACM HPDC 2010参加報告ACM HPDC 2010参加報告
ACM HPDC 2010参加報告Ryousei Takano
 
Captura de pacotes no KernelSpace
Captura de pacotes no KernelSpaceCaptura de pacotes no KernelSpace
Captura de pacotes no KernelSpacePeslPinguim
 
Intel 2020 Labs Day Keynote Slides
Intel 2020 Labs Day Keynote SlidesIntel 2020 Labs Day Keynote Slides
Intel 2020 Labs Day Keynote SlidesDESMOND YUEN
 

What's hot (6)

Mateo valero p2
Mateo valero p2Mateo valero p2
Mateo valero p2
 
Top500 List June 2012
Top500 List June 2012Top500 List June 2012
Top500 List June 2012
 
Scheduled Scientific Data Releases Using .backup Volumes
Scheduled Scientific Data Releases Using .backup VolumesScheduled Scientific Data Releases Using .backup Volumes
Scheduled Scientific Data Releases Using .backup Volumes
 
ACM HPDC 2010参加報告
ACM HPDC 2010参加報告ACM HPDC 2010参加報告
ACM HPDC 2010参加報告
 
Captura de pacotes no KernelSpace
Captura de pacotes no KernelSpaceCaptura de pacotes no KernelSpace
Captura de pacotes no KernelSpace
 
Intel 2020 Labs Day Keynote Slides
Intel 2020 Labs Day Keynote SlidesIntel 2020 Labs Day Keynote Slides
Intel 2020 Labs Day Keynote Slides
 

Viewers also liked

Virtualization Mobile Platform Android Case
Virtualization Mobile Platform Android CaseVirtualization Mobile Platform Android Case
Virtualization Mobile Platform Android CaseDavid Ramirez
 
Sometimes you feel like a docker... (SF)
Sometimes you feel like a docker... (SF)Sometimes you feel like a docker... (SF)
Sometimes you feel like a docker... (SF)bridgetkromhout
 
Cloud Computing in the Enterprise
Cloud Computing in the EnterpriseCloud Computing in the Enterprise
Cloud Computing in the EnterpriseDavid Ramirez
 
Ethics And Computing
Ethics And ComputingEthics And Computing
Ethics And ComputingDavid Ramirez
 

Viewers also liked (6)

Amazon.com
Amazon.comAmazon.com
Amazon.com
 
Virtualization Mobile Platform Android Case
Virtualization Mobile Platform Android CaseVirtualization Mobile Platform Android Case
Virtualization Mobile Platform Android Case
 
Sometimes you feel like a docker... (SF)
Sometimes you feel like a docker... (SF)Sometimes you feel like a docker... (SF)
Sometimes you feel like a docker... (SF)
 
Cloud Computing in the Enterprise
Cloud Computing in the EnterpriseCloud Computing in the Enterprise
Cloud Computing in the Enterprise
 
Ethics And Computing
Ethics And ComputingEthics And Computing
Ethics And Computing
 
Parallel Computing
Parallel Computing Parallel Computing
Parallel Computing
 

Similar to Cloud Computing: An Alternative Platform for Scientific Computing

Top Five Super Computers (1).pptx
Top Five Super Computers (1).pptxTop Five Super Computers (1).pptx
Top Five Super Computers (1).pptxwwwsyedashah631
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programmingkhstandrews
 
Rama krishna ppts for blue gene/L
Rama krishna ppts for blue gene/LRama krishna ppts for blue gene/L
Rama krishna ppts for blue gene/Lmsramakrishna
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)Guy Coates
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyPerry Lea
 
TeraGrid and Physics Research
TeraGrid and Physics ResearchTeraGrid and Physics Research
TeraGrid and Physics Researchshandra_psc
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure TechExeter
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicEric Verhulst
 
Unraveling mysteries of the Universe at CERN, with OpenStack and Hadoop
Unraveling mysteries of the Universe at CERN, with OpenStack and HadoopUnraveling mysteries of the Universe at CERN, with OpenStack and Hadoop
Unraveling mysteries of the Universe at CERN, with OpenStack and HadoopPiotr Turek
 
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...Larry Smarr
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010TELECOM I+D
 
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...Larry Smarr
 
Sierra Supercomputer: Science Unleashed
Sierra Supercomputer: Science UnleashedSierra Supercomputer: Science Unleashed
Sierra Supercomputer: Science Unleashedinside-BigData.com
 

Similar to Cloud Computing: An Alternative Platform for Scientific Computing (20)

Super Computers
Super ComputersSuper Computers
Super Computers
 
supercomputer
supercomputersupercomputer
supercomputer
 
Top Five Super Computers (1).pptx
Top Five Super Computers (1).pptxTop Five Super Computers (1).pptx
Top Five Super Computers (1).pptx
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
 
Supercomputers
SupercomputersSupercomputers
Supercomputers
 
Supercomputers
SupercomputersSupercomputers
Supercomputers
 
Rama krishna ppts for blue gene/L
Rama krishna ppts for blue gene/LRama krishna ppts for blue gene/L
Rama krishna ppts for blue gene/L
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)
 
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st CenturyThe von Neumann Memory Barrier and Computer Architectures for the 21st Century
The von Neumann Memory Barrier and Computer Architectures for the 21st Century
 
TeraGrid and Physics Research
TeraGrid and Physics ResearchTeraGrid and Physics Research
TeraGrid and Physics Research
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure
 
Supercomputer @ manarat university by reza
Supercomputer  @ manarat university by rezaSupercomputer  @ manarat university by reza
Supercomputer @ manarat university by reza
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 Altreonic
 
Unraveling mysteries of the Universe at CERN, with OpenStack and Hadoop
Unraveling mysteries of the Universe at CERN, with OpenStack and HadoopUnraveling mysteries of the Universe at CERN, with OpenStack and Hadoop
Unraveling mysteries of the Universe at CERN, with OpenStack and Hadoop
 
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
The OptIPlanet Collaboratory Supporting Microbial Metagenomics Researchers Wo...
 
Valladolid final-septiembre-2010
Valladolid final-septiembre-2010Valladolid final-septiembre-2010
Valladolid final-septiembre-2010
 
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
High Performance Cyberinfrastructure Enables Data-Driven Science in the Globa...
 
Sierra Supercomputer: Science Unleashed
Sierra Supercomputer: Science UnleashedSierra Supercomputer: Science Unleashed
Sierra Supercomputer: Science Unleashed
 

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Cloud Computing: An Alternative Platform for Scientific Computing

  • 1. CLOUD COMPUTING: AN ALTERNATIVE PLATFORM FOR SCIENTIFIC COMPUTING. Presented by: DAVID RAMIREZ. COMP5003 GRADUATE SEMINAR AND PROJECT RESEARCH. Instructor: Dr. A. Lodgher. PRAIRIE VIEW A&M UNIVERSITY OF TEXAS, May 2009
  • 2. Big scientific challenges… Bioengineering, bioinformatics. Climate models. Astrophysics model (exploding star). Aerospace. Image credits: Oak Ridge National Laboratory, Business Week, Argonne Labs
  • 3. The traditional approach… Supercomputers, clusters
      • “Ranger” cluster at UT Austin, TX (University of Texas): ca. 4000 nodes (Linux based), 580 Tflops, 31 TB local memory
      • IBM Blue Gene, Argonne National Laboratory (Illinois), US Department of Energy: 1 PFLOP, 64K nodes and more. Other installations in progress (Germany) will reach 4 PFLOP by 2011
      • “Hydra” at Texas A&M (Texas A&M University): 52 nodes, 832 IBM processors (AIX based), 6.3 Tflops, 1.6 TB memory, 20 TB storage
  • 4. Cray XT5 JAGUAR (Cray, Oak Ridge National Laboratory): 1.4 PFLOP, 181,000 processing cores (AMD Opteron, 2 or 2 core), Linux-based, 16 to 32 GB memory per node. Necessity for high performance visualization: STALLION visualization center at TACC (Texas Advanced Computing Center), University of Texas, Austin
  • 5. High Demands of Computing Power in Science … some examples: 720x720x1620 point grid (Lawrence Livermore National Lab. – VisIt Gallery), 1620 processors, 20 days, 20 terabytes of data output. Large-Eddy Simulation of Rayleigh-Taylor instability (Lawrence Livermore National Lab. – VisIt Gallery). 11 million cells, 512 processors of the FROST supercomputer at Lawrence Livermore National Lab., 36 hours, 2 terabytes of data output
  • 6. Scientific computing: Some History… (image: IBM)
      • Scientific computing always a driving force for hardware development.
      • “Mainframe” the first platform.
      • FORTRAN programming language became the (still dominant) standard.
      C     EXAMPLE OF FORTRAN CODE
            REAL SUM, CNTR, NUM
            SUM = 0
            DO 10 CNTR = 1, 1000
               READ(*,*) NUM
               SUM = SUM + NUM
      10    CONTINUE
  • 7. The next steps in hardware evolution… Minicomputer: DEC PDP-8. Desktop minicomputer: Hewlett-Packard. The IBM PC. Image credits: www.Xconomy.com, Hewlett-Packard, IBM
  • 8. The network connected computers… and the Internet was born (1980s-90s)
  • 9. The next logical step: Aggregate the power of networked computers towards the solution of highly demanding computing tasks. Parallelize solution of problems. THE GRID CONCEPT WAS BORN! (Diagram: David Ramirez)
  • 10. The concepts behind grid computing… Before… SERIAL COMPUTING: Single computer, single CPU. A problem is broken into a discrete series of instructions. Instructions are executed one after another. Only one instruction may execute at any moment in time. SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis (LAWRENCE LIVERMORE NATIONAL LABORATORY)
  • 11. Now … PARALLEL COMPUTING: Software designed to be run using multiple CPUs. A problem is broken into discrete parts that can be solved concurrently. Each part is further broken down to a series of instructions. Instructions from each part execute simultaneously on different CPUs. SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis (LAWRENCE LIVERMORE NATIONAL LABORATORY)
  • 12. PARALLEL COMPUTING: DEFINITIONS. Simultaneous use of multiple computing resources to solve a computational problem.
      • Run using multiple CPUs
      • A problem is broken into discrete parts that can be solved concurrently
      • Each part is further broken down to a series of instructions
      • Instructions from each part execute simultaneously on different CPUs
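The definition above can be illustrated with a minimal Python sketch (not taken from the slides): a list of numbers is broken into discrete parts, each part is summed by a separate worker, and the partial sums are combined. The function names and the choice of `ThreadPoolExecutor` are assumptions made for illustration. In standard CPython, threads do not speed up CPU-bound arithmetic, so a real run would hand the same parts to separate processes or MPI ranks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(values, n_parts):
    """Break the problem into n_parts contiguous, roughly equal chunks."""
    size, extra = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(values[start:end])
        start = end
    return parts


def serial_sum(values):
    """Serial computing: a single instruction stream, one value at a time."""
    total = 0
    for v in values:
        total += v
    return total


def parallel_sum(values, n_workers=4):
    """Sum each part in its own worker, then combine the partial results."""
    parts = split_into_parts(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(serial_sum, parts))
    return serial_sum(partial_sums)


if __name__ == "__main__":
    numbers = list(range(1, 1001))
    print(serial_sum(numbers), parallel_sum(numbers))
```

Both functions return the same total; only the way the work is divided differs.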
  • 13. PARALLEL COMPUTER CLASSIFICATION: FLYNN’S TAXONOMY (1966)
      • SISD (Single Instruction, Single Data): ol’ serial computer
      • SIMD (Single Instruction, Multiple Data): graphics processors
      • MISD (Multiple Instruction, Single Data): rare; Space Shuttle flight computer
      • MIMD (Multiple Instruction, Multiple Data): most modern parallel computers
      SOURCE: https://computing.llnl.gov/tutorials/parallel_comp/#Whatis (LAWRENCE LIVERMORE NATIONAL LABORATORY)
  • 14. COMPUTATIONAL PROBLEMS IN PARALLEL COMPUTING
      • Embarrassingly parallel calculations: each sub-calculation is independent of all the other calculations. Subtasks rarely or never communicate between them. Best for high-throughput computing. Perfect for loose grids (delays not so important).
      • Fine-grained calculations: each sub-calculation is dependent on the result of another sub-calculation. Subtasks communicate many times per second. Best for high-performance computing. More suitable for supercomputers.
      • Coarse-grained calculations: subtasks communicate between them less frequently (just several times per second). More suitable for supercomputers.
  • 15. SIMPLE EXAMPLE – Heat modeling (Source: Lawrence Livermore National Laboratory)
      • The entire array is partitioned and distributed as subarrays to all tasks. Each task owns a portion of the total array.
      • Master process sends initial info to workers, checks for convergence and collects results.
      • Worker process calculates solution, communicating as necessary with neighbor processes.
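A minimal sketch of this master/worker decomposition, assuming a one-dimensional rod with fixed end temperatures and an explicit finite-difference update. The slide does not give the numerical scheme, so the scheme, the function names, `alpha`, and `tol` are all illustrative assumptions. The workers run one after another here; only the data flow (owned subarrays, neighbor values, convergence check) is modeled.

```python
def step_subarray(sub, left_ghost, right_ghost, alpha):
    """Worker: advance one owned subarray by one explicit time step."""
    padded = [left_ghost] + sub + [right_ghost]
    return [
        padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def solve_heat(initial, n_workers=4, alpha=0.25, tol=1e-6, max_steps=100_000):
    """Master: partition the array, drive the workers, check convergence.

    Assumes at least three points; the two end values stay fixed.
    """
    left_bc, right_bc = initial[0], initial[-1]
    interior = list(initial[1:-1])
    size = -(-len(interior) // n_workers)  # ceiling division
    subs = [interior[i:i + size] for i in range(0, len(interior), size)]

    for step in range(1, max_steps + 1):
        new_subs = []
        for k, sub in enumerate(subs):
            # "Communication": each worker needs one value from each neighbor.
            left = subs[k - 1][-1] if k > 0 else left_bc
            right = subs[k + 1][0] if k < len(subs) - 1 else right_bc
            new_subs.append(step_subarray(sub, left, right, alpha))
        # Master checks the largest change in any cell during this step.
        change = max(
            abs(new - old)
            for old_sub, new_sub in zip(subs, new_subs)
            for old, new in zip(old_sub, new_sub)
        )
        subs = new_subs
        if change < tol:
            break

    # Master collects the subarrays back into one result.
    result = [left_bc] + [x for sub in subs for x in sub] + [right_bc]
    return result, step
```

Because each cell only needs its two immediate neighbors, the result does not depend on how many subarrays the rod is cut into; only the amount of neighbor communication changes.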
  • 17. HIGH-THROUGHPUT PROBLEMS: Problems divided into many independent tasks. Computing grids used to schedule these tasks, dealing them out to the different processors in the grid. As soon as a processor finishes one task, the next task arrives. Example: Large Hadron Collider Computer Grid (CERN / Geneva)
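The scheduling idea on this slide can be sketched as a small task farm in Python. This is an illustration only: the name `run_task_farm`, the `node-i` labels, and the use of threads in place of grid processors are assumptions, not part of any grid middleware.

```python
import queue
import threading


def run_task_farm(tasks, work_fn, n_workers=4):
    """Deal independent tasks to workers; a worker takes the next task
    as soon as it has finished the previous one."""
    pending = queue.Queue()
    for task_id, payload in enumerate(tasks):
        pending.put((task_id, payload))

    results = {}
    handled_by = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task_id, payload = pending.get_nowait()
            except queue.Empty:
                return  # no tasks left, so this worker stops
            outcome = work_fn(payload)
            with lock:
                results[task_id] = outcome
                handled_by[task_id] = name

    threads = [
        threading.Thread(target=worker, args=(f"node-{i}",))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Results are returned in task order, whichever worker produced them.
    return [results[i] for i in range(len(tasks))], handled_by
```

Since the tasks never talk to each other, a slow or distant worker delays only its own task, which is why this pattern tolerates the high latency of a loose grid.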
  • 18. APPROACHES FOR PARALLEL COMPUTING IMPLEMENTATION
      • CLUSTER: Processors are close together. High speed of network, low latency. When big: “supercomputer”. Ideal for fine-grained, high-performance computation.
      • GRID: Dispersed, even over wide distances. Is the most distributed form of parallel computing. Internet as main transport. Loose connectivity. High latency. Ideal for embarrassingly parallel, high-throughput computation. Mostly commodity hardware in nodes.
      • Or a mix.
  • 19. GRIDS ALL OVER THE WORLD … CERN LHC Computing Grid: 200K processors, 11 clusters worldwide. Currently the most important / powerful scientific grid. Source: http://www.accessgrid.org
  • 20. Pioneering Grid Vendors
      • Scientific applications. External or internal grids. Pioneer client software.
      • General market: commerce, industry, science. Computing on demand model. Sun Grid Engine (middleware). Lustre distributed filesystem.
  • 21. Now…The Cloud meets the Grid… Grid resources become abstractions (“black boxes”)
  • 22. New players on the scene… Many more joining in…
  • 23. CASE STUDY: AMAZON WEB SERVICES
      • EC2: Elastic Compute Cloud
      • S3: Simple Storage Service
      • SimpleDB (unstructured database)
      • SQS: Simple Queue Service
      • Elastic MapReduce: enables parallelization
  • 24. PROOF-OF-CONCEPT PROPOSAL: Use AWS cloud infrastructure as a platform for scientific-application-oriented, high-performance / high-throughput parallel computing.
  • 25. AWS HOW-TO FOR DOING THIS (“self-service”)
      Step 1:
      • Have an embarrassingly parallel problem at hand.
      • Code the problem solution using parallel techniques such as MPI (Message Passing Interface). Use Fortran, C, C++, Python.
      • Create a running environment snapshot (complete with OS, software) and store it in S3.
      Step 2:
      • MAP & REDUCE using AWS HADOOP* middleware implementation. Balance loads.
      Step 3:
      • Feed separate tasks to n EC2 nodes. Start nodes on demand using the S3-stored images. Deploy & coordinate with HADOOP.
      • Collect partial results, assemble into final product (master node).
      * HADOOP is an open-source product of the Apache Software Foundation written in Java™
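The map, balance and collect steps above follow the general map/reduce pattern, which can be sketched in a few lines of plain Python. This is not Hadoop's API or the Elastic MapReduce interface; every function name here is hypothetical, and the binning example is invented for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record; it emits (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all emitted values by key, as the middleware would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's values into one partial result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical job: count how many simulation samples fall in each bin of width 10.
def bin_mapper(sample):
    yield (int(sample // 10) * 10, 1)


def count_reducer(key, values):
    return sum(values)


if __name__ == "__main__":
    print(map_reduce([3, 12, 17, 25, 8], bin_mapper, count_reducer))
```

On a real cluster the map calls run on different nodes and the grouping step moves data across the network; the mapper and reducer themselves keep the same shape.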
  • 26. MAP & REDUCE ARCHITECTURE (Source: http://hadoop.apache.org/core/)
      Price list for AWS (as of Spring, 2009): http://aws.amazon.com/elasticmapreduce/#pricing
      Service          Cost (using maximum capacity)
      EC2              $0.80/hr
      MapReduce        $0.12/hr
      S3               $0.15/GB per month
      Data transfer    $0.10/GB
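As a rough worked example, the Spring 2009 rates listed above can be combined into a cost estimate. The sketch assumes that the EC2 and MapReduce rates are both charged per node-hour and add together, and that storage and transfer are billed per GB as listed; the job size in the usage example is hypothetical and does not come from the slides.

```python
# Rates copied from the price list above (Spring 2009, maximum capacity).
EC2_PER_NODE_HOUR = 0.80
MAPREDUCE_PER_NODE_HOUR = 0.12
S3_PER_GB_MONTH = 0.15
TRANSFER_PER_GB = 0.10


def estimate_cost(nodes, hours, stored_gb, months_stored, transferred_gb):
    """Estimate the cost in USD of one run.

    Assumption: the MapReduce charge is added on top of the EC2 charge
    for every node-hour.
    """
    compute = nodes * hours * (EC2_PER_NODE_HOUR + MAPREDUCE_PER_NODE_HOUR)
    storage = stored_gb * months_stored * S3_PER_GB_MONTH
    transfer = transferred_gb * TRANSFER_PER_GB
    return {
        "compute": compute,
        "storage": storage,
        "transfer": transfer,
        "total": compute + storage + transfer,
    }


if __name__ == "__main__":
    # Hypothetical job: 64 nodes for 10 hours, 100 GB kept for one month,
    # 100 GB moved in or out.
    estimate = estimate_cost(64, 10, 100, 1, 100)
    for item, usd in estimate.items():
        print(f"{item:>9}: ${usd:,.2f}")
```

Under these assumptions the hypothetical job costs about $613.80, of which roughly $588.80 is compute time.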
  • 28. SOME CURRENT ACADEMIC PROJECTS & WORKING IMPLEMENTATIONS OF CLOUD COMPUTING-BASED SCIENTIFIC GRIDS (Nimbus Framework)
      • University of Chicago (NIMBUS)
      • University of Florida (STRATUS)
      • Purdue University (WISPY)
      • Masaryk University (KUPA) (Czech Republic)
      Source: http://workspace.globus.org/clouds
  • 29. CONCLUSION: By integrating networking, computation and information, the Grid provides a practical, virtual platform for computing suitable for scientific research. AWS cloud services make it easy and affordable to implement a sufficiently powerful, scalable, and practical grid computing platform.
      • Can be self-serviced.
      • On-demand model.
      • Very economical.
      • Suitable for fast turnkey solutions without the expense of costly infrastructure and computer time.
      • Ideal in an academic environment, to foster hands-on research with complex models.
  • 30. FUTURE WORK
      GLOBALLY: Grid computing, now more widely enabled by cloud-computing (Infrastructure-as-a-Service) platforms and the sponsorship of governments, industries and the scientific community, is a fundamental component for the future of computing.
      LOCALLY: The goal of this research paper is to provide a basis for a near-future practical, proof-of-concept implementation of a Cloud-based Grid that puts Prairie View A&M University in the list of universities having access to and use of such infrastructure, for the benefit of its students, academic staff, and the community in general.