Current Trends in HPC

Speaker notes:
  • CUDA is an architecture with a number of entry points. Today, developers program in C for CUDA using NVIDIA's compilers; support for Fortran and other languages is coming soon. CUDA also supports emerging API programming standards such as OpenCL. Because the OpenCL and CUDA constructs for parallelism are so similar, applications written in C for CUDA can easily be ported to OpenCL if desired. OpenCL applications sit on top of the CUDA architecture.
  • Not just WSDLs on things, but common abstractions that apply across many resources and services. (A work in progress.)
  • The sources of information are expanding, and many new sources are machine generated. It is also big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights gleaned from them, as sources of competitive advantage: more detailed structured data, new unstructured data, and device-generated data. But big data isn't only about data; a comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as scale-out storage, MPP database architectures, Hadoop and the Hadoop ecosystem, in-database analytics, in-memory computing, data virtualization, and data visualization.
  • Content and service providers, as well as global organizations that need to distribute large content files, are challenged with managing these distributed systems and ensuring their performance. A new approach using a single storage pool in the cloud, with policies for content placement, multi-tenancy, and self-service, can therefore benefit their business.

    1. Current Trends in High Performance Computing. Dr. Putchong Uthayopas, Department Head, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand. pu@ku.ac.th
    2. I am pleased to be here!
    3. Introduction
       • High Performance Computing: an area of computing involving the hardware and software that help solve large, complex problems fast.
       • Many applications:
         – Science and engineering research: CFD, genomics, automobile design, drug discovery
         – High-performance business analysis: knowledge discovery, risk analysis, stock portfolio management
         – Business is moving more toward analyzing data from data warehouses.
    4. Why do we need HPC?
       • Change in scientific discovery: from experiments to simulation and visualization.
       • Critical need to solve ever larger problems: global climate modeling, life science, global warming.
       • Modern business needs: design of more complex machinery, more complex electronics design, analysis of complex and large-scale financial systems, more complex data analysis.
    5. Top 500: the fastest computers on our planet
       • A list of the 500 most powerful supercomputers, generated twice a year (June and November).
       • The latest list was announced in June 2012.
    6. Sequoia @ Lawrence Livermore Lab
       • IBM BlueGene/Q
       • 34 login nodes: 48 CPUs/node, 64 GB RAM
       • 98,304 compute nodes: 16 CPUs/node, 16 GB RAM
       • IBM Power BQC 16C processors: 1,572,864 CPUs in total, 1.6 PB RAM
       • Peak performance: 20,132 TFlops
    7. Performance Development
    8. Projected Performance Development
    9. Top 500: Application Area
    10. Processors are just not running faster
       • Processor speed kept increasing for the last 20 years.
       • Common techniques: smaller process technology, increased clock speed, improved microarchitecture (Pentium, Pentium II, Pentium III, Pentium IV, Centrino, Core).
    11. Pitfalls
       • Smaller process technology leads to denser transistors, but heat dissipation and noise force reduced voltage.
       • Increasing the clock speed uses more power, since CMOS consumes power mainly when switching.
       • Improving the microarchitecture yields small gains for a much more complex design.
       • The only solution left is concurrency: doing many things at the same time.
    12. Parallel Computing
       • Speeding up execution by splitting a task into many independent subtasks and running them on multiple processors or cores:
         – Break the large task into many small subtasks.
         – Execute these subtasks on multiple cores or processors.
         – Collect the results together.
    13. How to achieve concurrency
       • Adding more concurrency into hardware: processors, I/O, memory.
       • Adding more concurrency into software: how to express parallelism better in software (see the sketch below).
       • Adding more concurrency into algorithms: how to do many things at the same time, and how to make people think in parallel.
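       To make "expressing parallelism in software" concrete, here is a minimal sketch in C using OpenMP, one shared-memory option among the models discussed later (array size and contents are arbitrary; compile with, e.g., gcc -fopenmp):

           #include <stdio.h>
           #include <omp.h>

           #define N 1000000

           int main(void) {
               static double a[N], b[N], c[N];

               /* Initialize the inputs serially. */
               for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

               /* One directive asks the compiler/runtime to split the
                  loop iterations across the available cores. */
               #pragma omp parallel for
               for (int i = 0; i < N; i++)
                   c[i] = a[i] + b[i];

               printf("c[%d] = %f (up to %d threads)\n",
                      N - 1, c[N - 1], omp_get_max_threads());
               return 0;
           }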
    14. The coming (back) of multicore
    15. Hybrid Architecture (figure: hybrid nodes linked by an interconnection network)
    16. Rationale for Hybrid Architecture
       • Most scientific applications have fine-grained parallelism inside: CFD, financial computation, image processing.
       • Energy efficiency: employing a large number of slower processors in parallel can lower power consumption and heat.
    17. Two main approaches
       • Multithreading on scaled-down processors that are compatible with conventional processors: Intel MIC.
       • Very large numbers of small processor cores in a SIMD model, evolving from graphics technology: NVIDIA GPU, AMD Fusion.
    18. Many Integrated Core (MIC) Architecture: an effort by Intel to add a large number of cores to a computing system.
    19. Multithreading Concept
    20. Challenges
       • A large number of cores will have to divide the memory among them: much less memory per core, and high demand for memory bandwidth.
       • Still need an effective fine-grained parallel programming model.
       • No free lunch: programmers will have to do some work.
    21. What is GPU Computing? (figure: a 4-core CPU alongside a many-core GPU) Computing with CPU + GPU: heterogeneous computing.
    22. Not 2x or 3x: speedups are 20x to 150x
       • Medical imaging (U of Utah): 146X
       • Molecular dynamics (U of Illinois, Urbana): 36X
       • Video transcoding (Elemental Tech): 18X
       • Matlab computing (AccelerEyes): 50X
       • Astrophysics (RIKEN): 100X
       • Financial simulation (Oxford): 149X
       • Linear algebra (Universidad Jaime): 47X
       • 3D ultrasound (Techniscan): 20X
       • Quantum chemistry (U of Illinois, Urbana): 130X
       • Gene sequencing (U of Maryland): 30X
    23. CUDA Parallel Computing Architecture
       • Parallel computing architecture and programming model.
       • Includes a C compiler plus support for OpenCL and DX11 Compute.
       • Architected to natively support all computational interfaces (standard languages and APIs); the slide's figure contrasts this with ATI's compute "solution".
    24. Compiling C for CUDA Applications (figure: an application containing key kernels in C for CUDA and the rest in plain C goes through NVCC, which emits CUDA object files for the kernels and CPU object files for the CPU code; the linker combines them into a CPU-GPU executable)
    25. Simple "C" Description For Parallelism

       Standard C code:

           void saxpy_serial(int n, float a, float *x, float *y)
           {
               for (int i = 0; i < n; ++i)
                   y[i] = a*x[i] + y[i];
           }
           // Invoke serial SAXPY kernel
           saxpy_serial(n, 2.0, x, y);

       Parallel C code:

           __global__ void saxpy_parallel(int n, float a, float *x, float *y)
           {
               int i = blockIdx.x*blockDim.x + threadIdx.x;
               if (i < n) y[i] = a*x[i] + y[i];
           }
           // Invoke parallel SAXPY kernel with 256 threads/block
           int nblocks = (n + 255) / 256;
           saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
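       The slide shows only the kernel and its launch. For completeness, a minimal host-side sketch (assuming the standard CUDA runtime API; the array size and fill values are illustrative) that allocates device memory, copies data across, launches, and copies the result back:

           #include <stdio.h>
           #include <stdlib.h>
           #include <cuda_runtime.h>

           __global__ void saxpy_parallel(int n, float a, float *x, float *y) {
               int i = blockIdx.x * blockDim.x + threadIdx.x;
               if (i < n) y[i] = a * x[i] + y[i];
           }

           int main(void) {
               int n = 1 << 20;
               size_t bytes = n * sizeof(float);
               float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
               for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

               /* Allocate device buffers and copy the inputs to the GPU. */
               float *d_x, *d_y;
               cudaMalloc(&d_x, bytes);
               cudaMalloc(&d_y, bytes);
               cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
               cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

               /* Launch with 256 threads per block, as on the slide. */
               int nblocks = (n + 255) / 256;
               saxpy_parallel<<<nblocks, 256>>>(n, 2.0f, d_x, d_y);

               /* Copy the result back; expect 2.0*1.0 + 2.0 = 4.0. */
               cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
               printf("y[0] = %f\n", y[0]);

               cudaFree(d_x); cudaFree(d_y); free(x); free(y);
               return 0;
           }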
    26. Computational Finance
       • Financial computing software vendors:
         – SciComp: derivatives pricing modeling
         – Hanweck: options pricing & risk analysis
         – Aqumin: 3D visualization of market data
         – Exegy: high-volume tickers & risk analysis
         – QuantCatalyst: pricing & hedging engine
         – Oneye: algorithmic trading
         – Arbitragis Trading: trinomial options pricing
       • Ongoing work: LIBOR Monte Carlo market model, callable swaps, and continuous-time finance. (Sources: SciComp, CUDA SDK)
    27. Weather, Atmospheric, & Ocean Modeling
       • CUDA-accelerated WRF available; other kernels in WRF being ported. (Source: Michalakes, Vachharajani)
       • Ongoing work: tsunami modeling, ocean modeling, several CFD codes. (Source: Matsuoka, Akiyama, et al.)
    28. New emerging standards
       • OpenCL: supported by many vendors, including Apple; targets both GPU-based SIMD and multithreading; more complex to program than CUDA.
       • OpenACC: a programming standard for parallel computing developed by Cray, CAPS, NVIDIA, and PGI; simplifies parallel programming of heterogeneous CPU/GPU systems; directive-based (see the sketch below).
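       As an illustration of OpenACC's directive-based style, a hedged SAXPY sketch in C (the pragma and its clauses are standard OpenACC; the compiler invocation, e.g. pgcc -acc, varies by vendor):

           #include <stdio.h>

           void saxpy(int n, float a, float *restrict x, float *restrict y) {
               /* The directive asks the compiler to copy x and y to the
                  accelerator and parallelize the loop; the body is plain C. */
               #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
               for (int i = 0; i < n; i++)
                   y[i] = a * x[i] + y[i];
           }

           int main(void) {
               enum { N = 1 << 20 };
               static float x[N], y[N];
               for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
               saxpy(N, 2.0f, x, y);
               printf("y[0] = %f\n", y[0]);   /* expect 4.0 */
               return 0;
           }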
    29. Cluster computing
       • The use of a large number of servers, linked by a high-speed local network, as one single large supercomputer.
       • A popular way of building supercomputers.
       • Software: cluster-aware OS (Windows Compute Cluster Server 2008, NPACI Rocks Linux); programming systems such as MPI (a minimal example follows).
       • Used mostly in computer-aided design, engineering, and scientific research.
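       A minimal sketch of the MPI programming style mentioned above, in C (the partial-sum computation is illustrative, not from the slides; compile with mpicc, run with mpirun):

           #include <stdio.h>
           #include <mpi.h>

           int main(int argc, char **argv) {
               MPI_Init(&argc, &argv);

               int rank, size;
               MPI_Comm_rank(MPI_COMM_WORLD, &rank);
               MPI_Comm_size(MPI_COMM_WORLD, &size);

               /* Each process sums its own slice of 1..1000. */
               long local = 0, total = 0;
               for (long i = rank + 1; i <= 1000; i += size)
                   local += i;

               /* Combine the partial sums on rank 0. */
               MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM,
                          0, MPI_COMM_WORLD);

               if (rank == 0)
                   printf("sum = %ld (from %d processes)\n", total, size);

               MPI_Finalize();
               return 0;
           }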
    30. Comment
       • Cluster computing is a very mature discipline; we know how to build a sizable cluster very well.
       • Hardware integration; storage integration (Lustre, GPFS); schedulers (PBS, Torque, SGE, LSF); MPI programming; distributions (ROCKS).
       • The cluster is a foundation fabric for grid and cloud.
    31. TERA Cluster (figure: 2.5 Gbps to UniNet; 48 TB storage; KU fiber backbone at 1 Gbps; 1 Gbps Ethernet/fiber edge switches; frontends TERA, Anatta, Sunyata, Araya, WinHPC, and spares; 96 + 64 + 15 compute/storage nodes plus 16 spare nodes; a 200-port Gigabit Ethernet switch; a 5 TB Lustre storage tier)
       • 1 frontend (HP ProLiant DL360 G5 server) and 192 compute nodes: Intel Xeon 3.2 GHz (dual-core, dual-processor), 4 GB memory (8 GB for frontend and InfiniBand nodes), 70x4 GB SCSI HDD (RAID 1).
       • 4 storage servers: Lustre file system for the TERA cluster's storage, attached with a Smart Array P400i controller for 5 TB of space.
    32. Grid Computing Technology
       • Grid computing enables the virtualization of distributed computing and data resources such as processing, network bandwidth, and storage capacity to create a single system image, granting users and applications seamless access to vast IT capabilities.
       • Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
    33. Grid Architecture (figure: a layer stack of Fabric, Connectivity, Resource, Collective, and Application layers)
       • Fabric layer: protocols and interfaces that provide access to computing resources such as CPU and storage.
       • Connectivity layer: protocols for grid-specific network transactions, such as security (GSI).
       • Resource layer: protocols to access a single resource from an application: GRAM (Grid Resource Allocation Management), GridFTP (data access), Grid Resource Information Service.
       • Collective layer: protocols that manage and access groups of resources.
    34. Globus as Service-Oriented Infrastructure (figure: user applications and tools reach specialized resources such as computers, storage, and databases through uniform interfaces, security mechanisms, Web-service transport, and monitoring; hosted services include Reliable File Transfer, MDS-Index, MyProxy, DAIS, GRAM, and GridFTP)
    35. Introduction to ThaiGrid
       • A national project under the Software Industry Promotion Agency (Public Organization), Ministry of Information and Communication Technology.
       • Started in 2005 with 14 member organizations; expanded to 22 organizations in 2008.
    36. Thai Grid Infrastructure: 19 sites, about 1,000 CPU cores (figure: a national network with site links ranging from 155 Mbps to 2.5 Gbps)
    37. ThaiGrid Usage
       • ThaiGrid provides about 290 years of computing time for members: 9 years on the grid and 280 years on TERA.
       • 41 projects from 8 areas are being supported on the teraflop machine, plus more small projects on individual machines.
    38. Medicinal Herb Research
       • Partner: Cheminformatics Center, Kasetsart University (Chak Sangma and team).
       • Objective: use a 3D molecular database and virtual screening to verify traditional medicinal herbs.
       • Benefits: scientific proof of ancient traditional drugs; helps poor people who still rely on drugs from medicinal herbs; potential benefit for the local pharmaceutical industry.
       • Workflow: virtual screening on the infrastructure, then lab testing.
    39. NanoGrid (figure: MS gateways in front of ThaiGrid computing resources)
       • Objective: a platform supporting computational nanoscience research.
       • Technology used: Accelrys Materials Studio; cluster schedulers Sun Grid Engine and Torque.
    40. Challenges
       • Size and scale.
       • Manageability: deployment, configuration, operation.
       • Software and hardware compatibility.
    41. Grid System Architecture
       • Clusters – satellite sets: 16 clusters delivered by ThaiGrid to initial members, each composed of 5 IBM eServer xSeries 336 nodes: Intel Xeon 2.8 GHz (dual processor), x86_64 architecture, 4 GB memory (DDR2 SDRAM).
       • Other sets: various types of servers and numbers of nodes, provided by ThaiGrid member institutes.
    42. Grid as a Super Cluster (figure: a grid scheduler dispatches work over the research network to clusters of head (H) and compute (C) nodes)
    43. Is grid still alive?
       • Yes, grid is a useful technology for certain tasks: BitTorrent as a massive file-exchange infrastructure; the European Grid uses it to share LHC data.
       • Pitfalls of the grid: the network is still not reliable and fast enough for long-term operation; the multi-site, multi-authority concept makes it very complex for system management, for security, and for users to really use the system.
       • The recent trend is to move to centralized clouds.
    44. What is Cloud Computing? (figure: Google, Salesforce, Amazon, Microsoft, and Yahoo as example providers) Source: Wikipedia (cloud computing)
    45. Why Cloud Computing?
       • The illusion of infinite computing resources available on demand, thereby eliminating the need for cloud computing users to plan far ahead for provisioning.
       • The elimination of an up-front commitment by cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
       • The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
       Source: "Above the Clouds: A Berkeley View of Cloud Computing", RAD Lab, UC Berkeley.
    46. (figure from "Above the Clouds: A Berkeley View of Cloud Computing", RAD Lab, UC Berkeley)
    47. Cloud Computing Explained
       • SaaS (Software as a Service): applications delivered over the Internet as a service (e.g., Gmail).
       • The cloud is the massive server and network infrastructure that serves SaaS to large numbers of users.
       • The service being sold is called utility computing.
       Source: "Above the Clouds: A Berkeley View of Cloud Computing", RAD Lab, UC Berkeley.
    48. Enabling Technologies for Cloud Computing
       • Cluster and grid technology: the ability to build highly scalable computing systems consisting of 100,000 to 1,000,000 nodes.
       • Service-oriented architecture: everything is a service; easy to build, distribute, and integrate into large-scale applications.
       • Web 2.0: powerful and flexible user interfaces for an Internet-enabled world.
    49. Cloud Service Model
    50. Cloud Computing Software Stack
    51. Architecture of Service-Oriented Cloud Computing Systems (SOCCS) (figure: a user interface and cloud application on top of CCR/DSS services and CSM, running over OS/hardware nodes on an interconnection network)
       • A SOCCS can be constructed by combining CCR/DSS software to form a scalable service for a client application.
       • Cloud Service Management (CSM) acts as a resource-management system that keeps track of the availability of services on the cloud.
    52. Cloud System Configuration (figure: a cloud user interface (Excel) talks to the cloud application and the Cloud Service Management (CSM); services run on OS/hardware nodes joined by an interconnection network)
    53. A Proof-of-Concept Application
       • The Pickup and Delivery Problem with Time Windows (PDPTW) is the problem of serving a number of transportation requests with a limited number of vehicles.
       • The objective is to minimize both the total distance traveled by the vehicles and the total time spent by each vehicle.
    54. PDPTW on the cloud using SOCCS
       • The master/worker model is adopted as the framework for service interaction (a generic sketch of the pattern follows).
       • The algorithm is partitioned using a domain-decomposition approach.
       • The cloud application controls the decomposition of the problem, sending each subproblem to a worker service and collecting the results to find the best answer.
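       As a generic illustration of the master/worker decomposition pattern only (not the authors' CCR/DSS-based SOCCS code), a C/MPI sketch in which subproblems are statically assigned and the best answer is reduced back to the master; solve_subproblem is a placeholder cost function:

           #include <stdio.h>
           #include <float.h>
           #include <mpi.h>

           /* Placeholder: stands in for solving one PDPTW subproblem. */
           double solve_subproblem(int id) {
               return 1000.0 / (id + 1);
           }

           int main(int argc, char **argv) {
               MPI_Init(&argc, &argv);
               int rank, size;
               MPI_Comm_rank(MPI_COMM_WORLD, &rank);
               MPI_Comm_size(MPI_COMM_WORLD, &size);

               const int nsub = 64;          /* subproblems from decomposition */
               double local_best = DBL_MAX, best;

               /* Round-robin assignment of subproblems to workers. */
               for (int id = rank; id < nsub; id += size) {
                   double cost = solve_subproblem(id);
                   if (cost < local_best) local_best = cost;
               }

               /* The master (rank 0) collects the minimum-cost answer. */
               MPI_Reduce(&local_best, &best, 1, MPI_DOUBLE, MPI_MIN,
                          0, MPI_COMM_WORLD);
               if (rank == 0) printf("best cost = %f\n", best);

               MPI_Finalize();
               return 0;
           }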
    55. Results: speedup on a single node with 4 cores.
    56. Results: speedup and efficiency derived from average runtime on 1, 2, 4, 8, and 16 compute nodes.
    57. We are living in a world of data (figure: video surveillance, social media, mobile sensors, gene sequencing, smart grids, geophysical exploration, medical imaging)
    58. Big Data
       "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."
       Reference: "What is big data? An introduction to the big data landscape.", Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
    59. The Value of Big Data
       • Analytical use: big data analytics can reveal insights previously hidden by data too costly to process, e.g., peer influence among customers, revealed by analyzing shoppers' transactional, social, and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data.
       • Enabling new products: Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.
    60. 3 Characteristics of Big Data
    61. Big Data Challenges
       • Volume: how to process data so big that it cannot be moved or stored.
       • Velocity: a lot of data arrives too fast to be stored, such as web usage logs and Internet and mobile messages; stream processing is needed to filter unused data or extract knowledge in real time.
       • Variety: so many types of unstructured data formats make conventional databases ineffective.
    62. How to deal with big data: integration of storage, processing, analysis algorithms, and visualization (figure: massive data streams pass through stream processing into storage, processing, analysis, and visualization)
    63. A New Approach for Distributed Big Data (figure: storage islands in L.A., Boston, and London versus a single storage pool)
       • Disparate systems → a single system across locations.
       • Manual administration → automated policies.
       • One tenant, many systems → many tenants, one system.
       • IT-provisioned storage → self-service access.
    64. Hadoop
       • Hadoop is a platform for distributing computing problems across a number of servers, first developed and released as open source by Yahoo. It implements the MapReduce approach pioneered by Google in compiling its search indexes: a dataset is distributed among multiple servers and operated on (the "map" stage), and the partial results are then recombined (the "reduce" stage).
       • Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
       • The usual Hadoop usage pattern involves three stages: loading data into HDFS, MapReduce operations, and retrieving results from HDFS (see the word-count sketch below).
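       A small illustration of the "map" stage, assuming Hadoop's Streaming interface (which lets any stdin/stdout executable act as a mapper) rather than the usual Java API: a word-count mapper in C that emits a <word, 1> pair per token; a companion reducer would sum the counts per word.

           #include <stdio.h>
           #include <string.h>

           int main(void) {
               char line[4096];
               while (fgets(line, sizeof line, stdin)) {
                   /* Emit "word<TAB>1" for every whitespace-separated token. */
                   for (char *tok = strtok(line, " \t\r\n"); tok;
                        tok = strtok(NULL, " \t\r\n"))
                       printf("%s\t1\n", tok);
               }
               return 0;
           }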
    65. What Facebook Knows
       • Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers. (http://www.facebook.com/data)
    66. The Links of Love
       • Young women often specify that they are "in a relationship" with their "best friend forever": roughly 20% of all relationships for the 15-and-under crowd are between girls; this number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
       • For anonymous US users who were over 18 at the start of the relationship, the average shortest number of steps to get from any one US user to any other individual through romantic ties is 16.7, much higher than the 4.74 steps you'd need to go from any Facebook user to another through friendship ties.
       (Graph shows relationships of anonymous US users who were over 18 at the start of the relationship: http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859)
    67. Why?
       • Facebook can improve users' experience: make useful predictions about user behavior, and make better guesses about which ads you might be more or less open to at any given time.
       • Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.
    68. Data Tsunami
       • The data flood is coming, and there is nowhere to run now: data is generated anytime, anywhere, by anyone; it is moving in fast; it is too big to move and too big to store.
       • Better be prepared: use it to enhance your business and offer better services to customers.
    69. The Opportunities and Challenges of Exascale Computing
       • A summary of findings from many workshops in the US, listing the issues that must be overcome; only some of the challenges are presented here.
    70. Hardware Challenges: major improvements in hardware are needed.
    71. Power Challenge
       • The power consumption of the computers is the largest hardware research challenge.
       • Today, power costs for the largest petaflop systems are in the range of $5-10M annually.
       • For an exascale system using current technology, the annual power cost to operate the system would be above $2.5B per year, and the power load would be over a gigawatt (see the back-of-the-envelope arithmetic below).
       • The target of 20 megawatts, identified in the DOE Technology Roadmap, is primarily based on keeping the operational cost of the system in some kind of feasible range.
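       A rough sanity check of these figures, assuming electricity at about $0.10/kWh (an assumed rate, not one from the report): 1 GW x 8,760 h/year x $0.10/kWh ≈ $0.9B/year for electricity alone, so a load of "over a gigawatt" plus cooling and infrastructure overheads plausibly lands above the cited $2.5B. By the same arithmetic, the 20 MW target corresponds to roughly $18M/year, back in the range of today's petaflop systems.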
    72. Memory Challenge: the memory subsystem is too slow.
    73. Data Movement Challenge
    74. System Resiliency Challenge
       • For exascale systems, the number of system components will increase faster than component reliability, with projected mean times between failures in minutes or seconds.
       • Exascale systems will experience various kinds of faults many times per day; systems running 100 million cores will continually see core failures, and the tools for dealing with them will have to be rethought.
    75. "Co-Design" Challenge
    76. The Computer Science Challenges
       • A programming-model effort is a critical component: clock speeds will be flat or even dropping to save energy, so all performance improvements within a chip will come from increased parallelism, and the amount of memory per arithmetic unit will shrink; hence the need for fine-grained parallelism and a programming model other than message passing or coarse-grained threads.
    77. Under the radar
       • Mobile processors running supercomputers.
       • The hybrid war: GPU vs. MIC.
       • I/O goes solid state.
       • The programming-standard war: CUDA / OpenCL / OpenMP / OpenACC.
    78. Summary
       • We are in a challenging world: demand for HPC systems and applications will increase, and software tools, technology, and hardware are changing to catch up.
       • The greatest challenge is how to quickly develop software for the next generation of computing systems.
    79. THANK YOU
