Despite Moore's "law", uniprocessor clock speeds have now stalled. Rather than single processors running at ever higher clock speeds, it is
common to find dual-, quad- or even hexa-core processors, even in consumer laptops and desktops.
Future hardware will not be slightly parallel, however, as in today's multicore systems, but will be
massively parallel, with manycore and perhaps even megacore systems
becoming mainstream.
This means that programmers need to start thinking parallel. To achieve this they must move away
from traditional programming models where parallelism is a
bolted-on afterthought. Rather, programmers must use languages where parallelism is deeply embedded into the programming model
from the outset.
By providing a high-level model of computation, without explicit ordering of computations, declarative languages in general, and functional languages in particular, offer many advantages for parallel programming.
One of the most fundamental advantages of the functional paradigm is purity.
In a purely functional language, as exemplified by Haskell, there are simply no side effects: it is therefore impossible for parallel computations to conflict with each
other in ways that are not well understood.
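To make this concrete, here is a minimal sketch (not from the original abstract) using GHC's standard parallel package: two pure computations are sparked in parallel with par, and purity guarantees the parallel result matches the sequential one.

    import Control.Parallel (par, pseq)

    -- Two pure computations evaluated in parallel. Because neither has
    -- side effects, sparking `evens` on another core cannot change the
    -- result: the program is deterministic whether or not the spark
    -- actually runs in parallel.
    sumSplit :: [Int] -> Int
    sumSplit xs = evens `par` (odds `pseq` (evens + odds))
      where
        evens = sum (filter even xs)
        odds  = sum (filter odd  xs)

Since evens and odds share no mutable state, no locks are needed and no race conditions are possible.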
ParaForming aims to radically improve the process
of parallelising purely functional programs through a comprehensive set of high-level parallel refactoring patterns for Parallel Haskell,
supported by advanced refactoring tools.
By matching parallel design patterns with appropriate algorithmic skeletons
using advanced software refactoring techniques and novel cost information, we will bridge the gap between fully automatic
and fully explicit approaches to parallelisation, helping programmers "think parallel" in a systematic,
guided way. This talk introduces the ParaForming approach, gives some examples and shows how
effective parallel programs can be developed using advanced refactoring technology.
Exploring emerging technologies in the HPC co-design space (jsvetter)
This document discusses emerging technologies for high performance computing (HPC), focusing on heterogeneous computing and non-volatile memory. It provides an overview of HPC architectures past and present, highlighting the trend toward more heterogeneous systems using GPUs and other accelerators. The document discusses challenges for applications to adapt to these changing architectures. It also explores potential future technologies like 3D memory and discusses the Department of Energy's efforts in codesign centers to facilitate collaboration between application developers and emerging hardware.
Deploying deep learning models with Docker and Kubernetes (Petteri Teikari, PhD)
Short introduction for platform agnostic production deployment with some medical examples.
Alternative download: https://www.dropbox.com/s/qlml5k5h113trat/deep_cloudArchitecture.pdf?dl=0
Hardware Acceleration for Machine Learning (CastLab, KAIST)
This document provides an overview of a lecture on hardware acceleration for machine learning. The lecture will cover deep neural network models like convolutional neural networks and recurrent neural networks. It will also discuss various hardware accelerators developed for machine learning, including those designed for mobile/edge and cloud computing environments. The instructor's background and the agenda topics are also outlined.
The document discusses parallel computing and multicore processors. It notes that Berkeley researchers believe multicore is the future of computing. It also discusses building an academic "manycore" research system using FPGAs to allow researchers to experiment with parallel algorithms, compilers, and programming models on thousands of processor cores. This would help drive innovation and avoid long waits between hardware and software iterations.
In this deck from the 2016 HPC Advisory Council Switzerland Conference, DK Panda from Ohio State University presents: High-Performance and Scalable Designs of Programming Models for Exascale Systems.
"This talk will focus on challenges in designing runtime environments for Exascale systems with millions of processors and accelerators to support various programming models. We will focus on MPI, PGAS (OpenSHMEM, CAF, UPC and UPC++) and Hybrid MPI+PGAS programming models by taking into account support for multi-core, high-performance networks, accelerators (GPUs and Intel MIC) and energy-awareness. Features and sample performance numbers from the MVAPICH2 libraries will be presented."
Watch the video presentation: http://wp.me/p3RLHQ-f7c
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Models for Parallel, Concurrent and Distributed Processing for Bioinformatics Software
Novartis Institute for BioMedical Research (NIBR) Geek Speak - Dec 4, 2014
Early Benchmarking Results for Neuromorphic Computing (DESMOND YUEN)
This document summarizes early benchmarking results for neuromorphic computing using Intel's Loihi chip. It finds that Loihi provides orders of magnitude gains over CPUs and GPUs for certain workloads that are directly trained on the chip or use novel bio-inspired algorithms. These include online learning, adaptive control, event-based vision and tactile sensing, constraint satisfaction problems, and nearest neighbor search. Larger networks and problems tend to provide greater performance gains with Loihi.
"Session ID: BUD17-503
Session Name: The HPE Machine and Gen-Z - BUD17-503
Speaker:
Grant Likely
Track:
★ Session Summary ★
With the exponential rise in quantity of data to manage, the modern data centre is increasingly limited by the capacity of individual machines. Since storage and compute demand more capacity than can be provided by a single machine, we distribute both over large clusters and use the network to transfer data between where it is stored and where it is processed. Moving all that data around uses deep storage stacks which incur a significant performance impact. If we could somehow flatten the storage stack and provide applications with direct access to data, then we could improve performance by orders of magnitude.
Hewlett Packard Enterprise recently demonstrated that we can do exactly that with their research project, "The Machine". Instead of moving data around with a network, The Machine uses multiple terabytes of persistent memory and a next-generation fabric-attached memory interconnect to provide a single pool of storage which can be accessed by any processor in the cluster. It shows that we can provide applications with immediate load/store access to huge data sets, in a model called Memory-Driven Computing.
Proof in hand, it is now time to bring Memory-Driven Computing to the data centre. Gen-Z is an open systems interconnect designed to provide memory-semantic access to data and devices via direct-attached, switched or fabric topologies. HPE has joined the Gen-Z consortium and is using the knowledge gained with The Machine to help shape Gen-Z and set the stage for true Memory-Driven Computing. By putting memory at the centre, we can overcome the limitations of today's computing systems and power new innovations.
This session will cover two topics. It will start with a status update on The Machine and an overview of how it works. Then we'll shift into an introduction of Gen-Z, and how it can reshape the architecture of computing in the years to come.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-503/
Presentation:
Video: https://youtu.be/1BVtChDQVyQ
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
Keyword: HPE, Gen-Z
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
The document discusses parallel programming approaches for multicore processors, advocating for using Haskell and embracing diverse approaches like task parallelism with explicit threads, semi-implicit parallelism by evaluating pure functions in parallel, and data parallelism. It argues that functional programming is well-suited for parallel programming due to its avoidance of side effects and mutable state, but that different problems require different solutions and no single approach is a silver bullet.
Simon Peyton Jones: Managing parallelism (Skills Matter)
If you want to program a parallel computer, it obviously makes sense to start with a computational paradigm in which parallelism is the default (ie functional programming), rather than one in which computation is based on sequential flow of control (the imperative paradigm). And yet, and yet ... functional programmers have been singing this tune since the 1980s, but do not yet rule the world. In this talk I’ll say why I think parallelism is too complex a beast to be slain at one blow, and how we are going to be driven, willy-nilly, towards a world in which side effects are much more tightly controlled than now. I’ll sketch a whole range of ways of writing parallel program in a functional paradigm (implicit parallelism, transactional memory, data parallelism, DSLs for GPUs, distributed processes, etc, etc), illustrating with examples from the rapidly moving Haskell community, and identifying some of the challenges we need to tackle.
The document discusses the development of the Parallella computing board, which was created to address challenges with parallel computing. It was launched in 2012 with the goal of making parallel computing more accessible through an open source hardware and software platform costing $99. The document outlines Parallella's architecture and collaborations with universities. It argues that parallel programming is difficult but necessary for the future, and that open collaboration is needed to train developers and create parallel algorithms and software stacks.
Sharding Containers: Make Go Apps Computer-Friendly Again, by Andrey Sibiryov (Docker, Inc.)
The document discusses how modern hardware has become more complex with multi-core, multi-socket CPUs and deep cache hierarchies. This complexity introduces latency and performance issues for software. The author describes their service that processes millions of requests per second spending a large amount of time on garbage collection, context switching, and CPU stalls. They developed a tool called Tesson that analyzes hardware topology and shards containerized applications across CPU cores, pinning linked components closer together to improve locality and performance. Tesson integrates with a local load balancer to distribute workloads efficiently utilizing the system resources.
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures (Dr. Fabio Baruffa)
In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm and focus on threading parallelism optimization, change of the data layout into Structure of Arrays (SoA), compiler auto-vectorization and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.
This document provides a summary of large scale machine learning frameworks. It discusses out-of-core learning, data parallelism using MapReduce, graph parallel frameworks like Pregel, and model parallelism using parameter servers. Spark is described as easy to use with a well-designed API, while GraphLab is designed for ML researchers with vertex programming. Parameter servers are presented as aiming to support very large learning but still being in early development.
1. Building exascale computers requires moving to sub-nanometer scales and steering individual electrons to solve problems more efficiently.
2. Moving data is a major challenge, as moving data off-chip uses 200x more energy than computing with it on-chip.
3. Future computers should optimize for data movement at all levels, from system design to microarchitecture, to minimize energy usage.
This document discusses challenges and opportunities in parallel graph processing for big data. It describes how graphs are ubiquitous but processing large graphs at scale is difficult due to their huge size, complex correlations between data entities, and skewed distributions. Current computation models have problems with ghost vertices, too much interaction between partitions, and lack of support for iterative graph algorithms. New frameworks are needed to handle these graphs in a scalable way with low memory usage and balanced computation and communication.
The document introduces Parallel Pixie Dust (PPD), a cross-platform thread library that aims to guarantee deadlock-free and race-condition free schedules that are optimal. It discusses the need for multiple threads due to factors like the memory wall. Current threading models are problematic because testing and debugging threaded code is difficult. PPD uses futures and thread pools to simulate data flow and generate tree-like thread schedules. It provides parallel versions of functions and thread-safe containers to enable multi-threaded standard library algorithms. The goal is to make writing correct multi-threaded programs easier.
- The document discusses building a predictive anomaly detection model for network traffic using streaming data technologies.
- It proposes using Apache Kafka to ingest and process network packet and Netflow data in real-time, and Akka clustering to build predictive models that can guide human cybersecurity experts.
- The solution aims to more effectively guide human awareness of network threats by complementing localized rule-matching with predictive modeling of aggregate network behavior based on streaming metrics.
Historically, sharing a Linux server entailed all kinds of untenable compromises. In addition to the security concerns, there was simply no good way to keep one application from hogging resources and messing with the others. The classic “noisy neighbor” problem made shared systems the bargain-basement slums of the Internet, suitable only for small or throwaway projects.
Serious use-cases traditionally demanded dedicated systems. Over the past decade virtualization (in conjunction with Moore’s law) has democratized the availability of what amount to dedicated systems, and the result is hundreds of thousands of websites and applications deployed into VPS or cloud instances. It’s a step in the right direction, but still has glaring flaws.
Most of these websites are just piles of code sitting on a server somewhere. How did that code get there? How can it be scaled? Secured? Maintained? It's anybody's guess. There simply isn't enough SysAdmin talent in the world to meet the demands of managing all these apps with anything close to best practices without a better model.
Containers are a whole new ballgame. Unlike VMs, you skip the overhead of running an entire OS for every application environment. There’s also no need to provision a whole new machine to have a place to deploy, meaning you can spin up or scale your application with orders of magnitude more speed and accuracy.
At the Crossroads of HPC and Cloud Computing with Openstack (Ryan Aydelott)
Openstack is an open-source cloud computing platform that is widely used. It allows for the provisioning of computing, storage, and networking resources on demand in a manner similar to public cloud services like Amazon Web Services. The presentation discusses Openstack's architecture, current uses, development status, and relationship to high-performance computing. It also covers how Argonne National Labs uses Openstack and potential future directions, like more native support for HPC workloads and integrated application platforms.
We live in an era where the atomic building elements of silicon computers, e.g. transistors and wires, are no longer visible using traditional optical microscopes, and their sizes are measured in just tens of Angstroms. In addition, power dissipation per unit volume is bounded by the laws of physics, which has resulted, among other things, in stagnating processor clock frequencies. Adding more and more processor cores that perform simpler and simpler tasks, in an attempt to efficiently fill the available on-chip area, seems to be the current trend taken by the industry.
The Berkeley View on the Parallel Computing Landscape (ugur candan)
This document discusses the need for a new approach to hardware and software for parallel computing. It argues that the conventional wisdom about uniprocessor architectures is outdated given physical limits. A group of researchers from UC Berkeley met to discuss parallelism and developed 7 questions to frame research. They identified 13 computational patterns or "dwarfs" that will be important for the next decade. The document discusses building hardware from small cores, connecting processors, the need for a new human-centric programming model, and testing models with human subjects. It calls for innovations in both hardware and software to address the challenges of parallel computing.
Near Data Computing Architectures: Opportunities and Challenges for Apache Spark (Ahsan Javed Awan)
The document discusses opportunities for improving Apache Spark performance using near data computing architectures. It proposes exploiting in-storage processing and 2D integrated processing-in-memory to reduce data movement between CPUs and memory. Certain Spark workloads like joins and aggregations that are I/O bound would benefit from in-storage processing, while iterative workloads are more suited for 2D integrated processing-in-memory. The document outlines a system design using FPGAs to emulate these architectures for evaluating Spark machine learning workloads like k-means clustering.
Near Data Computing Architectures for Apache Spark: Challenges and Opportunities (Spark Summit)
Scale-out big data processing frameworks like Apache Spark have been designed to run on off-the-shelf commodity machines, where each machine has a modest amount of compute, memory and storage capacity. Recent advancements in hardware technology motivate understanding Spark performance on novel hardware architectures. Our earlier work has shown that the performance of Spark-based data analytics is bounded by frequent accesses to the DRAM. In this talk, we argue in favour of Near Data Computing architectures that enable processing the data where it resides (e.g. Smart SSDs and Compute Memories) for Apache Spark. We envision a programmable-logic-based hybrid near-memory and near-storage compute architecture for Apache Spark. Furthermore, we discuss the challenges involved in achieving a 10x performance gain for Apache Spark on NDC architectures.
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma... (OpenEBS)
The OpenEBS project has taken a different approach to storage when it comes to containers. Instead of using existing storage systems and making them work with containers, what if you were to redesign something from scratch using the same paradigms used in the container world? This resulted in the effort of containerizing the storage controller. Also, as the applications that consume storage are changing, do we need scale-out distributed storage systems?
ParaForming - Patterns and Refactoring for Parallel Programming
1. ParaForming: Forming Parallel (Functional) Programs from High-Level Patterns using Advanced Refactoring
Kevin Hammond, Chris Brown, Vladimir Janjic
University of St Andrews, Scotland
Build Stuff, Vilnius, Lithuania, December 10 2013
T: @paraphrase_fp7, @khstandrews
E: kh@cs.st-andrews.ac.uk
W: http://www.paraphrase-ict.eu
4. What will "megacore" computers look like?
§ Probably not just scaled versions of today's multicore
  § Perhaps hundreds of dedicated lightweight integer units
  § Hundreds of floating point units (enhanced GPU designs)
  § A few heavyweight general-purpose cores
  § Some specialised units for graphics, authentication, network etc
  § possibly soft cores (FPGAs etc)
  § Highly heterogeneous
5. What will "megacore" computers look like?
§ Probably not uniform shared memory
§ NUMA is likely, even hardware distributed shared memory
  § or even message-passing systems on a chip
§ shared-memory will not be a good abstraction

    int arr[x][y];
6. Laki (NEC Nehalem Cluster) and hermit (XE6)
Laki:
§ 700 dual-socket Xeon 5560 2.8GHz ("Gainestown") nodes
§ 12 GB DDR3 RAM / node
§ Infiniband (QDR)
§ 32 nodes with additional Nvidia Tesla S1070
§ Scientific Linux 6.0
hermit (phase 1 step 1):
§ 38 racks with 96 nodes each
§ 96 service nodes and 3552 compute nodes
§ Each compute node will have 2 sockets, AMD Interlagos @ 2.3GHz with 16 cores each, leading to 113,664 cores
§ Nodes with 32GB and 64GB memory, reflecting different user needs
§ 2.7PB storage capacity @ 150GB/s IO bandwidth
§ External Access Nodes, Pre-/Postprocessing Nodes, Remote Visualization Nodes
7. The Biggest Computer in the World
Tianhe-2, Chinese National University of Defence Technology
33.86 petaflops/s (June 17, 2013)
16,000 nodes, each with 2 Ivy Bridge multicores and 3 Xeon Phis
3,120,000 x86 cores in total!!!
8. It's not just about large systems
§ Even mobile phones are multicore
  § Samsung Exynos 5 Octa has 8 cores, 4 of which are "dark"
§ Performance/energy tradeoffs mean systems will be increasingly parallel
§ If we don't solve the multicore challenge, then no other advances will matter!
ALL Future Programming will be Parallel!
9. The Manycore Challenge
"Ultimately, developers should start thinking about tens, hundreds, and thousands of cores now in their algorithmic development and deployment pipeline."
Anwar Ghuloum, Principal Engineer, Intel Microprocessor Technology Lab
The ONLY important challenge in Computer Science (Intel)
"The dilemma is that a large percentage of mission-critical enterprise applications will not ``automagically'' run faster on multi-core servers. In fact, many will actually run slower. We must make it as easy as possible for applications programmers to exploit the latest developments in multi-core/many-core architectures, while still making it easy to target future (and perhaps unanticipated) hardware developments."
Patrick Leonard, Vice President for Product Development, Rogue Wave Software
Also recognised as thematic priorities by EU and national funding bodies
10. But doesn't that mean millions of threads on a megacore machine??
11. How to build a wall (with apologies to Ian Watson, Univ. Manchester)
13. How NOT to build a wall
Typical CONCURRENCY approaches require the programmer to solve these.
Task identification is not the only problem...
Must also consider coordination, communication, placement, scheduling, ...
14. We need structure
We need abstraction
We don't need another brick in the wall
15. Thinking Parallel
§ Fundamentally, programmers must learn to "think parallel"
  § this requires new high-level programming constructs
  § perhaps dealing with hundreds of millions of threads
§ You cannot program effectively while worrying about deadlocks etc.
  § they must be eliminated from the design!
§ You cannot program effectively while fiddling with communication etc.
  § this needs to be packaged/abstracted!
§ You cannot program effectively without performance information
  § this needs to be included as part of the design!
16. A Solution?
"The only thing that works for parallelism is functional programming"
Bob Harper, Carnegie Mellon University
17. Parallel Functional Programming
§ No explicit ordering of expressions
§ Purity means no side-effects
  § Impossible for parallel processes to interfere with each other
  § Can debug sequentially but run in parallel
§ Enormous saving in effort
  § Programmer concentrates on solving the problem
  § Not porting a sequential algorithm into an (ill-defined) parallel domain
§ No locks, deadlocks or race conditions!!
§ Huge productivity gains!
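As an illustration of "debug sequentially but run in parallel" (a hedged sketch assuming GHC and the parallel package; this code is not from the slides), the Eval monad gives deterministic parallelism: the same program computes the same answer with or without +RTS -N.

    import Control.Parallel.Strategies (rpar, rseq, runEval)

    -- A naive Fibonacci, used only as a source of work.
    fib :: Int -> Integer
    fib n | n < 2     = toInteger n
          | otherwise = fib (n - 1) + fib (n - 2)

    -- Evaluate two calls in parallel. Compiled with -threaded and run
    -- with +RTS -N this uses two cores; without, it still computes
    -- exactly the same pair, so it can be debugged sequentially.
    fibPair :: Int -> (Integer, Integer)
    fibPair n = runEval $ do
      a <- rpar (fib n)        -- sparked: may run on another core
      b <- rseq (fib (n + 1))  -- evaluated in this thread
      _ <- rseq a              -- wait for the spark before returning
      return (a, b)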
18. ParaPhrase Project: Parallel Patterns for Heterogeneous Multicore Systems
(ICT-288570), 2011-2014, €4.2M budget
13 partners, 8 European countries: UK, Italy, Germany, Austria, Ireland, Hungary, Poland, Israel
Coordinated by Kevin Hammond, St Andrews
19. The ParaPhrase Approach
§ Start bottom-up
  § identify (strongly hygienic) COMPONENTS
  § using semi-automated refactoring, for both legacy and new programs
§ Think about the PATTERN of parallelism
  § e.g. map(reduce), task farm, parallel search, parallel completion, ...
§ STRUCTURE the components into a parallel program
  § turn the patterns into concrete (skeleton) code
§ Take performance, energy etc. into account (multi-objective optimisation)
  § also using refactoring
§ RESTRUCTURE if necessary! (also using refactoring)
20. Some Common Patterns
§ High-level abstract patterns of common parallel algorithms
§ Google map-reduce combines two of these!
§ Generally, we need to nest/combine patterns in arbitrary ways
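As a sketch of that nesting (again assuming GHC's parallel package; mapReduce is a name coined here, not a library function), Google-style map-reduce is exactly a parallel map composed with a reduction:

    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- Map-reduce as the composition of two patterns: a parallel map
    -- (a task farm over the list) followed by a sequential reduction.
    mapReduce :: NFData b => (a -> b) -> (b -> b -> b) -> b -> [a] -> b
    mapReduce f combine z = foldr combine z . parMap rdeepseq f

    -- e.g. a parallel word count over documents:
    --   mapReduce (length . words) (+) 0 docs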
21. The Skel Library for Erlang
§ Skeletons implement specific parallel patterns
  § Pluggable templates
§ Skel is a new (AND ONLY!) skeleton library in Erlang
  § map, farm, reduce, pipeline, feedback
  § instantiated using skel:run
  § Fully nestable
§ A DSL for parallelism

    OutputItems = skel:run(Skeleton, InputItems).

chrisb.host.cs.st-andrews.ac.uk/skel.html
https://github.com/ParaPhrase/skel
22. The Parallel Pipeline Skeleton
§ Each stage of the pipeline can be executed in parallel
§ The input and output are streams

{pipe, [Skel1, Skel2, ..., SkelN]}

Tn · · · T1 → Skel1 → Skel2 → · · · → SkelN → Tn · · · T1

skel:run([{pipe, [Skel1, Skel2, .., SkelN]}], Inputs).

Inc    = {seq, fun(X) -> X + 1 end},
Double = {seq, fun(X) -> X * 2 end},
skel:run({pipe, [Inc, Double]}, [1,2,3,4,5,6]).
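For the Inc/Double pipeline above, each input is incremented and then doubled; assuming skel:run here returns the output stream as a list (as in the OutputItems = skel:run(...) usage on the previous slide), [1,2,3,4,5,6] would yield [4,6,8,10,12,14].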
23. The Farm Skeleton
§ Each worker is executed in parallel
§ A bit like a 1-stage pipeline

{farm, Skel, M}

Tn · · · T1 → [ Skel1 | Skel2 | · · · | SkelM ] → Tn · · · T1   (M parallel workers)

skel:do([{farm, Skel, M}], Inputs).

Inc = {seq, fun(X) -> X + 1 end},
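As a minimal hedged sketch (my example, following the {farm, Skel, M} shape above), farming the Inc worker over a list of inputs might look like this; note that a farm need not preserve the order of its outputs:

%% Farm the Inc worker across 4 parallel copies.
%% Assumes skel:do/2 blocks and returns the collected outputs.
Inc = {seq, fun(X) -> X + 1 end},
Out = skel:do([{farm, [Inc], 4}], lists:seq(1, 100)).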
24. Using The Right Pattern Matters
[Figure: Speedups for Matrix Multiplication on up to 24 cores, comparing Naive Parallel, Farm, and Farm with Chunk 16.]
26. Refactoring
§ Refactoring changes the structure of the source code
§ using well-defined rules
§ semi-automatically under programmer guidance
27. Refactoring: Farm Introduction

Figure 3.3: Some Standard Skeleton Equivalences

S1 ∘ S2 ≡ Pipe(S1, S2)    (pipe/seq)
Map(S1 ∘ S2, d, r) ≡ Map(S1, d, r) ∘ Map(S2, d, r)    (map fission/fusion)
S ≡ Farm(S)    (farm intro/elim)
Map(F, d, r) ≡ Pipe(Decomp(d), Farm(F), Recomp(r))    (data2stream)
S1′ ≡ Map(S1, d, r)    (map intro/elim)

The following describes each of the patterns in turn:
• a MAP is made up of three OPERATIONs: a worker, a partitioner, and a combiner, followed by an INPUT;
• a SEQ is made up of a single OPERATION denoting the sequential computation to be performed, followed by an INPUT;
• a FARM is made up of a single OPERATION denoting the worker, an INPUT ...
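As a hedged illustration (mine, not the slide's), the farm intro/elim rule corresponds to a purely structural rewrite in Skel terms, where expensive/1 stands for any per-item worker:

%% Farm introduction: S and Farm(S) compute the same results,
%% but the farmed form exposes 8-way parallelism.
S      = {seq, fun expensive/1},
SeqOut = skel:do([S], Inputs),
ParOut = skel:do([{farm, [S], 8}], Inputs).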
33. Large-Scale Demonstrator Applications
§ ParaPhrase tools are being used by commercial/end-user partners
§ SCCH (SME, Austria)
§ Erlang Solutions Ltd (SME, UK)
§ Mellanox (Israel)
§ ELTE-Soft (SME, Hungary)
§ AGH (University, Poland)
§ HLRS (High Performance Computing Centre, Germany)
34. Speedup Results (demonstrators)
[Figure: Speedups for Ant Colony, BasicN2 and Graphical Lasso on 1-24 workers, comparing the refactored versions against manual implementations.]
Speedup close to or better than manual optimisation.
35. Bowtie2: the most widely used DNA alignment tool
[Figure: speedup of the ParaPhrase version (Bt2FF-pin+int) vs. the original Bowtie2 (Bt2), plotted against read length (20-110) and against quality (28-40).]
C. Misale. Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity. IEEE PDP 2014. To appear.
36. Comparison of Development Times
Use case          Man. Time   Refac. Time   LOC Intro.
Convolution       3 days      3 hours       58
Ant Colony        1 day       1 hour        32
BasicN2           5 days      5 hours       40
Graphical Lasso   15 hours    2 hours       53

Figure 3. Approximate manual implementation time of use-cases vs. refactoring time, with lines of code introduced by the refactoring tool.
38. Example: Enumerate Skeleton Configurations for Image Convolution
(r : read image file; p : process image file)

r p
r || p
Δ(r) p
r Δ(p)
Δ(r) Δ(p)
r || Δ(p)
Δ(r p)
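Reading Δ(x) as a farmed version of stage x (my assumption; the slide leaves Δ informal), two of these configurations might be written in Skel as follows, with read_image/1 and process_image/1 as placeholder stages:

%% r p       : two-stage pipeline, both stages sequential.
%% Δ(r) Δ(p) : the same pipeline with each stage farmed.
R = {seq, fun read_image/1},
P = {seq, fun process_image/1},
skel:run([{pipe, [R, P]}], Files),
skel:run([{pipe, [{farm, [R], 4}, {farm, [P], 4}]}], Files).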
39. Results on Benchmark: Image Convolution
MCTS Mapping (C, G): (6, 0) || (0, 3)
Speedup: 39.12
Best Speedup: 40.91
40. Conclusions
§ The manycore revolution is upon us
§ Computer hardware is changing very rapidly (more than in the last 50 years)
§ The megacore era is here (aka exascale, BIG data)
§ Heterogeneity and energy are both important
§ Most programming models are too low-level
§ concurrency based
§ need to expose mass parallelism
§ Patterns and functional programming help with abstraction
§ millions of threads, easily controlled
41. Conclusions (2)
§ Functional programming makes it easy to introduce parallelism
§ No side effects means any computation could be parallel
§ Matches pattern-based parallelism
§ Much detail can be abstracted
§ Lots of problems can be avoided
§ e.g. freedom from deadlock
§ Parallel programs give the same results as sequential ones!
§ Automation is very important
§ Refactoring dramatically reduces development time (while keeping the programmer in the loop)
§ Machine learning is very promising for determining complex performance settings
42. But isn't this all just wishful thinking?
Rampant-Lambda-Men in St Andrews
43. NO!
§ C++11 has lambda functions (and some other nice functional-inspired features)
§ Java 8 will have lambdas (closures)
§ Apple uses closures in Grand Central Dispatch
44. ParaPhrase Parallel C++ Refactoring
§ Integrated into Eclipse
§ Supports the full C++(11) standard
§ Uses strongly hygienic components
§ functional encapsulation (closures)
48. Funded by
• ParaPhrase (EU FP7), Patterns for heterogeneous multicore, €4.2M, 2011-2014
• SCIEnce (EU FP6), Grid/Cloud/Multicore coordination, €3.2M, 2005-2012
• Advance (EU FP7), Multicore streaming, €2.7M, 2010-2013
• HPC-GAP (EPSRC), Legacy system on thousands of cores, £1.6M, 2010-2014
• Islay (EPSRC), Real-time FPGA streaming implementation, £1.4M, 2008-2011
• TACLe: European Cost Action on Timing Analysis, €300K, 2012-2015
49. Some of our Industrial Connections
Mellanox Inc.
Erlang Solutions Ltd
SAP GmbH, Karlsruhe
BAe Systems
Selex Galileo
BioID GmbH, Stuttgart
Philips Healthcare
Software Competence Centre, Hagenberg
Microsoft Research
Well-Typed LLP
50. ParaPhrase Needs You!
• Please join our mailing list and help grow our user community
§ news items
§ access to free development software
§ chat to the developers
§ free developer workshops
§ bug tracking and fixing
§ tools for both Erlang and C++
• Subscribe at https://mailman.cs.st-andrews.ac.uk/mailman/listinfo/paraphrase-news
• We're also looking for open source developers...
• We also have 8 PhD studentships...
51. Further Reading

Chris Brown, Vladimir Janjic, Kevin Hammond, Mehdi Goli and John McCall. "Bridging the Divide: Intelligent Mapping for the Heterogeneous Parallel Programmer". Submitted to IPDPS 2014.

Chris Brown, Marco Danelutto, Kevin Hammond, Peter Kilpatrick and Sam Elliot. "Cost-Directed Refactoring for Parallel Erlang Programs". To appear in International Journal of Parallel Programming, 2013.

Vladimir Janjic, Chris Brown, Max Neunhoffer, Kevin Hammond, Steve Linton and Hans-Wolfgang Loidl. "Space Exploration using Parallel Orbits". Proc. PARCO 2013: International Conf. on Parallel Computing, Munich, Sept. 2013.

Chris Brown, Hans-Wolfgang Loidl and Kevin Hammond. "ParaForming: Forming Parallel Haskell Programs using Novel Refactoring Techniques". Proc. 2011 Trends in Functional Programming (TFP), Madrid, Spain, May 2011.

Henrique Ferreiro, David Castro, Vladimir Janjic and Kevin Hammond. "Repeating History: Execution Replay for Parallel Haskell Programs". Proc. 2012 Trends in Functional Programming (TFP), St Andrews, UK, June 2012.

Ask me for copies! Many technical results are also on the project web site, free for download.