In this lecture, which includes live code-modification components, we showcase distributed deep learning on an Intel® Xeon Phi™ processor cluster with Intel® Omni-Path Architecture. It targets developers of all skill levels and gives a brief but hands-on introduction to machine learning frameworks enhanced with the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).
Start with a brief introduction to machine learning frameworks optimized with the new Intel® MKL-DNN. Develop a simple deep learning image recognition application using the framework. Observe how the computational performance of this application scales as compute nodes are added.
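To make the "simple image recognition application" step concrete, here is a minimal, framework-agnostic sketch: a softmax classifier trained on synthetic "images" with plain NumPy. The data, sizes, and learning rate are invented for illustration; the hands-on session would use an actual MKL-DNN-optimized framework on real images.

```python
import numpy as np

# Tiny softmax classifier on synthetic 8x8 "images" (64 pixels, 3 classes).
# All sizes and hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, k = 300, 64, 3                     # samples, pixels, classes
X = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)
X[np.arange(n), y] += 3.0                # inject a separable class signal

W = np.zeros((d, k))
for _ in range(200):                     # plain batch gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), y] -= 1.0            # dL/dlogits for cross-entropy
    W -= 0.1 * (X.T @ p) / n

acc = (np.argmax(X @ W, axis=1) == y).mean()
```

In the distributed setting the lecture describes, each node would compute gradients on its shard of the data and the gradients would be averaged across nodes before each update.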
Hands-On Workshop on Performance Optimization for Intel Xeon Phi Processor Family x200 (formerly Knights Landing) from Colfax International. More information at http://colfaxresearch.com/knl-webinar/
GET READY FOR INTEL'S KNIGHTS LANDING
As the leading provider of code modernization and optimization training, Colfax now offers a 1-hour webinar: “Introduction to Next-Generation Intel® Xeon Phi™ Processor: Developer’s Guide to Knights Landing”.
ANOTHER LEAP IN PARALLEL PERFORMANCE
Next-generation Intel Xeon Phi processors codenamed Knights Landing (KNL) are expected to provide up to 3X higher performance than the current generation. With on-board high-bandwidth memory and optional integrated high-speed fabric, plus the availability of a socket form factor, these powerful components will transform the fundamental building block of technical computing.
The transformation of the scalable manycore coprocessor into a standalone processor will be a remarkable milestone in parallel computing, and we are offering help to developers worldwide on getting the best out of the new processor. The webinar will help get you up to speed with:
- Knights Landing architecture
- New KNL features
- Code transition and modernization strategy
MULTIPLE RUNS AND FLEXIBLE TIMINGS FOR ALL GEOS
The webinar will air multiple times in different time slots, making it convenient for developers worldwide to attend.
REGISTER TODAY
at http://colfaxresearch.com/knl-webinar/
A Library for Emerging High-Performance Computing Clusters – Intel® Software
Deployed next-generation architectures and systems are characterized by high concurrency, low memory per core, and multiple levels of hierarchy and heterogeneity. These characteristics pose new challenges in energy efficiency, fault tolerance, and scalability. Next-generation programming models and their associated middleware and runtimes have a responsibility to tackle these challenges.
This talk focuses on challenges and opportunities in designing efficient runtimes using the MPI+X formula to accelerate applications for emerging high-performance computing (HPC) systems with millions of processors and featuring next-generation interconnects. Energy-aware designs and codesign schemes for such environments are also emphasized. View features and sample performance numbers from the MVAPICH2 libraries.
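The MPI+X formula combines inter-node message passing (MPI) with on-node parallelism (the X: OpenMP, CUDA, etc.). Below is a conceptual Python analogue under stated assumptions: worker processes stand in for MPI ranks, NumPy vectorization stands in for node-level parallelism, and the final sum stands in for an MPI_Reduce. A real code would use an MPI library such as MVAPICH2.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def rank_partial_sum(chunk):
    # Node-local work: vectorized, standing in for the "X" in MPI+X.
    return float(np.sum(np.square(chunk)))

def distributed_sum_of_squares(data, ranks=4):
    # Split the problem across "ranks" (processes standing in for MPI ranks).
    chunks = np.array_split(np.asarray(data, dtype=float), ranks)
    with ProcessPoolExecutor(max_workers=ranks) as ex:
        partials = list(ex.map(rank_partial_sum, chunks))
    return sum(partials)        # the reduction step (cf. MPI_Reduce)

if __name__ == "__main__":
    total = distributed_sum_of_squares(range(1000))
```

The design point this illustrates: only the partial sums cross the "interconnect", so communication volume stays small relative to the node-local compute.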
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S), it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures, including IBM OpenPOWER systems with NVIDIA GPUs. E4S features a broad collection of HPC software packages, including the TAU Performance System® for performance evaluation of HPC and AI/ML codes. TAU is a versatile profiling and tracing toolkit that supports performance engineering of codes written for CPUs and GPUs, and has support for most IBM platforms.
This talk will give an overview of TAU and E4S and how developers can use these tools to analyze the performance of their codes. TAU supports transparent instrumentation of codes without modifying the application binary. The talk will describe TAU's support for CUDA, OpenACC, pthread, OpenMP, Kokkos, and MPI applications, as well as its use with Python-based frameworks such as TensorFlow and PyTorch. It will cover the use of TAU in E4S containers using the Docker and Singularity runtimes under ppc64le. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users.
Introduction to the OpenPOWER Foundation - Open Source Days event – Mandie Quartly
An introduction to the OpenPOWER Foundation for the Open Source Days event in Copenhagen in February 2016: why it was created, and how the open ecosystem based on the POWER architecture is driving innovation.
A more detailed overview is available at ibm.biz/openpower_overview
Join the rebellion :)
Containerizing HPC and AI applications using E4S and Performance Monitor tool – Ganesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas, including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. The talk will describe the community engagements and interactions that led to the many artifacts produced by E4S, and how E4S containers are being deployed on HPC systems at DOE national laboratories using the Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,... – Chris Fregly
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
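The canary-deployment idea above can be sketched in a few lines: send a small fraction of live traffic to the new model version, and promote it only if its health metrics hold up. The names ("stable", "canary"), the 5% split, and the promotion rule are illustrative assumptions; in the talk's stack this routing is handled by Istio, not application code.

```python
import random

def route_request(canary_fraction=0.05, rng=random.random):
    # Weighted traffic split: a small slice of requests hits the canary model.
    return "canary" if rng() < canary_fraction else "stable"

def promote_if_healthy(canary_error_rate, stable_error_rate, tolerance=1.1):
    # Hypothetical promotion rule: canary error rate must stay within 10%
    # of the stable version's before it takes all traffic.
    return canary_error_rate <= stable_error_rate * tolerance
```

The safety property is that a bad model version only ever sees the canary slice of traffic, so a rollback affects a small, bounded set of requests.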
This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra... – Chris Fregly
http://pipeline.ai
Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, the JIT/AOT compiler, and the Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow models - and the TensorFlow runtime - in a GPU-based production environment. This talk is 100% demo-based, built on open-source tools, and completely reproducible through Docker on your own GPU cluster.
Bio
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High-Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
This is the material of a Burst Buffer training presented for the early users program. It covers an introduction to parallel I/O, Lustre, the Darshan tool, Burst Buffer, and optimization parameters for MPI I/O.
Full video: https://youtu.be/8zLcZmiTweg
ParaForming - Patterns and Refactoring for Parallel Programming – khstandrews
Despite Moore's "law", uniprocessor clock speeds have now stalled. Rather than single processors running at ever higher clock speeds, it is common to find dual-, quad- or even hexa-core processors, even in consumer laptops and desktops. Future hardware will not be slightly parallel, however, as in today's multicore systems, but will be massively parallel, with manycore and perhaps even megacore systems becoming mainstream.
This means that programmers need to start thinking parallel. To achieve this they must move away from traditional programming models where parallelism is a bolted-on afterthought. Rather, programmers must use languages where parallelism is deeply embedded into the programming model from the outset.
By providing a high-level model of computation, without explicit ordering of computations, declarative languages in general, and functional languages in particular, offer many advantages for parallel programming. One of the most fundamental advantages of the functional paradigm is purity. In a purely functional language, as exemplified by Haskell, there are simply no side effects: it is therefore impossible for parallel computations to conflict with each other in ways that are not well understood.
ParaForming aims to radically improve the process of parallelising purely functional programs through a comprehensive set of high-level parallel refactoring patterns for Parallel Haskell, supported by advanced refactoring tools. By matching parallel design patterns with appropriate algorithmic skeletons using advanced software refactoring techniques and novel cost information, we will bridge the gap between fully automatic and fully explicit approaches to parallelisation, helping programmers "think parallel" in a systematic, guided way. This talk introduces the ParaForming approach, gives some examples and shows how effective parallel programs can be developed using advanced refactoring technology.
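The purity argument can be shown in miniature outside Haskell: a side-effect-free function can be mapped over inputs in any order, or in parallel, without races, so parallelism cannot change the result. This Python sketch only illustrates the principle; ParaForming itself targets Parallel Haskell and its refactoring tooling.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Pure function: the result depends only on the argument, with no
    # shared mutable state, so parallel evaluation is safe by construction.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

inputs = list(range(10))
with ThreadPoolExecutor(max_workers=4) as ex:
    parallel = list(ex.map(fib, inputs))
sequential = [fib(n) for n in inputs]
# Purity guarantees the parallel and sequential results are identical.
assert parallel == sequential
```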
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil – Databricks
The global fleet's equipment maintenance log is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data-driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.
To this end, we leverage Databricks to perform machine learning at scale, including ingesting (structured and unstructured data) from legacy systems, and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
Optimizing Servers for High-Throughput and Low-Latency at Dropbox – ScyllaDB
I'm going to discuss the efficiency/performance optimizations of different layers of the system. Starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we’ll move to Linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally, we’ll discuss library and application-level tunings, which are mostly applicable to HTTP servers in general and nginx/envoy specifically.
For each potential area of optimization I’ll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads.
Also, I'll cover more theoretical approaches to performance analysis and the newly developed tooling like `bpftrace` and new `perf` features.
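As a small, concrete instance of the application-level tunings discussed above: disabling Nagle's algorithm (`TCP_NODELAY`) trades a little batching efficiency for lower latency on chatty request/response connections. This is one illustrative knob among many; kernel-level settings (`sysctl net.ipv4.*`) and nginx/envoy options are tuned outside the program.

```python
import socket

def make_low_latency_socket():
    # TCP_NODELAY disables Nagle's algorithm, so small writes are sent
    # immediately instead of being coalesced - a latency/throughput tradeoff.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

s = make_low_latency_socket()
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
s.close()
```

Whether this helps depends on the workload: it benefits latency-sensitive RPC traffic but can increase packet counts for bulk transfers, which is exactly the kind of tradeoff the talk frames per workload.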
This presentation was given to the Dublin Node (JS) Community on May 29th 2014.
Presented by: Chris Lawless, Kevin Yu Wei Xia, Fergal Carroll @phergalkarl, Ciarán Ó hUallacháin, and Aman Kohli @akohli
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures – Dr. Fabio Baruffa
In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm and focus on threading parallelism optimization, change of the data layout into Structure of Arrays (SoA), compiler auto-vectorization and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.
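The Structure-of-Arrays (SoA) change mentioned in the abstract can be illustrated with a toy SPH-like computation (particle distances) in NumPy. The structured dtype plays the role of the "array of structures"; separate contiguous per-field arrays are the SoA layout that compilers and vector units handle far more effectively. The kernel and sizes below are invented for illustration, not taken from Gadget-3.

```python
import numpy as np

n = 1000
# "Array of structures": one record per particle, fields interleaved in memory.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8"), ("m", "f8")])
rng = np.random.default_rng(1)
for f in ("x", "y", "z", "m"):
    aos[f] = rng.random(n)

# "Structure of arrays": one contiguous, unit-stride array per field.
x, y, z = aos["x"].copy(), aos["y"].copy(), aos["z"].copy()

# Distance of every particle from particle 0, on both layouts.
r_soa = np.sqrt((x - x[0])**2 + (y - y[0])**2 + (z - z[0])**2)
r_aos = np.sqrt((aos["x"] - aos["x"][0])**2
                + (aos["y"] - aos["y"][0])**2
                + (aos["z"] - aos["z"][0])**2)
assert np.allclose(r_soa, r_aos)   # same result, vectorization-friendly layout
```

The physics is unchanged; only the memory layout differs, which is why the transformation can be applied mechanically to kernels like the one the talk isolates.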
Modern Web Security, Lazy but Mindful Like a Fox – C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2hYU0cd.
Albert Yu presents a few viable, usable and effective defensive techniques that developers have often overlooked. Filmed at qconsf.com.
Albert Yu is currently working as a principal engineer for the Trust Engineering team at Atlassian. He has spent 15 years across many different aspects of a security program, including security engineering, R&D, product reviews, code review, penetration testing, governance and compliance, risk management, and incident response, in large-scale environments.
This talk gives an overview of the challenges faced by the AI community in achieving high-performance, scalable, distributed DNN training on modern HPC systems with both scale-up and scale-out strategies. After that, the talk will focus on a range of solutions being carried out in my group to address these challenges: 1) MPI-driven deep learning, 2) co-designing deep learning stacks with high-performance MPI, 3) out-of-core DNN training, and 4) hybrid (data and model) parallelism. Case studies accelerating DNN training with popular frameworks such as TensorFlow, PyTorch, MXNet, and Caffe on modern HPC systems will be presented.
CIS 512 discussion post responses.CPUs and Programming Pleas.docx – mccormicknadine86
CIS 512 discussion post responses.
"CPUs and Programming" Please respond to the following:
· From the first e-Activity, identify the following CPUs: 1) the CPU that resides on a computer that you own or a computer that you would consider purchasing, and 2) the CPU of one (1) other computer. Compare the instruction sets and clock rates of each CPU. Determine which CPU of the two is faster and why. Conclude whether or not the clock rate by itself makes the CPU faster. Provide a rationale for your response.
· From the second e-Activity, examine two (2) benefits of using planning techniques—such as writing program flowcharts, pseudocode, or other available programming planning technique—to devise and design computer programs. Evaluate the effectiveness of your preferred program planning technique, based on its success in the real world. Provide one (1) example of a real-life application of your preferred program planning technique to support your response.
MB’s post states the following:
"CPUs and Programming" Please respond to the following:
· From the first e-Activity, identify the following CPUs: 1) the CPU that resides on a computer that you own or a computer that you would consider purchasing, and 2) the CPU of one (1) other computer. Compare the instruction sets and clock rates of each CPU. Determine which CPU of the two is faster and why. Conclude whether or not the clock rate by itself makes the CPU faster. Provide a rationale for your response.
The current CPU that resides in my computer is an Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz, 2MB cache, 4 cores, and 8 threads
Comparing my CPU with an Intel® Core™ i9-10980XE Extreme Edition Processor (24.75 MB cache, 18 cores, 36 threads, up to 4.60 GHz max turbo frequency, unlocked for extreme performance and mega-tasking), I found that the Intel i9's clock speed, at 4.60 GHz, is almost twice that of the CPU in my computer, with a cache of up to 24.75 MB. This means the Intel Core i9 stores up to 24.75 MB of readily accessible data directly on the CPU, which is a huge amount of on-chip storage. Its 18 cores also allow multiple streams of data to be processed simultaneously in parallel through the fetch-decode-execute cycle, which increases the speed of processing data.
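The comparison above can be made concrete with a back-of-envelope estimate of peak throughput as cores × clock × instructions-per-cycle, which also shows why clock rate alone does not determine speed. The IPC value below is an invented illustrative figure, and the core counts are taken from the post as written.

```python
def est_throughput(cores, ghz, ipc):
    # Crude peak estimate in billions of instructions per second.
    # Real performance also depends on memory, cache, and workload.
    return cores * ghz * ipc

# Figures as stated in the post; ipc=2.0 is a hypothetical placeholder.
i5_2540m   = est_throughput(cores=4,  ghz=2.6, ipc=2.0)
i9_10980xe = est_throughput(cores=18, ghz=4.6, ipc=2.0)
```

Note that a higher-clocked chip with fewer cores can still lose this estimate to a lower-clocked chip with many cores, which is the post's point that clock rate by itself does not make a CPU faster.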
· From the second e-Activity, examine two (2) benefits of using planning techniques—such as writing program flowcharts, pseudocode, or other available programming planning technique—to devise and design computer programs. Evaluate the effectiveness of your preferred program planning technique, based on its success in the real world. Provide one (1) example of a real-life application of your preferred program planning technique to support your response.
Two benefits of using planning techniques to design a computer program are: first, to establish a framework architecture that provides a road map for implementation. Another benefit of planning is to provide c ...
We build AI and HPC solutions. Expertise: highly optimized AI Engines and HPC Apps.
• HPC: accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.
Leverage byteLAKE's expertise in adapting and optimizing complex algorithms for NVIDIA GPUs, Xilinx Alveo FPGAs, and Intel, AMD, and ARM solutions. From single nodes to clusters.
More: www.byteLAKE.com/en/Alveo
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, machine learning over just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
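To make the link-prediction example concrete, here is a minimal TransE-style scorer over a toy knowledge graph: a triple (h, r, t) is plausible when the embedding of h translated by r lands near t. The entities, relation, embedding size, and the true triple being made to fit by construction are all illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(7)
entities = {"Copenhagen": 0, "Denmark": 1, "Paris": 2}   # hypothetical KG
relations = {"capital_of": 0}
E = rng.normal(size=(3, 8))               # entity embeddings (dim 8)
R = rng.normal(size=(1, 8))               # relation embeddings

# Make the true triple (Copenhagen, capital_of, Denmark) fit by construction;
# a real system would learn E and R from the graph instead.
E[1] = E[0] + R[0]

def score(h, r, t):
    # TransE distance: lower means the triple is predicted as more plausible.
    return float(np.linalg.norm(E[entities[h]] + R[relations[r]] - E[entities[t]]))

true_s = score("Copenhagen", "capital_of", "Denmark")
false_s = score("Copenhagen", "capital_of", "Paris")
```

The "predictable inference" point is visible even here: the scores are only meaningful because `capital_of` has a fixed, consistent interpretation as a translation in embedding space.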
Containerizing HPC and AI applications using E4S and Performance Monitor toolGanesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. It will describe the community engagements and interactions that led to the many artifacts produced by E4S. It will introduce the E4S containers are being deployed at the HPC systems at DOE national laboratories using Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
Building Google's ML Engine from Scratch on AWS with GPUs, Kubernetes, Istio,...Chris Fregly
Applying my Netflix experience to a real-world problem in the ML and AI world, I will demonstrate a full-featured, open-source, end-to-end TensorFlow Model Training and Deployment System using the latest advancements from Kubernetes, Istio, and TensorFlow.
In addition to training and hyper-parameter tuning, our model deployment pipeline will include continuous canary deployments of our TensorFlow Models into a live, hybrid-cloud production environment.
This is the holy grail of data science - rapid and safe experiments of ML / AI models directly in production.
Following the Successful Netflix Culture that I lived and breathed (https://www.slideshare.net/reed2001/culture-1798664/2-Netflix_CultureFreedom_Responsibility2), I give Data Scientists the Freedom and Responsibility to extend their ML / AI pipelines and experiments safely into production.
Offline, batch training and validation is for the slow and weak. Online, real-time training and validation on live production data is for the fast and strong.
Learn to be fast and strong by attending this talk.
Bio:
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
Optimizing, Profiling, and Deploying TensorFlow AI Models with GPUs - San Fra...Chris Fregly
http://pipeline.ai
Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in GPU-based production environment. This talk is 100% demo based on open source tools and completely reproducible through Docker on your own GPU cluster.
Bio
Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly Training and Video Series titled, "High-Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
http://pipeline.ai
This is the material of a Burst Buffer training that was presented for the early users program. It covers introduction about parallel I/O, Lustre, Darshan tool, Burst Buffer, and optimization parameters for MPI I/O.
Full video: https://youtu.be/8zLcZmiTweg
ParaForming - Patterns and Refactoring for Parallel Programmingkhstandrews
Despite Moore's "law", uniprocessor clock speeds have now stalled. Rather than single processors running at ever higher clock speeds, it is
common to find dual-, quad- or even hexa-core processors, even in consumer laptops and desktops.
Future hardware will not be slightly parallel, however, as in today's multicore systems, but will be
massively parallel, with manycore and perhaps even megacore systems
becoming mainstream.
This means that programmers need to start thinking parallel. To achieve this they must move away
from traditional programming models where parallelism is a
bolted-on afterthought. Rather, programmers must use languages where parallelism is deeply embedded into the programming model
from the outset.
By providing a high level model of computation, without explicit ordering of computations,
declarative languages in general, and functional languages in particular, offer many advantages for parallel
programming.
One of the most fundamental advantages of the functional paradigm is purity.
In a purely functional language, as exemplified by Haskell, there are simply no side effects: it is therefore impossible for parallel computations to conflict with each
other in ways that are not well understood.
ParaForming aims to radically improve the process
of parallelising purely functional programs through a comprehensive set of high-level parallel refactoring patterns for Parallel Haskell,
supported by advanced refactoring tools.
By matching parallel design patterns with appropriate algorithmic skeletons
using advanced software refactoring techniques and novel cost information, we will bridge the gap between fully automatic
and fully explicit approaches to parallelisation, helping programmers "think parallel" in a systematic,
guided way. This talk introduces the ParaForming approach, gives some examples and shows how
effective parallel programs can be developed using advanced refactoring technology.
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
Equipment maintenance log of the global fleet is traditionally maintained using legacy infrastructure and data models, which limit the ability to extract insights at scale. However, to impact the bottom line, it is critical to ingest and enrich global fleet data to generate data driven guidance for operations. The impact of such insights is projected to be millions of dollars per annum.
To this end, we leverage Databricks to perform machine learning at scale, including ingesting (structured and unstructured data) from legacy systems, and then sifting through millions of nonlinearly growing records to extract insights using NLP. The insights enable outlier identification, capacity planning, prioritization of cost reduction opportunities, and the discovery process for cross-functional teams.
Optimizing Servers for High-Throughput and Low-Latency at Dropbox (ScyllaDB)
I'm going to discuss the efficiency/performance optimizations of different layers of the system. Starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we’ll move to Linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally, we’ll discuss library and application-level tunings, which are mostly applicable to HTTP servers in general and nginx/envoy specifically.
For each potential area of optimization I’ll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads.
Also, I'll cover more theoretical approaches to performance analysis and the newly developed tooling like `bpftrace` and new `perf` features.
This presentation was given to the Dublin Node (JS) Community on May 29th 2014.
Presented by: Chris Lawless, Kevin Yu Wei Xia, Fergal Carroll @phergalkarl, Ciarán Ó hUallacháin, and Aman Kohli @akohli
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures (Dr. Fabio Baruffa)
In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm and focus on threading parallelism optimization, change of the data layout into Structure of Arrays (SoA), compiler auto-vectorization and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.
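The AoS-to-SoA data layout change mentioned above can be sketched in Python for illustration (Gadget-3 itself is C code; the field names here are made up): in a Structure of Arrays, each field is contiguous in memory, which is what lets the compiler auto-vectorize a loop over that field.

```python
# Array of Structures (AoS): the fields of one particle sit together,
# but one field across particles is strided in memory -- bad for SIMD.
particles_aos = [
    {"x": 0.0, "y": 0.0, "mass": 1.0},
    {"x": 1.0, "y": 2.0, "mass": 0.5},
]

# Structure of Arrays (SoA): each field becomes one contiguous array,
# so a loop over "mass" streams through memory with unit stride.
def aos_to_soa(aos):
    return {field: [p[field] for p in aos] for field in aos[0]}

particles_soa = aos_to_soa(particles_aos)

def total_mass(soa):
    # Touches only the contiguous "mass" stream, never x or y.
    return sum(soa["mass"])
```

In C the same transformation turns `particle_t particles[N]` into separate `x[N]`, `y[N]`, `mass[N]` arrays, which is what enabled the vectorization speedups the abstract reports.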
Modern Web Security, Lazy but Mindful Like a Fox (C4Media)
Video and slides synchronized; mp3 and slide download available at http://bit.ly/2hYU0cd.
Albert Yu presents a few viable, usable and effective defensive techniques that developers have often overlooked. Filmed at qconsf.com.
Albert Yu is currently working as a principal engineer for the Trust Engineering team at Atlassian. He has spent 15 years working across many different aspects of a security program, including security engineering, R&D, product reviews, code review, penetration testing, governance and compliance, risk management, and incident response, in large-scale environments.
An overview of the challenges faced by the AI community in achieving high-performance, scalable, distributed DNN training on modern HPC systems with both scale-up and scale-out strategies. The talk will then focus on a range of solutions being developed in my group to address these challenges, including: 1) MPI-driven deep learning, 2) co-designing deep learning stacks with high-performance MPI, 3) out-of-core DNN training, and 4) hybrid (data and model) parallelism. Case studies accelerating DNN training with popular frameworks like TensorFlow, PyTorch, MXNet and Caffe on modern HPC systems will be presented.
CIS 512 discussion post responses. CPUs and Programming Pleas.docx (mccormicknadine86)
CIS 512 discussion post responses.
"CPUs and Programming" Please respond to the following:
· From the first e-Activity, identify the following CPUs: 1) the CPU that resides on a computer that you own or a computer that you would consider purchasing, and 2) the CPU of one (1) other computer. Compare the instruction sets and clock rates of each CPU. Determine which CPU of the two is faster and why. Conclude whether or not the clock rate by itself makes the CPU faster. Provide a rationale for your response.
· From the second e-Activity, examine two (2) benefits of using planning techniques—such as writing program flowcharts, pseudocode, or other available programming planning technique—to devise and design computer programs. Evaluate the effectiveness of your preferred program planning technique, based on its success in the real world. Provide one (1) example of a real-life application of your preferred program planning technique to support your response.
MB’s post states the following:
"CPUs and Programming" Please respond to the following:
· From the first e-Activity, identify the following CPUs: 1) the CPU that resides on a computer that you own or a computer that you would consider purchasing, and 2) the CPU of one (1) other computer. Compare the instruction sets and clock rates of each CPU. Determine which CPU of the two is faster and why. Conclude whether or not the clock rate by itself makes the CPU faster. Provide a rationale for your response.
The CPU that currently resides in my computer is an Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz, with 2MB cache, 4 cores, and 8 threads.
Comparing my CPU with an Intel® Core™ i9-10980XE Extreme Edition Processor (24.75MB cache, 18 cores, 36 threads, up to 4.60GHz max turbo frequency; XE stands for extreme performance and mega-tasking, unlocked), I found that the i9's clock speed is almost twice that of the CPU in my computer, at 4.60GHz, with up to 24.75MB of cache. This means the i9 can keep up to 24.75MB of readily accessible data directly on the CPU, which is a huge amount of on-chip storage. Its 18 cores also allow multiple streams of data to be processed in parallel (fetch, decode, and execute), which increases the speed of processing data.
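The comparison above boils down to a simple model: sustained throughput is roughly cores × clock × instructions per cycle, so clock rate by itself does not make a CPU faster. A toy sketch using the figures quoted in the post (the IPC value is an assumed illustration, and the model ignores memory, turbo behavior, and workload parallelism):

```python
def throughput(cores, ghz, ipc):
    """Rough instructions/second: cores x clock x IPC (toy model only,
    ignoring memory bandwidth, turbo, and how parallel the workload is)."""
    return cores * ghz * 1e9 * ipc

# Core counts and clocks as quoted in the post; IPC of 4 is an
# assumed illustration applied to both chips.
i5_2540m   = throughput(cores=4,  ghz=2.6, ipc=4)
i9_10980xe = throughput(cores=18, ghz=4.6, ipc=4)

# In this model the i9 comes out roughly 8x faster, far more than the
# ~1.8x suggested by clock rate alone -- core count dominates here.
ratio = i9_10980xe / i5_2540m
```

The point of the exercise: two CPUs with the same clock can differ enormously in real throughput, so clock rate alone is not a valid basis for the comparison.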
· From the second e-Activity, examine two (2) benefits of using planning techniques—such as writing program flowcharts, pseudocode, or other available programming planning technique—to devise and design computer programs. Evaluate the effectiveness of your preferred program planning technique, based on its success in the real world. Provide one (1) example of a real-life application of your preferred program planning technique to support your response.
Two benefits of using planning techniques to design a computer program are, first, to establish a framework architecture that provides a road map for implementation. Another benefit of planning is to provide c ...
We build AI and HPC solutions. Expertise: highly optimized AI Engines and HPC Apps.
• HPC: accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.
Leverage byteLAKE's expertise in adapting and optimizing complex algorithms for NVIDIA GPUs, Xilinx Alveo FPGAs, and Intel, AMD, and ARM solutions. From single nodes to clusters.
More: www.byteLAKE.com/en/Alveo
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
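A toy sketch of "semantics as predictable inference" (my own illustration, not van Harmelen's formalization): a symbolic structure has semantics when its symbols license inferences we can predict in advance, as in this hand-rolled link-prediction rule over a tiny knowledge graph.

```python
# Tiny knowledge graph as (subject, relation, object) triples.
triples = {
    ("amsterdam", "capital_of", "netherlands"),
    ("netherlands", "part_of", "europe"),
}

def infer(kg):
    """Predictable inference: capital_of composed with part_of
    always licenses a located_in link -- a rule fixed in advance
    by the meaning of the relations, not learned from data."""
    derived = set(kg)
    for (a, r1, b) in kg:
        for (b2, r2, c) in kg:
            if b == b2 and r1 == "capital_of" and r2 == "part_of":
                derived.add((a, "located_in", c))
    return derived
```

The contrast with a purely neural link predictor is that here the new edge follows deterministically from what the relation symbols mean, which is the property the abstract argues NeSy systems need.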
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our beloved cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and lead you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to make it work on our own infrastructure from an enterprise perspective. I will give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. Afterwards we held a lovely workshop with the participants, exploring different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks, as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on countries – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/