This document summarizes the capabilities and performance of the Tesla K40 GPU accelerator. It notes that the K40 provides 1.4 TFlops of computing power with 2880 cores and 288 GB/s of memory bandwidth. Benchmark results are shown for applications in fluid dynamics, rendering, and seismic analysis, demonstrating speedups of 4-6x over CPUs. Additional application speedups of 1.66-10.29x are shown for structural mechanics, physics, molecular dynamics, and material science applications compared to CPUs or earlier GPU models. The document highlights how the K40's GPU Boost feature can provide up to 40% higher performance over the earlier K20X by opportunistically increasing clock speeds.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix's use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance.
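The hugepage tunables mentioned above can be checked programmatically; as a rough illustration (not Netflix's actual tooling - the field names are real /proc/meminfo keys, but the helper function is hypothetical), a monitoring script might parse the hugepage state like this:

```python
def parse_hugepage_info(meminfo_text):
    """Extract hugepage-related fields from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith(("HugePages", "Hugepagesize")):
            fields[key] = rest.strip()
    return fields

# Sample text in the format of /proc/meminfo; on a live host you would
# read open("/proc/meminfo").read() instead.
sample = """MemTotal:       32768000 kB
HugePages_Total:     512
HugePages_Free:      512
Hugepagesize:       2048 kB"""
print(parse_hugepage_info(sample))
```

A check like this can confirm that the hugepage pool an application expects is actually configured on the host.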
10 tips to improve the performance of your AWS application - Amazon Web Services
As users of the AWS platform, it is important that we don't reinvent the wheel and that we eliminate the undifferentiated heavy lifting of IT, freeing up scarce engineering resources to focus on activities that truly add value to the business. In this technical session, an AWS Solutions Architect will take you through a few tips and tricks, potentially ones you didn't know existed, allowing you to more efficiently and securely deploy, utilise, and manage the vast array of Amazon Web Services to support your business requirements.
GPUIterator: Bridging the Gap between Chapel and GPU Platforms - Akihiro Hayashi
The ACM SIGPLAN 6th Annual Chapel Implementers and Users Workshop (CHIUW2019) co-located with PLDI 2019 / ACM FCRC 2019.
PGAS (Partitioned Global Address Space) programming models were originally designed to facilitate productive parallel programming at both the intra-node and inter-node levels in homogeneous parallel machines. However, there is a growing need to support accelerators, especially GPU accelerators, in heterogeneous nodes in a cluster. Among high-level PGAS programming languages, Chapel is well suited for this task due to its use of locales and domains to help abstract away low-level details of data and compute mappings for different compute nodes, as well as for different processing units (CPU vs. GPU) within a node. In this paper, we address some of the key limitations of past approaches to mapping Chapel onto GPUs as follows. First, we introduce a Chapel module, GPUIterator, which is a portable programming interface that supports GPU execution of a Chapel forall loop. This module makes it possible for Chapel programmers to easily use hand-tuned native GPU programs/libraries, which is an important requirement in practice since there is still a big performance gap between compiler-generated GPU code and hand-tuned GPU code; hand-optimization of CPU-GPU data transfers is also an important contributor to this performance gap. Second, though Chapel programs are regularly executed on multi-node clusters, past work on GPU enablement of Chapel programs mainly focused on single-node execution. In contrast, our work supports execution across multiple CPU+GPU nodes by accepting Chapel's distributed domains. Third, our approach supports hybrid execution of a Chapel parallel (forall) loop across both GPU and CPU cores, which is beneficial for specific platforms. Our preliminary performance evaluations show that the use of the GPUIterator is a promising approach for Chapel programmers to easily utilize a single or multiple CPU+GPU node(s) while maintaining portability.
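GPUIterator itself is a Chapel module, but the hybrid CPU/GPU work-partitioning idea it enables can be sketched in Python (illustrative only; the function name and percentage knob are ours, not the module's API):

```python
def split_iterations(n, cpu_percent):
    """Split the iteration space [0, n) into a CPU portion and a GPU
    portion, illustrating hybrid execution of a parallel loop."""
    cpu_count = n * cpu_percent // 100
    return range(0, cpu_count), range(cpu_count, n)

cpu_part, gpu_part = split_iterations(1000, 25)
print(len(cpu_part), len(gpu_part))  # 250 750
```

In the real module, the GPU portion of the loop would be handed to a hand-tuned kernel while the CPU portion runs on the node's cores.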
How Netflix Tunes EC2 Instances for Performance - Brendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg.
"At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
Now that you have your apps running on Kubernetes, are you wondering how to get the response time that you need? Tuning applications to get the performance you need can be challenging. When you have to tune a number of microservices in Kubernetes to fix a response-time or throughput issue, it can get really overwhelming. This talk looks at some common performance issues, ways to solve them, and, more importantly, the tools that can help you. We will also look specifically at Kruize, which helps you not only right-size your containers but also optimize the runtimes.
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014 - Amazon Web Services
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
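As a rough illustration of right-sizing requests from observed usage, one common heuristic is to take a high percentile of measured consumption plus some headroom (a simplified sketch, not the exact algorithm of any particular tool):

```python
import math

def recommend_request(usage_samples, percentile=95, headroom=1.1):
    """Recommend a resource request from observed usage samples:
    take a high percentile (nearest-rank method) and add headroom."""
    s = sorted(usage_samples)
    idx = min(len(s) - 1, math.ceil(percentile / 100 * len(s)) - 1)
    return s[idx] * headroom

# observed CPU usage in millicores, e.g. sampled from metrics
cpu_millicores = [120, 180, 150, 300, 220, 160, 140, 210, 170, 190]
print(recommend_request(cpu_millicores))
```

The resulting value would then be written into the pod spec's resources.requests; limits are typically set with more headroom or governed by a separate policy.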
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr... - DataStax
The Anti-Entropy process used by nodetool repair is the way of ensuring consistency of data on disk. Over the many years of the Apache Cassandra project it has also been the biggest pain point for teams running Cassandra. With a solid repair process in place you can be confident that deleted data will not come back to life, and that data is fully distributed when nodes fail.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will explain how Anti-Entropy works and why it should be run on your cluster. He will discuss the different options such as "primary range" repair, sub-range repairs, and incremental repair introduced in version 2.1.
He will also introduce additional tools such as the Spotify Reaper and the range repair script, and future optimisations incremental repair could bring to the read path.
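Sub-range repair, mentioned above, works by dividing a token range into smaller pieces and repairing them one at a time; the splitting step can be sketched as follows (simplified: it ignores ring wrap-around and real Murmur3 token bounds):

```python
def split_token_range(start, end, parts):
    """Split a token range (start, end] into `parts` contiguous
    sub-ranges, as done for sub-range repair."""
    step = (end - start) // parts
    ranges = []
    lo = start
    for i in range(parts):
        # last sub-range absorbs any remainder from integer division
        hi = end if i == parts - 1 else lo + step
        ranges.append((lo, hi))
        lo = hi
    return ranges

print(split_token_range(0, 100, 4))
```

A driver script would then issue one repair per sub-range (e.g. via nodetool's start/end token options), keeping each repair session small and retryable.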
About the Speaker
Alexander Dejanovski, Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments, where he has led the effort to build a Cassandra-based architecture and migrate services to it from traditional RDBMSs. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of Cassandra.
This presentation is from GopherCon India, where we talked about how to design a concurrent, high-performance database client in the Go language. We talked about how we use goroutines and channels to our advantage. We also talked about how to use pools for efficient memory utilization.
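The goroutine/channel/pool pattern the talk describes can be approximated in Python with threads, queues, and a reusable-buffer pool (a sketch of the pattern, not the talk's actual Go code):

```python
import queue
import threading

def run_requests(requests, handler, workers=4, pool_size=4):
    """Run requests through a fixed pool of worker threads, borrowing
    reusable buffers from a pool (mirroring goroutines, channels, and
    an object pool for memory efficiency)."""
    tasks = queue.Queue()
    buffers = queue.Queue()
    for _ in range(pool_size):
        buffers.put(bytearray(1024))  # pre-allocated, reusable buffer
    results, lock = [], threading.Lock()

    def worker():
        while True:
            req = tasks.get()
            if req is None:  # sentinel: shut this worker down
                return
            buf = buffers.get()  # borrow a buffer from the pool
            try:
                res = handler(req, buf)
            finally:
                buffers.put(buf)  # return it for reuse
            with lock:
                results.append(res)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for req in requests:
        tasks.put(req)
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results

out = run_requests(range(10), lambda req, buf: req * 2)
print(sorted(out))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Reusing a fixed set of buffers bounds memory use regardless of how many requests are in flight, which is the same motivation behind pools in the Go client.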
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014 - Amazon Web Services
"How can you reliably schedule tasks in an unreliable, autoscaling cloud environment? This presentation talks about the design of our Fenzo scheduler, built on Apache Mesos, that serves as the core of our stream-processing platform, Mantis, designed for real-time insights. We focus on the following aspects of the scheduler:
- Resource granularity
- Fault tolerance
- Bin packing, task affinity, stream locality
- Autoscaling of the cluster and of individual service jobs
- Constraints (hard and soft) for individual tasks such as zone balancing, unique, and exclusive instances
This talk also includes detailed information on a holistic approach to scheduling in a distributed, autoscaling environment to achieve both speed and advanced scheduling optimizations."
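Of the scheduler concerns listed above, bin packing is the easiest to illustrate; a minimal first-fit sketch (not Fenzo's actual algorithm) packs task CPU demands onto nodes:

```python
def first_fit(tasks, node_capacity):
    """First-fit bin packing: place each task's demand on the first
    node with room, opening a new node when none fits."""
    nodes = []      # remaining capacity per node
    placement = []  # node index assigned to each task
    for demand in tasks:
        for i, free in enumerate(nodes):
            if free >= demand:
                nodes[i] -= demand
                placement.append(i)
                break
        else:
            nodes.append(node_capacity - demand)
            placement.append(len(nodes) - 1)
    return placement, len(nodes)

placement, used = first_fit([4, 3, 2, 5, 1], node_capacity=8)
print(placement, used)  # [0, 0, 1, 1, 0] 2
```

A production scheduler layers affinity, stream locality, and hard/soft constraint evaluation on top of this kind of placement loop when scoring candidate nodes.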
SaltConf14 - Eric Johnson, Google - Orchestrating Google Compute Engine with ... - SaltStack
Google is making the power of its datacenter, network, and technology innovations available to the world through its Cloud services. This presentation will provide an overview of the Google Cloud Platform and a deeper dive on Google Compute Engine. Google recently made an open source contribution to SaltStack, and now you can use Salt Cloud to manage your Compute Engine resources (IaaS virtual machine services). Come find out more about Google's Cloud Platform and how you can leverage Google scale with SaltStack.
Webinar: Does it Still Make Sense to do Big Data with Small Nodes? - Julia Angell
In the world of Big Data, scaling out is the norm. However, many Big Data deployments are trapped in a sea of small box clusters.
With the advent of scalable platforms like Scylla, node performance is no longer an issue and doubling the size of the nodes can double the available storage, memory, and processing power. So what stops people from going big in the Cloud Native world?
Watch this webinar to learn the pros and cons of large nodes, and explore why people resist using big machines, including:
- Is the cost of recovering from failures higher in larger nodes?
- Does performance increase linearly as machines get bigger?
- Does cluster performance suffer for the entire time of recovery from failures?
PGConf APAC 2018 - Monitoring PostgreSQL at Scale - PGConf APAC
Speaker: Lukas Fittl
Your PostgreSQL database is one of the most important pieces of your architecture - yet the level of introspection available in Postgres is often hard to work with. It's easy to get very detailed information, but what should you really watch out for, report on, and alert on?
In this talk we'll discuss how query performance statistics can be made accessible to application developers, critical entries one should monitor in the PostgreSQL log files, how to collect EXPLAIN plans at scale, how to watch over autovacuum and VACUUM operations, and how to flag issues based on schema statistics.
We'll also talk a bit about monitoring multi-server setups, first going into high availability and read standbys, then logical replication, and then reviewing what monitoring looks like for sharded databases like Citus.
The talk will primarily describe free/open-source tools and statistics views readily available from within Postgres.
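As one concrete example of the log-file monitoring discussed above, entries produced by log_min_duration_statement can be scanned for slow statements (a simplified parser; real log lines also carry a configurable prefix):

```python
import re

LOG_LINE = re.compile(
    r"duration: (?P<ms>\d+\.\d+) ms\s+(?:statement|execute .*?): (?P<query>.*)"
)

def slow_queries(log_lines, threshold_ms=500.0):
    """Return (duration_ms, query) pairs for statements over a latency
    threshold, from PostgreSQL log lines."""
    hits = []
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and float(m.group("ms")) >= threshold_ms:
            hits.append((float(m.group("ms")), m.group("query")))
    return hits

logs = [
    "LOG:  duration: 12.345 ms  statement: SELECT 1",
    "LOG:  duration: 812.500 ms  statement: SELECT * FROM big_table",
]
print(slow_queries(logs))
```

A collector running this kind of scan can feed the results into whatever reporting or alerting pipeline the team already uses.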
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky... - Big Data Spain
Distributed training is a complex process that does more harm than good if it is not set up correctly.
https://www.bigdataspain.org/2017/talk/apache-mxnet-distributed-training-explained-in-depth
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
(DAT402) Amazon RDS PostgreSQL: Lessons Learned & New Features - Amazon Web Services
Learn the specifics of Amazon RDS for PostgreSQL's capabilities and the extensions that make it powerful. This session begins with a brief overview of the RDS PostgreSQL service and how it provides high availability and durability, and will then dive deep into the new features we have released since re:Invent 2014, including major version upgrade and newly added PostgreSQL extensions in RDS PostgreSQL. During the session, we will also discuss lessons learned running a large fleet of PostgreSQL instances, including specific recommendations. In addition, we will present benchmarking results looking at differences between the 9.3, 9.4, and 9.5 releases.
Axel Koehler from Nvidia presented this deck at the 2016 HPC Advisory Council Switzerland Conference.
“Accelerated computing is transforming the data center that delivers unprecedented throughput, enabling new discoveries and services for end users. This talk will give an overview about the NVIDIA Tesla accelerated computing platform including the latest developments in hardware and software. In addition it will be shown how deep learning on GPUs is changing how we use computers to understand data.”
In related news, the GPU Technology Conference takes place April 4-7 in Silicon Valley.
Watch the video presentation: http://insidehpc.com/2016/03/tesla-accelerated-computing/
See more talks in the Swiss Conference Video Gallery:
http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter:
http://insidehpc.com/newsletter
POLYTEDA LLC, a provider of semiconductor design software and PV services, announced the general availability of PowerDRC/LVS version 2.0.1. This release delivers further significant improvements for multi-CPU mode and some new LVS functionality. The XOR operation now supports multi-CPU mode to dramatically increase performance.
Architecture Aware Algorithms and Software for Peta and Exascale - inside-BigData.com
Jack Dongarra from the University of Tennessee presented these slides at Ken Kennedy Institute of Information Technology on Feb 13, 2014.
Listen to the podcast review of this talk: http://insidehpc.com/2014/02/13/week-hpc-jack-dongarra-talks-algorithms-exascale/
Supercomputing has swept rapidly from the far edges of science to the heart of our everyday lives. And propelling it forward – bringing it into the mobile phone already in your pocket and the car in your driveway – is GPU acceleration, NVIDIA CEO Jen-Hsun Huang told a packed house at a rollicking event kicking off this week’s SC15 annual supercomputing show in Austin. The event draws 10,000 researchers, national lab directors and others from around the world.
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa - inside-BigData.com
In this deck from the Univa Breakfast Briefing at ISC 2018, Duncan Poole from NVIDIA describes how the company is accelerating HPC in the Cloud.
Learn more: https://www.nvidia.com/en-us/data-center/dgx-systems/
and
http://univa.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Today’s groundbreaking scientific discoveries are taking place in HPC data centers. Using containers, researchers and scientists gain the flexibility to run HPC application containers on NVIDIA Volta-powered systems including Quadro-powered workstations, NVIDIA DGX Systems, and HPC clusters.
The Economics of Scaling Cassandra - By Alex Bordei, Techie Product Manager at Bigstep
This presentation was made during the "Cassandra Summit 2014" Event, in London.
We benchmarked Cassandra on a number of configurations and show its scaling profile. We also test Cassandra on Docker, as well as Cassandra's in-memory feature.
Follow Alex on Twitter: @alexandrubordei
Bigstep on Twitter: @BigStepInc
If you have any questions, let us know at hello@bigstep.com and we'll do our best to answer.
Stay informed: http://blog.bigstep.com/
Mobile data traffic has quadrupled since 2013. In order to cope with a newly diversified device landscape, engineers have embraced responsive design. Implementing “responsive images” is the most important thing that you can do for a responsive site’s performance.
In this session, we discuss the past, present, and future of responsive images.
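As a concrete illustration (not taken from the session itself), the core responsive-images mechanism today is the `srcset`/`sizes` markup, which lets the browser choose an appropriately sized file for the current viewport; the asset names below are hypothetical:

```html
<!-- Hypothetical file names. The browser picks the smallest candidate
     that satisfies the layout width declared in "sizes". -->
<img src="hero-800.jpg"
     srcset="hero-400.jpg 400w,
             hero-800.jpg 800w,
             hero-1600.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 50vw"
     alt="Hero image">
```

The `w` descriptors give each candidate's intrinsic width, so the browser can weigh them against the device pixel ratio without downloading anything first.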
For image optimization, reducing the quality doesn’t always degrade the visual experience. In fact, precise adjustment of the compression level and fine-tuning of encoding settings can significantly reduce the file size without any noticeable degradation. But there is no standard quality setting that works for all images - it depends on the compression algorithm, image format, and content. And manual experimentation is not scalable.
In this webinar we cover how to find the best quality compression level and optimal encoding settings, in order to produce a perceptually fine image while minimizing the file size.
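The tuning loop the webinar describes can be sketched generically: walk candidate quality settings from high to low and keep the smallest file whose perceptual score still clears a threshold. The encoder and metric below are stand-in stubs (hypothetical; a real pipeline would call an actual JPEG/WebP encoder and a metric such as SSIM or Butteraugli) - only the search logic is the point:

```python
def find_best_quality(encode, perceptual_score, threshold=0.95,
                      qualities=range(100, 0, -5)):
    """Return (quality, size) for the smallest file whose perceptual
    score against the original stays at or above `threshold`."""
    best = None
    for q in qualities:                 # walk from high quality down
        data = encode(q)
        if perceptual_score(data) >= threshold:
            best = (q, len(data))       # still acceptable: keep shrinking
        else:
            break                       # dropped below threshold: stop
    return best

# Stub encoder/metric for illustration only: size and "score" fall
# linearly with quality. Real code would encode the image and compare
# it to the original with a perceptual metric.
def fake_encode(q):
    return b"x" * (100 + 10 * q)

def fake_score_for(data):
    return len(data) / 1100  # monotone proxy for visual similarity
```

With a monotone metric this linear scan could be replaced by a binary search over quality, which matters when each encode is expensive.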
B2B Product Marketing. What is the role of Product Marketing in organizations? What are the most important skills to be a good product marketing manager?
Listen to Professor Ross Walker and Adrian Roitberg explain the new GPU features of AMBER version 14. With these features, AMBER is now the world's fastest molecular dynamics package.
Challenges and Advances in Large-scale DFT Calculations on GPUs using TeraChem - Can Ozdoruk
Recent advances in reformulating electronic structure algorithms for stream processors such as graphical processing units have made DFT calculations on systems comprising up to O(10^3) atoms feasible. Simulations on such systems that previously required half a week on traditional processors can now be completed in only half an hour. Listen to Professor Heather Kulik, Massachusetts Institute of Technology, as she discusses how she leverages these GPU-accelerated quantum chemistry methods in the code TeraChem to investigate large-scale quantum mechanical features in applications ranging from protein structure to mechanochemical depolymerization. In each case, large-scale and rapid evaluation of electronic structure properties is critical for unearthing previously poorly understood properties and mechanistic features of these systems. Professor Kulik also discusses outstanding challenges in the use of Gaussian localized-basis-set codes on GPUs pertaining to limitations in basis set size, and how she circumvents such challenges to computational efficiency with systematic, physics-based error corrections to basis set incompleteness.
Slides by VMD lead developer Mr. John Stone, a pioneer in the field of MD Visualization. Visualization is essential to unlocking key insights from the results of MD simulations. Mr. Stone explains the many GPU-accelerated features of VMD. You can learn how these features can help you speed up a wide range of simulation preparation, analyses, and visualization tasks.
Molecular Shape Searching on GPUs: A Brave New World - Can Ozdoruk
Shape is a fundamental three-dimensional molecular property and a powerful descriptor for molecular comparison and similarity assessment; similarity in shape has proven to be a very effective method for predicting similarity in biology. As such, shape-based virtual screening has become an integral part of computational drug discovery, due to both its speed and efficacy. OpenEye’s recent port of their shape similarity application, ROCS, to the GPU has resulted in a virtual screening tool of unprecedented power - FastROCS. FastROCS’ speed allows it to perform large-scale calculations of a kind inaccessible in the past and has accelerated more routine shape searching to the point that it has become competitive with more traditional, but less effective, two-dimensional methods. Go through the slides to learn more. Try GPUs for free here: www.Nvidia.com/GPUTestDrive
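The shape comparison idea can be illustrated with a much cruder voxel version (purely didactic; FastROCS itself uses analytic Gaussian volume overlaps, not voxel grids): represent each molecule as a set of spheres, voxelize, and score similarity with a shape Tanimoto T = O_AB / (V_A + V_B - O_AB):

```python
from itertools import product

def voxelize(spheres, step=0.5, bound=6.0):
    """Grid points inside any sphere; spheres = [(x, y, z, r)]."""
    n = int(bound / step)
    pts = set()
    for i, j, k in product(range(-n, n + 1), repeat=3):
        p = (i * step, j * step, k * step)
        for x, y, z, r in spheres:
            if (p[0] - x) ** 2 + (p[1] - y) ** 2 + (p[2] - z) ** 2 <= r * r:
                pts.add(p)
                break
    return pts

def shape_tanimoto(mol_a, mol_b):
    """Tanimoto of occupied volumes: overlap / (vol_a + vol_b - overlap)."""
    va, vb = voxelize(mol_a), voxelize(mol_b)
    overlap = len(va & vb)
    return overlap / (len(va) + len(vb) - overlap)
```

Identical sphere sets score 1.0 and disjoint ones 0.0; a real tool also optimizes the relative pose of the two molecules before scoring, which is the expensive step the GPU accelerates.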
Introduction to SeqAn, an Open-source C++ Template Library - Can Ozdoruk
SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA) or an FM-index, as well as algorithms for fast and accurate alignment or read mapping. Based on those data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core, the research team at Freie Universität Berlin has started generic support for distinguished accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own development you can try GPUs for free here: www.Nvidia.com/GPUTestDrive
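To make the index idea concrete, here is a minimal Python sketch of the classic suffix-array pattern lookup that libraries like SeqAn implement far more efficiently (as enhanced suffix arrays or FM-indices) in C++; this is not SeqAn's API, just the underlying idea:

```python
import bisect

def build_suffix_array(text):
    """All suffix start positions, sorted by the suffix they begin.
    (O(n^2 log n) here; real libraries build this in O(n) or O(n log n).)"""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """All positions where `pattern` occurs, via binary search on the SA."""
    # Materializing suffixes is wasteful (O(n^2) memory) but keeps the
    # binary-search idea explicit; real code compares against `text` in place.
    suffixes = [text[i:] for i in sa]
    lo = bisect.bisect_left(suffixes, pattern)
    # "\xff" sorts after every DNA letter, closing the match interval.
    hi = bisect.bisect_right(suffixes, pattern + "\xff")
    return sorted(sa[lo:hi])
```

For example, `find_occurrences("ACGTACGTGA", build_suffix_array("ACGTACGTGA"), "ACG")` locates both occurrences of the 3-mer in logarithmic search time once the index is built.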
Uncovering the Elusive HIV Capsid with Kepler GPUs Running NAMD and VMD - Can Ozdoruk
Computational scientists at the University of Illinois at Urbana–Champaign and the University of Pittsburgh have now resolved the HIV capsid's chemical structure. As reported recently on the cover of Nature, the researchers combined NMR structure analysis, electron microscopy, and data-guided molecular dynamics simulations, utilizing VMD to prepare and analyze simulations performed using NAMD on NVIDIA GPUs in one of the most powerful computers worldwide, Blue Waters, to obtain and characterize the HIV-1 capsid. The discovery can now guide the design of novel drugs for enhanced antiviral therapy. Also learn how NAMD performs with the latest Kepler GPUs, as well as details about GPU Test Drive (www.nvidia.com/GPUTestDrive) and how to try NAMD on Kepler GPUs for free.
ACEMD: High-throughput Molecular Dynamics with NVIDIA Kepler GPUs - Can Ozdoruk
Acellera Founder Gianni De Fabritiis, and CTO Matt Harvey talk about the latest developments of high-throughput molecular dynamics both in terms of applications and methodological advances. Examples are in the context of ACEMD, a highly efficient, best-in-class graphical processing units (GPUs) centric code for running MD simulations, and its protocols. In particular, attendees will learn how the high arithmetic performance and intrinsic parallelism of the latest NVIDIA Kepler GPUs can offer a technological edge for molecular dynamics simulations. Try GPUs for free via: www.Nvidia.com/GPUTestDrive
This webinar showcases the latest GPU-acceleration technologies available to AMBER users and discusses features, recent updates, and future plans. Go through the slides to learn how to obtain the latest accelerated versions of AMBER, which features are supported, the simplicity of its installation and use, and how it performs with Kepler GPUs. To run AMBER free on GPUs register here: www.Nvidia.com/GPUTestDrive
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
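To give a feel for what a power-flow computation does, here is a toy DC power flow in plain Python (this is not PowSyBl's API; with pypowsybl you would instead load a network and call a load-flow runner). In the DC approximation, bus injections P and voltage angles θ satisfy P = B·θ, where B is assembled from line susceptances 1/x, and each line flow is (θi - θj)/x:

```python
def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for A·x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def dc_power_flow(lines, injections, n_bus, slack=0):
    """Toy DC power flow. lines: [(i, j, x)], injections: {bus: P in p.u.}.
    Returns bus angles (slack fixed at 0) and per-line flows (θi - θj)/x."""
    idx = [b for b in range(n_bus) if b != slack]   # non-slack buses
    pos = {b: k for k, b in enumerate(idx)}
    m = len(idx)
    B = [[0.0] * m for _ in range(m)]               # reduced susceptance matrix
    for i, j, x in lines:
        s = 1.0 / x
        for a in (i, j):
            if a != slack:
                B[pos[a]][pos[a]] += s
        if i != slack and j != slack:
            B[pos[i]][pos[j]] -= s
            B[pos[j]][pos[i]] -= s
    P = [injections.get(b, 0.0) for b in idx]
    theta_red = solve(B, P)
    theta = [0.0] * n_bus
    for b in idx:
        theta[b] = theta_red[pos[b]]
    flows = [((i, j), (theta[i] - theta[j]) / x) for i, j, x in lines]
    return theta, flows
```

For a 3-bus ring with a 1.0 p.u. load at bus 1 and 0.5 p.u. of generation at bus 2, the solver returns the angles and line flows that balance every bus; real tools like PowSyBl add AC physics, contingencies, and far larger networks on top of this same linear core.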
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Speakers:
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
2. Tesla K40
FASTER: 1.4 TF | 2880 cores | 288 GB/s
LARGER: 2x memory (12GB vs. 6GB) enables more apps: fluid dynamics, rendering, seismic analysis
SMARTER: unlock extra performance using power headroom
[Chart: AMBER benchmark (SPFP-Nucleosome), ns/day on a 0-5 scale for CPU, K20X, and K40]
CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2; GPU systems: Single Tesla K20X or Single Tesla K40
3. Tesla K40: Acceleration for Large Problems
Speedups vs. CPU (E5-2687W @ 3.10GHz); the chart also shows Tesla K20X alongside Tesla K40:
- Structural Mechanics - ANSYS 14 (SMP-V14sp-4): 1.66x
- Physics - Chroma: 8.12x
- Molecular Dynamics - AMBER (SPFP-Cellulose_production_NPT): 8.67x
- Material Science - QMCPACK (4x4x1): 10.23x
- Earth Science - SPECFEM3D: 10.29x
4. Bigger Challenges - Less Time
- CFD: larger models, higher throughput
- Neural Networks: larger training sets
- High Energy Physics: more advanced event triggers
- Graph Analytics: accelerate larger graphs
- Material Science: larger ion/electron systems
- M&E: more complex scenes, accelerated color grading
- Molecular Dynamics: larger problems, more acceleration
- Quantum Chemistry: larger problems, more acceleration
- Bioinformatics: newer algorithms, apps
5. Tesla K40 in Media and Entertainment
- Creation: color grading for film and video, 3D rendering
- Distribution: video frame rate conversion, transcoding and encoding broadcast video
6. Tesla K40: Interactive and Real-time Analysis
1 billion tweets, 8 Tesla K40s: live streaming and analysis for faster decisions.
To learn more: register for the map-D webinar on 29th Jan @ 9am PST.
8. Power Envelope
[Chart: average GPU board power (watts) for real applications on K20X, plotted against the 235W board power limit, showing power headroom that can be converted to higher performance]
9. GPU Boost on Tesla K40
Convert power headroom to higher performance; all workloads stay within the 235W board power limit:
- Base clock: 745 MHz (workload #1: worst-case reference app)
- Boost clock #1: 810 MHz (workload #2, e.g. AMBER)
- Boost clock #2: 875 MHz (workload #3, e.g. ANSYS Fluent)
10. Real Apps Run Up to 1.4x Faster with GPU Boost
[Chart: Tesla K40 performance relative to Tesla K20X, at base clock (K40@base) and boost clock (K40@boost), for ANSYS 14 (SMP-V14sp-4), LAMMPS-EAM, NAMD 2.9 (APOA1), AMBER (SPFP-Nucleosome), LSMS-Fe32, QMCPACK (3x3x1), and CUBLAS DGEMM; speedups range from 1.07x to 1.40x]
11. Compute Workload Behavior with GPU Boost
- GPU clock: automatic clock switching (non-Tesla) vs. deterministic clocks (Tesla K40)
- Default at shipping: boost (non-Tesla) vs. base (Tesla K40)
- Preset options: lock to base clock (non-Tesla) vs. 3 levels: base, boost1 or boost2 (Tesla K40)
- Boost interface: control panel (non-Tesla) vs. NV-SMI / NVML (Tesla K40)
- Target duration for boost clocks: ~50% of run-time (non-Tesla) vs. 100% of workload run time (Tesla K40), a must-have for HPC workloads
12. Using GPU Boost on Tesla K40
- View the clocks: nvidia-smi -q -d CLOCK,SUPPORTED_CLOCKS
- Set the boost clocks: nvidia-smi -ac <MEM clock,Graphics clock>
- The end user selects the clocks from the host; boost applies to all 2880 GPU cores and delivers higher memory bandwidth
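The two nvidia-smi invocations above can be scripted. The sketch below builds the command lines for the K40's three application-clock levels; the 3004 MHz memory clock is an assumption based on the K40's shipping configuration, so verify the supported pairs on your board with `nvidia-smi -q -d SUPPORTED_CLOCKS` before pinning anything (setting clocks also requires administrator privileges):

```python
# Application clock pairs for Tesla K40 as (memory MHz, graphics MHz).
# 745/810/875 MHz are the base, boost #1, and boost #2 graphics clocks
# from the slide above; 3004 MHz is the assumed shipping memory clock.
K40_CLOCKS = {"base": (3004, 745), "boost1": (3004, 810), "boost2": (3004, 875)}

def query_clocks_cmd():
    """Command line to view current and supported clocks."""
    return "nvidia-smi -q -d CLOCK,SUPPORTED_CLOCKS"

def set_clocks_cmd(level, gpu_id=0):
    """Command line to pin application clocks to a named level."""
    mem, gfx = K40_CLOCKS[level]
    return f"nvidia-smi -i {gpu_id} -ac {mem},{gfx}"
```

For example, `set_clocks_cmd("boost2")` yields the `nvidia-smi -i 0 -ac 3004,875` invocation that locks GPU 0 to the top boost level; the same clocks can also be managed programmatically through NVML.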
13. Customer Feedback on K40 w/GPU Boost
Reported results: 17% faster, 13% faster, and 11% faster; K40 w/GPU Boost delivers up to 40% higher perf.*
Sources: http://www.eyesopen.com/fastrocs and http://blog.xcelerit.com/benchmarks-nvidia-tesla-k40-vsk20x-gpu/
*Tesla K40 performance relative to Tesla K20X
14. Tesla Resources
- Want to know more about Tesla products?
  http://www.nvidia.com/object/tesla-servers.html
  http://www.nvidia.com/object/tesla-workstations.html
- Need help on using GPU Boost on Tesla K40?
  http://www.nvidia.com/object/tesla_product_literature.html
- Product details, specs, etc.:
  http://www.nvidia.com/object/tesla_product_literature.html
- Where to buy:
  http://www.nvidia.com/object/where-to-buy-tesla.html
15. Test Drive the World’s Fastest GPU
1. Sign up for a FREE GPU Test Drive: http://www.Nvidia.com/GPUTestDrive
2. Accelerate your apps on the latest K40 GPUs
3. Tell us how K40 and GPU Boost worked for you
16. Upcoming GTC Express Webinars
- January 29: map-D: A GPU Database for Real-time Big Data Analytics and Interactive Visualization
- January 30: Debugging CUDA Fortran using Allinea DDT
- February 5: OpenMM - Accelerating and Customizing Molecular Dynamics Simulations on GPUs
- February 25: Using GPUs to Supercharge Visualization and Analysis of Molecular Dynamics Simulations with VMD
Register at www.gputechconf.com/gtcexpress
17. GTC 2014 Registration is Open
Hundreds of sessions in the areas of:
- Science and research
- Professional graphics
- Mobile computing
- Automotive applications
- Game development
- Cloud computing
Register with GM20EXP for a 20% discount: www.gputechconf.com