Highlighted notes of:
Introduction to CUDA C: NVIDIA
Author: Blaise Barney
From: GPU Clusters, Lawrence Livermore National Laboratory
https://computing.llnl.gov/tutorials/linux_clusters/gpu/NVIDIA.Introduction_to_CUDA_C.1.pdf
Blaise Barney is a research scientist at Lawrence Livermore National Laboratory.
Presentation I gave at the SORT Conference in 2011. It was generalized from work I had done using GPUs to accelerate image processing at FamilySearch.
3. What we will cover
• GPUs and their history
• Why use GPUs
• Architecture
• Getting Started with GPU Programming
• Challenges, Techniques & Pitfalls
• Where not to use GPUs?
• Resources
• The Future
4. What is a GPU
• Graphics Processing Unit
– Term coined in 1999 by NVidia
– Specialized add-on board
• Accelerates interactive 3D rendering
– 60 image updates per second (or more) on large data
– Solves an embarrassingly parallel problem
– Game-driven volume economics
• NVidia vs. ATI, just like Intel vs. AMD
• Demand for better effects led to
– programmable GPUs
– floating-point capabilities
– which in turn led to General-Purpose GPU (GPGPU) computation
5. History of GPUs: a GPGPU Perspective

Date  Product          Transistors  Cores  FLOPS                 Technology
1997  RIVA 128         3 M          -      -                     Rasterization
1999  GeForce 256      25 M         -      -                     Transform & lighting
2001  GeForce 3        60 M         -      -                     Programmable shaders
2002  GeForce FX       125 M        -      -                     16- and 32-bit FP, long shaders
2004  GeForce 6800     222 M        -      -                     Infinite-length shaders, branching
2006  GeForce 8800     681 M        128    -                     Unified graphics & compute, CUDA, 64-bit FP
2008  GeForce GTX 280  1.4 B        240    933 G SP / 78 G DP    IEEE FP, CUDA C, OpenCL and DirectCompute, PCI-Express Gen 2
2009  Tesla M2050      3.0 B        512    1.03 T SP / 515 G DP  Improved 64-bit perf, caching, ECC memory, 64-bit unified addressing, asynchronous bidirectional data transfer, multiple kernels

Source: Nickolls J., Dally W.J., "The GPU Computing Era", IEEE Micro, March-April 2010
6. The GPU Advantage
• 30x CPU FLOPS on latest GPUs
• 10x memory bandwidth
• Add to these:
– 3x performance/$
– Energy efficient: 5x performance/Watt
All graphs from: GPU4Vision : http://gpu4vision.icg.tugrz.at/
7. People use GPUs for…
Source : Nickolls J. , Dally W.J. “The GPU Computing Era”, IEEE Micro, March-April 2010
8. More "why to use GPUs"
• Proliferation of GPUs
– Mobile devices will have capable GPUs soon!
• Make more things possible
– Make things real-time
• From seconds to real-time interactive performance
– Reduce offline processing overhead
• Research opportunities
– New & efficient algorithms
– Pairing multi-core CPUs and massively multi-threaded GPUs
12. CPU versus GPU
• CPU
– Optimized for latency
– Speedup techniques
• Vectorization (MMX, SSE, …)
• Coarse-grained parallelism using multiple CPUs and cores
– Memory approaching a TB
• GPU
– Optimized for throughput
– Speedup techniques
• Massive multithreading
• Fine-grained parallelism
– A few GBs of memory max
13. Getting Started
• Software
– CUDA (NVidia specific)
– OpenCL (cross-platform, GPU/CPU)
– DirectCompute (MS specific)
• Hardware
– A system equipped with a GPU
• OS no bar
– But Windows and Red Hat Enterprise Linux seem better supported
14. CUDA
• Compute Unified Device Architecture
• Most popular GPGPU toolkit
• CUDA C extends C with constructs
– Easy to write programs
• Lower-level "driver" API is available
– Provides more control
– Use multiple GPUs in the same application
– Mix graphics & compute code
• Language bindings available
– PyCUDA, Java, .NET
• Toolkit provides conveniences
Source: NVIDIA CUDA Architecture, Introduction and Overview
15. CUDA Architecture
• 1 or more streaming multiprocessors ("cores")
• Thread blocks
– Single Instruction, Multiple Thread (SIMT)
– Hide latency by parallelism
• Memory hierarchy
– Fermi GPUs can access system memory
• Primitives for
– Thread synchronization
– Atomic operations on memory
Source: The GPU Computing Era
16. Simple Example: Vector Addition
C/C++ - serial code

void VecAdd(const float *A, const float *B, float *C, int N) {
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
}

VecAdd(A, B, C, N);

C/C++ with OpenMP - thread-level parallelism

void VecAdd(const float *A, const float *B, float *C, int N) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
}

VecAdd(A, B, C, N);
17. Vector Addition using CUDA
CUDA C - element-level parallelism

__global__ void VecAdd(const float *A, const float *B, float *C, int N) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

Invoking the function

// Allocate memory on the GPU
cudaMalloc((void**)&d_A, size);
cudaMalloc((void**)&d_B, size);
cudaMalloc((void**)&d_C, size);

// Copy input arrays to the GPU
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);

// Invoke the kernel
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

// Copy the result back to main memory
cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);

// Free GPU memory
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);

Compilation

# nvcc vectorAdd.cu -I ../../common/inc
18. GPU Programming Challenges
• Need high "occupancy" for best performance
• Extracting parallelism with limited resources
– Limited registers
– Limited shared memory
• Preferred approach
– Small kernels
– Multiple passes if needed
• Decompose the problem into parallel pieces
– Write once, scale and perform everywhere!
19. GPU Programming
• Use shared memory when possible
– Cooperation between threads in a block
– Reduce access to global memory
• Reduce data transfer over the bus
• It's still a GPU!
– Use textures to your advantage
– Use vector data types if you can
• Watch out for GPU capability differences!
22. Resources
• CUDA
– Tools on NVIDIA Developer Site
http://developer.nvidia.com/object/gpucomputing.html
– CUDPP
http://code.google.com/p/cudpp/
• OpenCL
• Google Search!
23. The Future
• Better throughput
– More GPU cores, scaling by Moore’s law
– PCIe Gen 3
• Easier to program
• Arbitrary control and data access patterns