The document discusses GPU programming and mapping the SMAC atmospheric correction application to a GPU. It begins with an overview of NVIDIA GPU hardware and CUDA programming concepts. It then describes mapping SMAC, which processes large satellite image datasets, so that its correction algorithm runs on a GPU. Experiments show the GPU implementation significantly reduces processing time compared to the CPU version. Optimization techniques for GPU programming are also covered.
GPUs have evolved from graphics cards to platforms for general purpose high performance computing. CUDA is a programming model that allows GPUs to execute programs written in C for general computing tasks using a single-instruction multiple-thread model. A basic CUDA program involves allocating memory on the GPU, copying data to the GPU, launching a kernel function that executes in parallel across threads on the GPU, copying results back to the CPU, and freeing GPU memory.
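The basic program flow described above (allocate, copy in, launch a kernel, copy back, free) can be sketched as a minimal CUDA program. This is an illustrative sketch, not code from any of the summarized documents; the kernel and variable names are invented for the example:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread scales one array element in parallel.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) data[i] *= factor;                   // guard against overshoot
}

int main() {
    const int n = 1 << 20;
    float *host = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));                              // 1. allocate GPU memory
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); // 2. copy data to GPU

    int threads = 256, blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, 2.0f, n);                         // 3. launch kernel

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); // 4. copy results back
    cudaFree(dev);                                                    // 5. free GPU memory

    printf("host[0] = %f\n", host[0]);
    free(host);
    return 0;
}
```

Each of the five runtime calls corresponds to one step of the flow; the kernel itself runs once per thread, with the single-instruction multiple-thread model hiding the loop over elements.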
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA... (Stefano Di Carlo)
These slides have been presented by Dr. Alessandro Vallero at the IEEE VLSI Test Symposium, San Francisco, CA, USA (April 22-25, 2018).
General-purpose computing on graphics processing units offers remarkable speedups for data-parallel workloads by leveraging the GPU's computational power. However, unlike graphics computing, it requires highly reliable operation in most application domains.
This presentation talks about “Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs”. The work is the outcome of a collaboration between the TestGroup of Politecnico di Torino (http://www.testgroup.polito.it) and the Computer Architecture Lab of the University of Athens (dscal.di.uoa.gr), started under the FP7 Clereco Project (http://www.clereco.eu). It presents an extended study, based on a consolidated workflow, that evaluates reliability in correlation with performance for four GPU architectures and corresponding chips: AMD Southern Islands and NVIDIA G80/GT200/Fermi. We obtained reliability measurements (AVF and FIT) employing both fault injection and ACE analysis based on microarchitecture-level simulators. Apart from the reliability-only and performance-only measurements, we propose combined metrics for performance and reliability (quantifying instruction throughput or task execution throughput between failures) that assist comparisons for the same application among GPU chips of different ISAs and vendors, as well as among benchmarks on the same GPU chip.
Watch the presentation at: https://youtu.be/GV5xRDgfCw4
Paper Information:
Alessandro Vallero§ , Sotiris Tselonis, Dimitris Gizopoulos* and Stefano Di Carlo§, “Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs”, IEEE VLSI Test Symposium 2018 (VTS 2018), San Francisco, CA (USA), April 22-25, 2018.
§Politecnico di Torino, Italy. Email: {stefano.dicarlo, alessandro.vallero}@polito.it. *University of Athens, Greece. Email: dgizop@di.uoa.gr
AIX student guide: System Administration Part 2 - Problem Determination (Yogesh Sharma)
This document provides an overview of the AIX 5L System Administration II: Problem Determination course. It covers topics like problem determination techniques, the IBM pSeries product family, the Object Data Manager (ODM), system initialization, and solving boot problems. The document contains course objectives, descriptions of course content, and references to additional documentation. It is intended as a student notebook for an IBM Certified training course on advanced AIX system administration and problem determination skills.
This document provides a tutorial introduction to GPGPU computation using NVIDIA CUDA. It begins with a brief overview and warnings about the large numbers involved in GPGPU. The agenda then outlines topics to be covered including general purpose GPU computing using CUDA and optimization topics like memory bandwidth optimization. Key aspects of CUDA programming are introduced like the CUDA memory model, compute capabilities of GPUs, and profiling tools. Examples are provided of simple CUDA kernels and how to configure kernel launches for grids and blocks of threads. Optimization techniques like choosing block/grid sizes to maximize occupancy are also discussed.
Architecture exploration of recent GPUs to analyze the efficiency of hardware... (journalBEEI)
This document analyzes the efficiency of hardware resources in recent GPU architectures like Pascal compared to older architectures like Fermi. It simulates 9 benchmarks on Fermi-based and Pascal-based GPU configurations using a cycle-accurate simulator. The results show that Pascal improves performance by 273% on average over Fermi. It also analyzes the impact of computing resources versus memory resources, varies the number of warp schedulers, and measures barrier synchronization overhead. The goal is to understand how hardware upgrades in newer architectures translate into performance gains and to guide future GPU development.
This document summarizes an Italian presentation on monitoring and tuning I/O performance on Linux. It discusses key topics like I/O monitoring tools like iostat, iotop and blktrace, tuning techniques like dirty page writeback and filesystem options, and ensuring reliability of data writes through proper synchronization and error handling. The presentation provides an overview of I/O subsystems in Linux and dives into specific tools and parameters for optimizing I/O.
This presentation describes the components of the GPU ecosystem for compute, provides an overview of existing ecosystems, and contains a case study on NVIDIA Nsight.
The GPU HPC Clusters document discusses GPU cluster research at NCSA, including early GPU clusters like QP and Lincoln, follow-up clusters like AC that expanded GPU resources, and the eco-friendly cluster EcoG. It describes ISL research in GPU and heterogeneous computing, including systems software, runtimes, tools, and application development.
The document provides a history of GPUs and GPGPU computing. It describes how GPUs evolved from fixed hardware for graphics to programmable hardware. This allowed general purpose computing on GPUs (GPGPU). It discusses the development of GPGPU languages and APIs like CUDA, OpenCL, and DirectCompute. The anatomy of a modern GPU is explained, highlighting its massively parallel architecture. Typical GPGPU execution and memory models are outlined. Usage of GPGPU for applications like graphics, physics, computer vision, and HPC is mentioned. Leading GPU vendors and their products are briefly introduced.
GPUs are specialized processors designed for graphics processing. CUDA (Compute Unified Device Architecture) allows general purpose programming on NVIDIA GPUs. CUDA programs launch kernels across a grid of blocks, with each block containing multiple threads that can cooperate. Threads have unique IDs and can access different memory types including shared, global, and constant memory. Applications that map well to this architecture include physics simulations, image processing, and other data-parallel workloads. The future of CUDA includes more general purpose uses through GPGPU and improvements in virtual memory, size, and cooling.
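The grid/block/thread structure, unique thread IDs, shared memory, and intra-block cooperation described above can be illustrated with a classic block-level reduction. This is a hedged sketch, not code from the summarized document; the kernel name and the fixed block size of 256 are assumptions of the example:

```cuda
#include <cuda_runtime.h>

// Each block of 256 threads cooperatively sums its slice of the input
// in shared memory; thread 0 then writes the block's partial sum.
// Launch with blockDim.x == 256 to match the shared buffer size.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[256];              // shared memory: visible to all threads in the block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;  // unique global thread ID
    buf[tid] = (i < n) ? in[i] : 0.0f;      // load from global memory (pad with zeros)
    __syncthreads();                        // barrier: block-wide cooperation

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0]; // one partial sum per block, written to global memory
}
```

A host would launch this as `blockSum<<<numBlocks, 256>>>(devIn, devOut, n)` and then sum the per-block partials on the CPU (or with a second kernel pass), showing how blocks run independently while threads within a block cooperate through shared memory and barriers.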
Computing Performance: On the Horizon (2021) (Brendan Gregg)
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
This is a presentation I gave at the last GPGPU workshop we held in April 2013.
The usage of GPGPU is expanding, creating a continuum from mobile to HPC. At the same time, the question is whether the GPGPU languages are the right ones (well, no) and whether we aren't wasting resources re-developing the same software stack instead of converging.
This document discusses optimizing Linux boot times on the Raspberry Pi. It begins with an overview of generic boot optimization concepts like identifying and measuring boot components, removing unnecessary functionality, and reordering initialization. It then presents a case study of optimizing boot for Raspbian on the Raspberry Pi through techniques like disabling unneeded services, assigning a static IP, using a minimal custom distro, and kernel optimizations like disabling initcalls and reducing the kernel size. The goal is to achieve an SSH login within 25 seconds instead of the original 30 seconds.
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS (cseij)
This document summarizes a survey on GPU systems and their performance on different applications. It discusses how GPUs can be used for general-purpose computing due to their high parallel processing capability. Several computationally intensive applications that achieve speedups when implemented on GPUs are described, including video decoding, matrix multiplication, parallel AES encryption, and password recovery for MS Office documents. The GPU architecture and NVIDIA's CUDA programming model are also summarized. While GPUs provide significant performance benefits, some limitations for non-graphics applications are noted. The conclusion is that GPUs are a good alternative for computationally intensive tasks, reducing CPU load and improving performance compared to CPU-only implementations.
This project deals with the warehouse-scale computers (WSCs) that power the internet services we use today. It covers the hardware blocks used in a Google WSC and the architecture of hardware accelerators such as the graphics processing unit (GPU) and the tensor processing unit (TPU), which let warehouse-scale machines run heavy workloads and support application-specific machine learning and deep learning tasks. It also explains the energy efficiency of the processors used in a Google WSC to achieve high performance, and the performance-enhancement mechanisms the WSC employs.
This document provides an overview of Disksim, an open source disk simulator, and its extension to simulate SSDs.
Disksim implements disk simulation using an event-based control flow where events are processed sequentially from a global queue. It provides device models for disk and simpledisk, and the SSD extension adds support for simulating NAND flash characteristics.
The SSD extension models aspects like logical block mapping, garbage collection with both greedy cleaning and wear leveling, and timing parameters. It introduces parallelism by organizing flash packages into gangs or individual elements that can process requests concurrently while serializing completions.
SMAC (Social, Mobile, Analytics, Cloud) trends are increasingly being embraced by the HR industry. HR functions like recruiting, employee engagement, and analytics are being transformed by adopting cloud-based SaaS systems, mobile apps, social platforms, and analytics tools. This allows HR to improve processes like benefits administration, payroll, and collaboration while also providing data insights. Major tech vendors are seeing over 50% annual growth for their cloud-based HR offerings. Companies like Cognizant have seen productivity gains from employee use of social tools like Yammer on their own devices. The adoption of SMAC technologies in HR continues to accelerate.
Overcoming the Commodity Management Challenges in Metals & Mining (Eka Software Solutions)
Metals and mining companies today face several challenges in risk management, reporting, and operations.
In this webinar, industry expert Simon Reid of Everis and Eka cover these topics:
- Best practices for creating effective hedging strategies that mitigate the effects of price volatility and energy costs
- The importance of managing both physical contracts and derivatives all in one advanced software platform
- Solutions for calculating an accurate metal balance in real-time
- How to maximize throughput with Smart CM for competitive advantage
Download webinar recording: http://info.ekaplus.com/metals-mining-webinar
SMACology, i.e. SMAC technology, is the new buzzword reshaping the IT industry as well as the skills of technical aspirants. Learn how.
PDF courtesy: KPMG
This document provides an introduction to Cloudant, which is a fully managed NoSQL database as a service (DBaaS) that provides a scalable and flexible data layer for web and mobile applications. The presentation discusses NoSQL databases and why they are useful, describes Cloudant's features such as document storage, querying, indexing and its global data presence. It also provides examples of how companies like FitnessKeeper and Fidelity Investments use Cloudant to solve data scaling and management challenges. The document concludes by outlining next steps for signing up and exploring Cloudant.
I applied at @MphasisCareers for an intern position, and the topic for the competition is Social, Mobile, Analytics and Cloud computing (SMAC). Read through my presentation, rate it, and comment on it if possible. I need all of your support.
Thank you
Talking SMAC!!! How Social, Mobile, Analytics & Cloud Reshaping Your Busines... (C.K. Kumar)
The document discusses how social, mobile, analytics, and cloud (SMAC) technologies are reshaping businesses and marketing. It outlines the key benefits of SMAC including productivity anywhere, data-driven decision making, and real-time interactions. The document then discusses how SMAC is impacting business strategy through accelerated and always-on access, empowering customers and employees, and connecting the workforce. For marketing strategy, SMAC enables accelerated lead generation, customized interactions through data and personalization, integrated online and offline experiences, and synchronization across channels. The document concludes that SMAC brings more change in the next 3 years than the past 50 and that opportunities lie at the interfaces between technologies.
The document discusses how emerging technologies like cloud, mobile, big data, and social media (SMAC) are enabling new applications and use cases. These technologies are transforming the way people and businesses use technology. Examples mentioned include accessing internet anywhere using mobile technology, analyzing large amounts of customer and social data using big data analytics to customize experiences, and allowing remote collaboration using cloud-based social media platforms. The document argues that spending on SMAC technologies will grow significantly and drive digital transformation across industries by 2020.
Real Time Analytics for Big Data: a Twitter Case Study (Nati Shalom)
Hadoop's batch-oriented processing is sufficient for many use cases, especially where the frequency of data reporting doesn't need to be up-to-the-minute. However, batch processing isn't always adequate, particularly when serving online needs such as mobile and web clients, or markets with real-time changing conditions such as finance and advertising.
In the same way that Hadoop was born out of large-scale web applications, a new class of scalable frameworks and platforms for handling real time streaming processing or real time analysis is born to handle the needs of large-scale location-aware mobile, social and sensor use.
Facebook, Twitter and Google have been pioneers in that arena and recently launched new analytics services designed to meet the real time needs.
In this session we will review the common patterns and architectures that drive these platforms and learn how to build a Twitter-like analytics system in a simple way using frameworks such as Spring Social, an active in-memory data grid for big-data event processing, and a NoSQL database such as Cassandra or HBase for managing the historical data.
Participants in this session will also receive a hands-on tutorial for trying out these patterns on their own environment.
A detailed post covering the topic including a reference to a code example illustrating the reference architecture is available below:
http://horovits.wordpress.com/2012/01/27/analytics-for-big-data-venturing-with-the-twitter-use-case/
MDEC Fintech Conference - Demystifying Fintech in the SMAC Era, Darien Nagle ... (iTrain)
Check out the video of this presentation and the rest at www.itrain.com.my/fintech-bootcamp
Interested in getting a fintech idea started but don't know how? Then join the FREE MDEC Fintech Masterclass on October 3-4. To enter, just tell us about your fintech idea!
Apply here: bit.ly/fintech-master
More information about the complete Fintech Bootcamp: www.itrain.com.my/fintech-bootcamp/
SMAC is disrupting the domain. No CIO conversation is complete without considering the influence of SMAC on industry and business. Rapid developments in this technology stack are adding value across the full breadth of businesses and industries. The rewards are numerous and look very attractive, with promises as big as forecasting the future (Analytics), availability everywhere (Mobile), everything easy and networked (Social), and all at very low cost (Cloud). This new technology stack has begun shaping tomorrow's organization and influences every part of a business, and consequently every software application used inside the company and by the company.
The document provides an overview of big data concepts including definitions, statistics on data generation and internet usage, applications and examples, challenges, and data types. It discusses key big data concepts such as the 3Vs of volume, velocity and variety; more Vs including veracity, value and visualization; data science areas and skills; the data workflow; and examples from companies like UPS, Walmart, eBay, and Kaiser Permanente.
http://horovits.wordpress.com/2012/01/27/analytics-for-big-data-venturing-with-the-twitter-use-case/
MDEC Fintech Conference - Demystifying Fintech in the SMAC Era, Darien Nagle ...iTrain
Check out the video of this presentation and the rest at www.itrain.com.my/fintech-bootcamp
Interested to get a fintech idea started but don't know how to start? Then join the FREE MDEC Fintech Masterclass on October 3-4. To enter just tell us about your Fintech idea!
Apply here: bit.ly/fintech-master
More information about the complete Fintech Bootcamp: www.itrain.com.my/fintech-bootcamp/
SMAC is upsetting the domain. No CIO dialogue is accomplished devoid of considering influence
of SMAC on industry and business. Rapid developments in this technology pile are accumulating
value to complete breadth of businesses and industries. Rewards are several and appear very
captivating, with assurances being made as big as - forecasting future (Analytics), accessible
everywhere (Mobile), everything is so easy and networked (Social), and at a very low cost (Cloud).
This fresh technology pile has begun changing tomorrow's organization and has influence on every
part of a business, therefore consequently on the every software applications being utilize inside
the company and by the company.
The document provides an overview of big data concepts including definitions, statistics on data generation and internet usage, applications and examples, challenges, and data types. It discusses key big data concepts such as the 3Vs of volume, velocity and variety; more Vs including veracity, value and visualization; data science areas and skills; the data workflow; and examples from companies like UPS, Walmart, eBay, and Kaiser Permanente.
Why the SMAC Stack is Going to Change the Word...Or has it Already?Ayantek LLC
Our President & CEO Praveen Ramanathan delivered this presentation at Blue Wave Marketing's 'Marketing Integration Forum'. In the presentation he outlines how companies like Netflix and Analog Devices are leveraging the SMAC Stack in order to maximize their marketing value. He also provides three tools to conduct a self assessment to see where SMAC can fit into your organization's digital strategy.
Microsoft Cloud Computing - Windows Azure PlatformDavid Chou
The document provides an overview of Microsoft's cloud computing platform. It discusses Microsoft's strategy of providing a hybrid cloud that allows customers to run applications both on-premise and in the public cloud. It highlights key services offered, such as compute infrastructure (web and worker roles), SQL Azure database, storage, and AppFabric. Case studies are presented showing how various companies have used the Microsoft cloud platform.
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule
This document provides an overview of CUDA (Compute Unified Device Architecture), a parallel computing platform developed by NVIDIA that allows programming of GPUs for general-purpose processing. It outlines CUDA's process flow of copying data to the GPU, running a kernel program on the GPU, and copying results back to CPU memory. It then demonstrates CUDA concepts like kernel and thread structure, memory management, and provides a code example of vector addition to illustrate CUDA programming.
This lecture discusses manycore GPU architectures and programming, focusing on the CUDA programming model. It covers GPU execution models, CUDA programming concepts like threads and blocks, and how to manage GPU memory including different memory types like global and shared memory. It also discusses optimizing memory access patterns for global memory and profiling CUDA programs.
This document provides an outline of manycore GPU architectures and programming. It introduces GPU architectures, the GPGPU concept, and CUDA programming. It discusses the GPU execution model, CUDA programming model, and how to work with different memory types in CUDA like global, shared and constant memory. It also covers streams and concurrency, CUDA intrinsics and libraries, performance profiling and debugging. Finally, it mentions directive-based programming models like OpenACC and OpenMP.
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
This document provides an overview of GPU programming with CUDA. It defines what a GPU is, that it has many compute cores for graphics processing. It explains that CUDA extends C to access GPU capabilities, allowing for parallel execution across GPU threads. It provides examples of CUDA code structure and keywords to specify where code runs and launch kernels. Performance considerations include data storage, shared memory, and efficient thread scheduling.
This document provides an introduction to accelerators such as GPUs and Intel Xeon Phi. It discusses the architecture and programming of GPUs using CUDA. GPUs are massively parallel many-core processors designed for graphics processing but now used for general purpose computing. They provide much higher floating point performance than CPUs. The document outlines GPU memory architecture and programming using CUDA. It also provides an overview of Intel Xeon Phi which contains over 50 simple CPU cores for highly parallel workloads.
This document provides an overview of CUDA (Compute Unified Device Architecture) and GPU programming. It begins with definitions of CUDA and GPU hardware architecture. The history of GPU development from basic graphics cards to modern programmable GPUs is discussed. The document then covers the CUDA programming model including the device model with multiprocessors and threads, and the execution model with grids, blocks and threads. It includes a code example to calculate squares on the GPU. Performance results are shown for different GPUs on a radix sort algorithm. The document concludes that GPU computing is powerful and will continue growing in importance for applications.
Presentation I gave at the SORT Conference in 2011. Was generalized from some work I had done with using GPUs to accelerate image processing at FamilySearch.
This document summarizes VPU and GPGPU computing technologies. It discusses that a VPU is a visual processing unit, also known as a GPU. GPUs have massively parallel architectures that allow them to perform better than CPUs for some complex computational tasks. The document then discusses GPU, PPU and GPGPU architectures, programming models like CUDA, and applications of GPGPU computing such as machine learning, robotics and scientific research.
This document summarizes VPU and GPGPU technologies. It discusses that a VPU is a visual processing unit, also known as a GPU. GPUs have massively parallel architectures that allow them to perform better than CPUs for some complex computational tasks. The document then discusses GPU architecture including stream processing, graphics pipelines, shaders, and GPU clusters. It provides an example of using CUDA for GPU computing and discusses how GPUs are used for general purpose computing through frameworks like CUDA.
This document summarizes VPU and GPGPU computing technologies. It discusses that a VPU is a visual processing unit, also known as a GPU. GPUs provide massively parallel and multithreaded processing capabilities. GPUs are now commonly used for general purpose computing due to their ability to handle complex computational tasks faster than CPUs in some cases. The document then discusses GPU and PPU architectures, programming models like CUDA, and applications of GPGPU computing such as machine learning, robotics, and scientific research.
The document provides an overview of introductory GPGPU programming with CUDA. It discusses why GPUs are useful for parallel computing applications due to their high FLOPS and memory bandwidth capabilities. It then outlines the CUDA programming model, including launching kernels on the GPU with grids and blocks of threads, and memory management between CPU and GPU. As an example, it walks through a simple matrix multiplication problem implemented on the CPU and GPU to illustrate CUDA programming concepts.
The document discusses VPU and GPGPU computing. It explains that a VPU is a visual processing unit, also known as a GPU. GPUs are massively parallel and multithreaded processors that are better than CPUs for tasks like machine learning and graphics processing. The document then discusses GPU architecture, memory, and programming models like CUDA. It provides examples of GPU usage and concludes that GPGPU is used in fields like machine learning, robotics, and scientific computing.
This document provides an overview of setting up an Intel IoT Developer Kit including the hardware components, installing software, and running sample codes. It discusses the Galileo and Edison boards, microSD cards, IDEs, MRAA and UPM libraries, and connecting devices. It also demonstrates how to set up environments for C/C++ with Eclipse, JavaScript with XDK, and Arduino, and describes where to find documentation and sample codes for getting started with the kits and sensors.
This document provides a tutorial introduction to GPGPU computation using NVIDIA CUDA. It begins with a brief overview and warnings about the large numbers involved in GPGPU. The agenda then outlines topics to be covered including general purpose GPU computing using CUDA and optimization topics like memory bandwidth optimization. Key aspects of CUDA programming are introduced like the CUDA memory model, compute capabilities of GPUs, and profiling tools. Examples are provided of simple CUDA kernels and how to configure kernel launches for grids and blocks of threads. Optimization techniques like choosing block/grid sizes to maximize occupancy are also discussed.
The document discusses deep learning applications design, development and deployment in IoT edge. It describes using a Power9 system to train artificial neural network models using the MNIST dataset. It also covers building inference engines for Android phones and deploying visual recognition models to IBM Watson Studio.
Similar to GPU programming and Its Case Study (20)
1. GPU Programming and SMAC Case Study
Zhengjie Lu
Master Student of Electronic Group
Electrical Engineering
Technische Universiteit Eindhoven, NL
2. Contents
Part 1: GPU Programming
1.1 NVIDIA GPU Hardware
1.2 NVIDIA CUDA Programming
1.3 Programming Environment
Part 2: SMAC Case Study
2.1 SMAC Introduction
2.2 SMAC Mapping
2.3 Experiment & Analysis
Part 3: Conclusion & Future Development
3. Concepts
1. GPU
• Graphics processing unit, i.e. the graphics card
• Chip vendors: NVIDIA and ATI
2. CUDA Programming
• “Compute Unified Device Architecture”
• Supported by NVIDIA
3. SMAC Application
• “Simplified Method for Atmospheric Correction”
PAGE 3 (6/28/15)
4. Part 1: GPU Programming
5. 1.1 NVIDIA GPU Hardware
• What does an NVIDIA GPU look like?
6. 1.1 NVIDIA GPU Hardware
• Example: NVIDIA 8-Series GPU
− 128 stream processors (SPs), 1.35 GHz per processor
− 16 shared memories: one shared by every 8 SPs; small but fast
− 1 global memory: shared by all 128 SPs; slow but large
− 1 constant memory: shared by all 128 SPs; small but fast
7. 1.1 NVIDIA GPU Hardware
[Figure: GPU block diagram showing Stream Processors (SPs), Shared Memory, Global/Constant Memory, and Stream Multi-Processors (SMs)]
8. 1.1 NVIDIA GPU Hardware
• Connection between GPU and CPU
− CPU => main memory => GPU global memory => GPU
− GPU => GPU global memory => main memory => CPU
9. 1.1 NVIDIA GPU Hardware
• Hardware summary:
− Multi-threading is supported physically by the SPs.
− SPs inside an SM communicate with each other through the shared memory.
− SMs communicate with each other through the global memory.
− GPU and CPU communicate with each other through their memories: global memory <=> main memory
10. 1.2 NVIDIA CUDA Programming
• CUDA programming concepts
− Thread: the basic execution unit
− Block: a collection of threads
− Grid: a collection of blocks
11. 1.2 NVIDIA CUDA Programming
• CUDA programming concepts
− A grid is mapped onto the GPU by the scheduler
− A block is mapped onto an SM by the scheduler
− A thread is mapped onto an SP by the scheduler
12. 1.2 NVIDIA CUDA Programming
13. 1.2 NVIDIA CUDA Programming
• CPU programming convention:
i. Allocate the CPU memory
ii. Run the CPU kernel
• CUDA programming convention:
i. Allocate the GPU memory
ii. Copy the input to the GPU memory
iii. Run the GPU kernel
iv. Copy the output from the GPU memory
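The four-step CUDA convention above can be sketched as follows. This is a minimal illustration, not code from the slides: the kernel `addKernel` and the launcher `addOnGPU` are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: one thread per element.
__global__ void addKernel(const int* in1, const int* in2, int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in1[i] + in2[i];
}

void addOnGPU(const int* h_in1, const int* h_in2, int* h_out, int n)
{
    int *d_in1, *d_in2, *d_out;
    size_t bytes = n * sizeof(int);

    // i.   Allocate the GPU (global) memory
    cudaMalloc((void**)&d_in1, bytes);
    cudaMalloc((void**)&d_in2, bytes);
    cudaMalloc((void**)&d_out, bytes);

    // ii.  Copy the input to the GPU memory
    cudaMemcpy(d_in1, h_in1, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_in2, h_in2, bytes, cudaMemcpyHostToDevice);

    // iii. Run the GPU kernel: here one block of n threads
    addKernel<<<1, n>>>(d_in1, d_in2, d_out);

    // iv.  Copy the output from the GPU memory, then free it
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in1); cudaFree(d_in2); cudaFree(d_out);
}
```

The extra copy steps (ii and iv) are exactly what distinguishes the CUDA convention from the CPU one: the GPU kernel can only touch GPU global memory, so inputs and outputs must cross the PCI-E bus explicitly.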
14. 1.2 NVIDIA CUDA Programming
• Example:
/*******************************************************/
/* File: main.c
/* Description: 8x8 matrix addition on CPU
/*******************************************************/
//Data definition
const int mat1[64] = {…};
const int mat2[64] = {…};
int mat3[64];
//Matrix addition on CPU
void matrixAdd_CPU(int index, int* IN1, int* IN2, int* OUT);
// Main body
int main()
{
// Run the matrix addition on CPU
matrixAdd_CPU(64, mat1, mat2, mat3);
return 0;
}
/****************************************************/
/* File: main.cu
/* Description: 8x8 matrix addition on GPU
/****************************************************/
//Data definition
const int mat1[64] = {…};
const int mat2[64] = {…};
int mat3[64];
//Matrix addition on GPU
void matrixAdd_GPU(int index, int* IN1, int* IN2, int* OUT);
// Main body
int main()
{
// Run the matrix addition on GPU
matrixAdd_GPU(64, mat1, mat2, mat3);
return 0;
}
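The slides show only the host-side files. As a sketch of what the device side of `matrixAdd_GPU` could look like, assuming the 8x8 matrix is launched as a single block of 64 threads (an illustration, not the original implementation):

```cuda
// Each of the 64 threads adds one element of the 8x8 matrices.
// Launch example: matrixAdd_GPU<<<1, 64>>>(d_in1, d_in2, d_out);
__global__ void matrixAdd_GPU(const int* IN1, const int* IN2, int* OUT)
{
    // Unique element index: block offset plus thread offset.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    OUT[i] = IN1[i] + IN2[i];
}
```

Because every thread computes exactly one output element, there is no inter-thread communication and no synchronization is needed.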
16. 1.2 NVIDIA CUDA Programming
• CUDA programming optimization
1) Use the registers and shared memories
2) Maximize the number of threads per block
3) Global memory access coalescing
4) Shared memory bank conflict
5) Group the byte access
6) Stream execution
17. 1.2 NVIDIA CUDA Programming
1) Use the registers and shared memories
− Registers are the fastest. (8192 32-bit registers per SM)
− Shared memory is fast but small. (16 KB per SM)
− Global memory is slow but large. (at least 256 MB)
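The common pattern behind point 1) is to stage data from the slow global memory into the fast per-SM shared memory once, and serve all reuse from there. A sketch with a hypothetical smoothing kernel (names and launch shape are illustrative, not from the slides):

```cuda
// Each block stages its 256-element tile in shared memory; every
// thread then reads its neighbour from the tile instead of issuing
// a second global-memory read. Launch with 256 threads per block.
__global__ void smoothKernel(const float* in, float* out)
{
    __shared__ float tile[256];          // lives in the 16 KB shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[i];           // one global-memory read per thread
    __syncthreads();                     // wait until the whole tile is loaded

    float left = (threadIdx.x > 0) ? tile[threadIdx.x - 1] : tile[threadIdx.x];
    out[i] = 0.5f * (left + tile[threadIdx.x]);   // reuse hits shared memory
}
```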
18. 1.2 NVIDIA CUDA Programming
2) Maximize the number of threads per block
i. Determine the register/memory budget per thread, with the tool cudaProf.
ii. Determine the maximum number of threads, with the tool cudaCal.
iii. Determine the number of blocks.
19. 1.2 NVIDIA CUDA Programming
3) Global memory access coalescing
− Global memory access pattern: 16 threads at a time
− The 16 threads must access 16 consecutive words in global memory
− The 1st thread must access a global memory address that is 16-word aligned
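The rule above can be sketched with two illustrative kernels (not from the slides): in the first, the 16 threads of a half-warp read 16 consecutive, aligned words, so the accesses coalesce into one transaction; in the second, the threads stride through memory and the accesses serialize.

```cuda
// Coalesced: thread k of a half-warp reads word k.
__global__ void copyCoalesced(const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i];            // 16 consecutive, 16-word-aligned accesses
}

// Non-coalesced: thread k reads word k*stride, breaking the pattern.
__global__ void copyStrided(const float* in, float* out, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i * stride];   // scattered accesses, issued one by one
}
```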
20. 1.2 NVIDIA CUDA Programming
[Figure: examples of coalesced and non-coalesced global memory access patterns]
21. 1.2 NVIDIA CUDA Programming
4) Shared memory bank conflicts
− Shared memory access pattern: 16 threads at a time
− The 16 KB shared memory is organized as 16 banks of 1 KB
− Threads should not access two different addresses inside the same memory bank
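A classic sketch of point 4), illustrative rather than from the slides: when a 16x16 shared-memory tile is read by columns, all 16 threads of a half-warp hit the same bank; padding each row by one word shifts every column access into a different bank.

```cuda
// 16x16 transpose tile with one word of padding per row.
// Launch with a 16x16 thread block.
__global__ void transpose16(const float* in, float* out)
{
    __shared__ float tile[16][17];   // 17, not 16: the padding column
    int x = threadIdx.x, y = threadIdx.y;

    tile[y][x] = in[y * 16 + x];     // row-wise write: conflict-free
    __syncthreads();

    // Column-wise read: address x*17 + y maps each thread of the
    // half-warp to a different bank; with tile[16][16] all threads
    // would hit the same bank and serialize.
    out[y * 16 + x] = tile[x][y];
}
```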
22. 1.2 NVIDIA CUDA Programming
[Figure: shared memory access patterns without and with bank conflicts]
23. 1.2 NVIDIA CUDA Programming
5) Group the byte access
[Figure: individual byte accesses versus one grouped word access]
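Point 5) can be sketched as follows (an illustration, not from the slides): reading four bytes as one 32-bit `uchar4` issues a single word-wide access per thread instead of four one-byte accesses.

```cuda
// Grouped: one 4-byte access per thread via the built-in uchar4 type.
__global__ void copyGrouped(const uchar4* in, uchar4* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i];              // 4 bytes moved in one access
}

// Ungrouped: four separate 1-byte accesses per thread.
__global__ void copyBytes(const unsigned char* in, unsigned char* out)
{
    int i = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
    for (int k = 0; k < 4; ++k)
        out[i + k] = in[i + k];  // byte-sized accesses under-utilize the bus
}
```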
24. 1.2 NVIDIA CUDA Programming
6) Stream execution
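Stream execution can be sketched like this: split the work into chunks and issue each chunk's copy-in, kernel, and copy-out into its own stream, so the PCI-E transfers of one chunk overlap the kernel execution of another. The kernel `process` and the buffer names are hypothetical; the host buffers are assumed to be pinned (allocated with `cudaMallocHost`), which asynchronous copies require.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void process(const float* in, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] * 2.0f;
}

void runStreamed(float* h_in, float* h_out,    // pinned host buffers
                 float* d_in, float* d_out, int n)
{
    const int nStreams = 8;
    int chunk = n / nStreams;                  // assume n divisible by 8
    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < nStreams; ++s) {
        int off = s * chunk;
        cudaMemcpyAsync(d_in + off, h_in + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<chunk / 64, 64, 0, streams[s]>>>(d_in + off, d_out + off);
        cudaMemcpyAsync(h_out + off, d_out + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaThreadSynchronize();                   // wait for all 8 streams
    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
}
```

This 8-stream layout matches the 8-stream configuration measured in Part 2.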
25. 1.2 NVIDIA CUDA Programming
• Tips
− Examples: NVIDIA SDK
− Programming: “NVIDIA CUDA Programming Guide”
− Optimization: “NVIDIA CUDA C Programming: Best Practices Guide”
26. 1.3 Programming Environment
1. Preparation
− Windows: Microsoft Visual C++ 2008 Express
− Linux
27. 1.3 Programming Environment
2. CUDA installation
− Step 1: Download the CUDA package suitable for your operating system
(http://www.nvidia.com/object/cuda_get.html)
− Step 2: Install the CUDA Driver
− Step 3: Install the CUDA Toolkit
− Step 4: Install the CUDA SDK
− Step 5: Verify the installation by running the SDK examples
28. 1.3 Programming Environment
3. CUDA project setup (Windows)
− Step 1: Download “CUDA Wizard” and install it
(http://www.comp.hkbu.edu.hk/~kyzhao/)
− Step 2: Open VC++ 2008 Express
− Step 3: Click “File” and choose “New/Project”
− Step 4: Choose “CUDA” in “Project types” and then select “CUDAWinApp” in “Visual Studio installed templates”
− Step 5: Name the CUDA project and click “OK”.
− Step 6: Click “Solution Explorer”
29. 1.3 Programming Environment
3. CUDA project setup (Windows)
− Step 7: Right-click “Source Files” and choose “Add/New Item…”
− Step 8: Click “Code” in “Categories” and then choose “C++ File (.cpp)” in “Visual Studio installed templates”
− Step 9: Name the file “main.cu” and click “Add”
− Step 10: Repeat Steps 6~8 to create another file named “GPU_kernel.cu”
− Step 11: Click “Solution Explorer” and select “GPU_kernel.cu” under the menu “Source Files”
30. 1.3 Programming Environment
3. CUDA project setup (Windows)
− Step 12: Right-click “GPU_kernel.cu”
− Step 13: Click “Configuration Properties” and then click “General”
− Step 14: Select “Custom Build Tool” in “Tool” and click “OK”
− Step 15: Implement your GPU kernel in “GPU_kernel.cu” and the rest in “main.cu”
31. 1.3 Programming Environment
• Tips
− Set up a CUDA project on Linux:
http://sites.google.com/site/5kk70gpu/installation
http://forums.nvidia.com/lofiversion/index.php?f62.html
32. Part 2: SMAC Case Study
33. 2.1 SMAC Introduction
• SMAC algorithm
• SMAC stands for “Simplified Method for Atmospheric Correction”
• A fast computation of atmospheric reflections
34. 2.1 SMAC Introduction
• SMAC application profile
Data size: 5781 x 10 x 4 Bytes = 231,240 Bytes
38. 2.2 SMAC Mapping
• SMAC kernel on GPU
− Data size: 64 x 5781 x 4 Bytes = 1,479,936 Bytes
− CPU time:
− GPU Time:
− Demo
39. 2.3 Experiment & Analysis
• Experiment Preparation
HARDWARE
− CPU: Intel Duo-Core, 2.5 GHz per core
− GPU: NVIDIA 32-core GPU, 0.95 GHz per core
− Main memory: 4 GB
− PCI-E: PCI Express 1.0 x16
− Operating system: Windows Vista Enterprise
− CUDA version: CUDA 1.1
SOFTWARE
− GPU maximum registers per thread: 60
− GPU thread number: 192 x 4 (#threads per block x #blocks)
− CPU thread number: 1
40. 2.3 Experiment & Analysis
• Experiment setup
− Performance metric: GPU improvement = CPU time / GPU time
− CPU time = CPU stop timer - CPU start timer
− GPU time = GPU stop timer - GPU start timer
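The GPU-side timer above can be sketched with CUDA events, so the measured interval brackets work actually done on the device. The launcher callback is a hypothetical stand-in for the memory copies and kernel launch being timed:

```cuda
#include <cuda_runtime.h>

// Returns the elapsed GPU time in milliseconds for the work issued
// by the (hypothetical) launch callback.
float timeGPU(void (*launch)(void))
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);    // "GPU start timer"
    launch();                     // memory copies + kernel launch
    cudaEventRecord(stop, 0);     // "GPU stop timer"
    cudaEventSynchronize(stop);   // wait until the stop event is reached
    cudaEventElapsedTime(&ms, start, stop);   // GPU time = stop - start

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```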
41. 2.3 Experiment & Analysis
• Experiment setup
− Linear execution-time prediction
CPU time = CPU overhead + Bytes x CPU speed

GPU time = GPU memory time + GPU run time
         = (GPU memory overhead + Bytes x GPU memory speed)
           + (GPU kernel overhead + Bytes x GPU kernel speed)
         = (GPU memory overhead + GPU kernel overhead)
           + Bytes x (GPU memory speed + GPU kernel speed)
         = GPU overhead + Bytes x GPU speed

Improvement = (CPU overhead + Bytes x CPU speed) / (GPU overhead + Bytes x GPU speed)
            ≈ CPU speed / GPU speed

− Only holds for large-size data!
42. 2.3 Experiment & Analysis
• Experiment setup
− Linear execution-time prediction
1-thread (CPU): CPU time = 5.39 x 10^-5 x data size
1-stream:       GPU time = 1.67 + 2.41 x 10^-6 x data size
8-stream:       GPU time = 4.45 + 2.01 x 10^-6 x data size
44. 2.3 Experiment & Analysis
• Experiment result
− Linear execution-time model
45. 2.3 Experiment & Analysis
• Experiment result
− Linear execution-time model
46. 2.3 Experiment & Analysis
• Roofline model
[Figure: roofline model on log-log axes]
47. 2.3 Experiment & Analysis
• Roofline model with the SMAC kernel

Hardware: NVIDIA Quadro FX570M
− PCI Express bandwidth: 4 GB/sec
− Peak performance: 91.2 GFlops/sec
− Peak performance without the FMA unit: 30.4 GFlops/sec

Software: SMAC kernel on GPU
− Data size: 59719680 Bytes
− Issued instruction number: 4189335552 Flops
− Execution time: 79.2 ms
− Instruction density: 70.15 Flops/Byte
− Instruction throughput: 52.8 GFlops/sec
48. 2.3 Experiment & Analysis
[Figure: roofline model of SMAC on the GPU, log-log axes (Flops/Byte vs GFlops/sec): peak performance 91.2 GFlops/sec, peak without FMA 30.4 GFlops/sec, and the hard disk IO bandwidth ceiling; the SMAC kernel sits at an instruction density of 70.15 Flops/Byte and achieves 52.8 GFlops/sec]
49. 3. Conclusion & Future Development
• SMAC application:
− The bottleneck is the hard disk IO.
• SMAC kernel on GPU:
− The bottleneck is the computation.
− 25 times faster than the CPU when large-size data is processed with the streams.
− The performance ceiling would occur when the data size is “infinitely” large.
50. 3. Conclusion & Future Development
• Future development
− Power measurement: “Consumption of Contemporary
Graphics Accelerators”
51. 3. Conclusion & Future Development
• Power measurement: physical setup
8 x 0.12 ohm (5 W)
52. 3. Conclusion & Future Development
• Future development
− Improve the hard disk I/O
− Employ more powerful GPU
[Chart: GPU improvement versus data size, for data sizes doubling from 254,364 Bytes up to 520,937,472 Bytes]