There are at least 40 to 50 different GC log formats. Here, we explain the commonly used GC log formats, along with tricks, patterns, and tools to analyze them effectively.
Are you building a high-throughput, low-latency application? Are you trying to figure out the perfect JVM heap size? Are you struggling to choose the right garbage collection algorithm and settings? Are you striving to achieve pause-less GC? Do you know the right tools and best practices to tame the GC? Do you know how to troubleshoot memory problems using GC logs? You will get complete answers to several such questions in this presentation.
Kernel Recipes 2015: Introduction to Kernel Power Management - Anne Nicolas
In order to keep up with the complexities of SoCs, the Linux kernel has an ever-growing set of features for power management. For the uninitiated, it can be confusing how each of these features works, and even more confusing how they should work together. This talk will be a high-level introduction to and overview of each of the various features, and will discuss how they all fit together and interact.
Some of the features/subsystems covered: suspend/resume, CPUidle, CPUfreq, clocks, regulators, runtime PM, generic power domains, PM QoS.
Kevin Hilman, Linaro
https://kernel-recipes.org/en/2015/introduction-to-kernel-power-management/
The average GPU nowadays packs more horsepower than the CPU. As a result, there are ever more possibilities to move computational problems from the CPU to the GPU. This presentation is an introduction to how you can do this in Java using Jogamp JoCL. Using a few simple problems, it shows when a GPU is a better choice than a CPU and vice versa. This is also one of the focus areas of Java 9 (Project Sumatra), which uses JoCL, among others, as inspiration.
Molecular Shape Searching on GPUs: A Brave New World - Can Ozdoruk
Shape is a fundamental three-dimensional molecular property and a powerful descriptor for molecular comparison and similarity assessment; similarity in shape has proven to be a very effective method for predicting similarity in biology. As such, shape-based virtual screening has become an integral part of computational drug discovery, due to both its speed and efficacy. OpenEye’s recent port of their shape similarity application, ROCS, to the GPU has resulted in a virtual screening tool of unprecedented power – FastROCS. FastROCS’ speed allows it to perform large-scale calculations of a kind inaccessible in the past, and has accelerated more routine shape searching to the point that it has become competitive with more traditional, but less effective, two-dimensional methods. Go through the slides to learn more. Try GPUs for free here: www.Nvidia.com/GPUTestDrive
Java and Linux: Operational Specifics / Alexey Ragozin (Deutsche Bank) - Ontico
HighLoad++ 2017
Rio de Janeiro hall, November 8, 11:00
Abstract:
http://www.highload.ru/2017/abstracts/2884.html
Java on Linux is found everywhere in information systems, from big data to fashionable serverless architectures. Both Linux and Java have their operational nuances. Understanding these nuances is important for making the Java + Linux stack work stably and efficiently.
In practice, however, Java developers love to think cross-platform and do not want to deal with the particulars of the operating system, while Linux folks regard the JVM as a process alien to the Linux world that devours all the memory available on the server.
And then Docker comes along, and there are even more nuances...
The goal of this talk is to tell Java developers about Linux and Docker, and Linux people about the JVM.
Optimizing Parallel Reduction in CUDA: NOTES - Subhajit Sahu
Highlighted notes on Optimizing Parallel Reduction in CUDA
Written while doing research work under Prof. Dip Banerjee and Prof. Kishore Kothapalli.
Interesting optimizations; I should try these soon, as PageRank is basically lots of sums.
Video Marketing Mastery: YouTube and Google Hangouts - Lou Bortone
This is part of video marketing pro Lou Bortone's "Total Video Solution" course. This presentation focuses on YouTube and Google Hangouts as marketing tools.
Enterprise zones: do they create or transfer value? - Simon Wainwright
This recently published article considers whether any lessons have been learnt from the Enterprise Zones of the 1980s and whether providing economic stimuli creates, distorts or simply transfers value.
Using GPUs to Handle Big Data with Java, by Adam Roberts - J On The Beach
Modern graphics processing units (GPUs) are efficient general-purpose stream processors. Learn how Java can exploit the power of GPUs to optimize high-performance enterprise and technical computing applications such as big data and analytics workloads. This presentation covers principles and considerations for GPU programming from Java and looks at the software stack and developer tools available. It also presents a demo showing GPU acceleration and discusses what is coming in the future.
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DCSF 19: Accelerating Docker Containers with NVIDIA GPUs - Docker, Inc.
Using the NVIDIA Container Runtime, many developers and enterprises have been developing, benchmarking and deploying deep learning (DL) frameworks, HPC and other GPU accelerated containers at scale for the last two years. In this talk, we will go over the architecture of the NVIDIA Container Runtime and discuss our recent close collaboration with Docker. The result of our collaboration with Docker is a seamless native integration of the runtime enabling Docker Engine 19.03 CE and the forthcoming Docker Enterprise release to run GPU accelerated containers. We will also highlight containerized NVIDIA drivers. This new feature eliminates the overhead of provisioning GPU machines and brings GPU support on container optimized operating systems, which either lack package managers for installing software or require all applications to run in containers. In this session, you will learn how GPU accelerated containers can be easily built and deployed through the use of driver containers and native support for GPUs in Docker 19.03. The session will include a demo of running a GPU accelerated deep learning container using the new CLI options in Docker 19.03 and containerized drivers. Running NVIDIA GPU accelerated containers with Docker has never been this easy!
Accelerating HPC Applications on NVIDIA GPUs with OpenACC - inside-BigData.com
In this deck from the Stanford HPC Conference, Doug Miles from NVIDIA presents: Accelerating HPC Applications on NVIDIA GPUs with OpenACC.
"OpenACC is a directive-based parallel programming model for GPU accelerated and heterogeneous parallel HPC systems. It offers higher programmer productivity compared to use of explicit models like CUDA and OpenCL.
Application source code instrumented with OpenACC directives remains portable to any system with a standard Fortran/C/C++ compiler, and can be efficiently parallelized for various types of HPC systems – multicore CPUs, heterogeneous CPU+GPU, and manycore processors.
This talk will include an introduction to the OpenACC programming model, provide examples of its use in a number of production applications, explain how OpenACC and CUDA Unified Memory working together can dramatically simplify GPU programming, and close with a few thoughts on OpenACC future directions."
Watch the video: https://youtu.be/CaE3n89QM8o
Learn more: https://www.openacc.org/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator... - Akihiro Hayashi
Third Workshop on Accelerator Programming Using Directives (WACCPD2016, co-located with SC16)
While GPUs are increasingly popular for high-performance computing, optimizing the performance of GPU programs is a time-consuming and non-trivial process in general. This complexity stems from the low abstraction level of standard GPU programming models such as CUDA and OpenCL: programmers are required to orchestrate low-level operations in order to exploit the full capability of GPUs. In terms of software productivity and portability, a more attractive approach would be to facilitate GPU programming by providing high-level abstractions for expressing parallel algorithms.
OpenMP is a directive-based shared-memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran, without exposing too many details of GPU architectures.
However, such high-level parallel programming strategies generally impose additional program optimizations on compilers, which can result in lower performance than fully hand-tuned code written with low-level programming models. To study potential performance improvements from compiling and optimizing high-level GPU programs, in this paper we 1) evaluate a set of OpenMP 4.x benchmarks on an IBM POWER8 and NVIDIA Tesla GPU platform and 2) conduct a comparative performance analysis of hand-written CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
6. conjugateGradientPrecond.exe
conjugateGradientPrecond starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
GPU selected Device ID = 0
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
laplace dimension = 128
Convergence of conjugate gradient without preconditioning:
iteration = 542, residual = 8.660636e-013
Convergence Test: OK
Convergence of conjugate gradient using incomplete LU preconditioning:
iteration = 188, residual = 9.056491e-013
Convergence Test: OK
Test Summary:
Counted total of 0 errors
qaerr1 = 0.000004 qaerr2 = 0.000003
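The output above shows the point of this sample: preconditioning cuts the iteration count (542 without, 188 with incomplete-LU). The effect can be sketched in a few lines of pure Python. This is a minimal illustration, not the sample's code: it uses a simple Jacobi (diagonal) preconditioner on a made-up SPD tridiagonal system rather than ILU(0) on a Laplace matrix, and the `pcg`, `mat_vec`, and `jacobi` names are hypothetical.

```python
# Minimal preconditioned conjugate gradient sketch (pure Python).
# Assumption: a Jacobi preconditioner on a made-up SPD system stands in
# for the sample's incomplete-LU preconditioner on the Laplace matrix.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def pcg(mat_vec, b, m_inv=None, tol=1e-10, max_iter=1000):
    """Solve A x = b; returns (x, iterations). m_inv applies the preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                               # residual, since x0 = 0
    z = m_inv(r) if m_inv else r[:]
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = mat_vec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = m_inv(r) if m_inv else r[:]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# SPD test matrix: growing diagonal 1..N, weak symmetric off-diagonal coupling.
N = 64
diag = [float(i + 1) for i in range(N)]

def mat_vec(v):
    out = [diag[i] * v[i] for i in range(N)]
    for i in range(N - 1):
        out[i] += 0.4 * v[i + 1]
        out[i + 1] += 0.4 * v[i]
    return out

def jacobi(r):                             # M^{-1} r with M = diag(A)
    return [r[i] / diag[i] for i in range(N)]

b = [1.0] * N
x_plain, it_plain = pcg(mat_vec, b)
x_pre, it_pre = pcg(mat_vec, b, m_inv=jacobi)
print(it_plain, it_pre)                    # preconditioned run needs fewer iterations
```

The preconditioner clusters the eigenvalues of M⁻¹A near 1, which is exactly why the sample's preconditioned run converges in far fewer iterations.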
7. convolutionFFT2D.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionFFT2D.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Testing built-in R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating R2C & C2R FFT plans for 2048 x 2048
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms)
...reading back GPU convolution results
...running reference CPU convolution
...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007)
L2norm Error OK
...shutting down
Testing custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007)
L2norm Error OK
...shutting down
8. convolutionFFT2D.exe 2/2
Testing updated custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007)
L2norm Error OK
...shutting down
Test Summary: 0 errors
Test passed
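The convolutionFFT2D sample rests on the convolution theorem: transform, multiply pointwise, transform back. A tiny pure-Python 1D sketch of that identity (a naive O(n²) DFT on made-up data with circular boundaries; nothing here comes from cuFFT) mirrors the sample's relative-L2 comparison against a direct CPU convolution:

```python
# Sketch of the idea behind FFT-based convolution (1D, circular, pure Python):
# DFT(conv(a, b)) equals DFT(a) * DFT(b) pointwise. The CUDA sample does the
# same in 2D with cuFFT R2C/C2R plans and padding; this is just the math check.
import cmath

def dft(x, inverse=False):
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_convolution(a, b):
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

a = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]    # made-up input data
b = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]   # made-up convolution kernel

direct = circular_convolution(a, b)
fa, fb = dft(a), dft(b)
via_fft = [c.real for c in dft([x * y for x, y in zip(fa, fb)], inverse=True)]

rel_l2 = (sum((d - f) ** 2 for d, f in zip(direct, via_fft)) /
          sum(d ** 2 for d in direct)) ** 0.5
print("rel L2 =", rel_l2)                         # tiny, like the sample's L2norm check
```

The sample's padding step exists because cuFFT computes exactly this circular convolution; padding the kernel and image avoids wrap-around contaminating the result.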
9. convolutionSeparable.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionSeparable.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
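The reference check above runs convolutionRowCPU() and then convolutionColumnCPU() because the kernel is separable: when a 2D kernel is the outer product of a column filter and a row filter, a row pass followed by a column pass gives the same result as the full 2D convolution, at far lower cost. A tiny pure-Python check with a made-up 3x3 kernel and zero padding (the helper names are hypothetical, not the sample's):

```python
# Why convolutionSeparable runs a row pass then a column pass: if
# k2d[i][j] = col[i] * row[j], full 2D convolution factors into two 1D passes.
H, W, R = 6, 7, 1                     # image height/width, kernel radius
row = [0.25, 0.5, 0.25]
col = [0.25, 0.5, 0.25]
img = [[float((x * 7 + y * 3) % 5) for x in range(W)] for y in range(H)]

def px(im, y, x):                     # zero padding outside the image
    return im[y][x] if 0 <= y < H and 0 <= x < W else 0.0

def conv2d(im):                       # full 2D convolution with the outer-product kernel
    return [[sum(col[i + R] * row[j + R] * px(im, y + i, x + j)
                 for i in range(-R, R + 1) for j in range(-R, R + 1))
             for x in range(W)] for y in range(H)]

def conv_rows(im):                    # 1D horizontal pass
    return [[sum(row[j + R] * px(im, y, x + j) for j in range(-R, R + 1))
             for x in range(W)] for y in range(H)]

def conv_cols(im):                    # 1D vertical pass
    return [[sum(col[i + R] * px(im, y + i, x) for i in range(-R, R + 1))
             for x in range(W)] for y in range(H)]

full = conv2d(img)
separable = conv_cols(conv_rows(img))
max_delta = max(abs(a - b) for ra, rb in zip(full, separable)
                for a, b in zip(ra, rb))
print("max delta =", max_delta)       # agrees to floating-point precision
```

For a kernel of radius R this replaces (2R+1)² multiplies per pixel with 2(2R+1), which is where the sample's throughput comes from.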
10. convolutionTexture.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionTexture.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Initializing data...
Running GPU rows convolution (10 identical iterations)...
Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s
Copying convolutionRowGPU() output back to the texture...
cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s
Running GPU columns convolution (10 iterations)
Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s
Reading back GPU results...
Checking the results...
...running convolutionRowsCPU()
...running convolutionColumnsCPU()
Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
24. deviceQuery.exe 1/2
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
25. deviceQuery.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
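A couple of the figures above can be derived from each other, which is a handy sanity check when reading deviceQuery output. The sketch below assumes a DDR factor of 2 when turning the reported memory clock into effective bandwidth; treat that figure as a rough estimate, not a spec sheet value.

```python
# Derived figures from the deviceQuery report above.
multiprocessors = 8
cores_per_mp = 48                     # compute capability 2.1 has 48 CUDA cores per SM
total_cores = multiprocessors * cores_per_mp
print(total_cores)                    # 384, matching "( 8) Multiprocessors x ( 48) CUDA Cores/MP"

mem_clock_mhz = 2050                  # as reported above
bus_width_bits = 256                  # as reported above
ddr_factor = 2                        # assumption: two transfers per reported clock
bandwidth_gb_s = mem_clock_mhz * 1e6 * ddr_factor * (bus_width_bits / 8) / 1e9
print(round(bandwidth_gb_s, 1))       # rough peak, close to the GTX 560 Ti's quoted ~128 GB/s
```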
26. deviceQueryDrv.exe 1/2
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\deviceQueryDrv.exe Starting...
CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version: 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
27. deviceQueryDrv.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
28. dwtHaar1D.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\dwtHaar1D.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
reference file = "result.dat"
gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
Writing result to "result.dat"
Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Test success!
Signal.dat (excerpt):
9.5012929e-001
2.3113851e-001
6.0684258e-001
4.8598247e-001
8.9129897e-001
...
(Regression.gold.dat and Result.dat excerpts shown on the slide.)
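What dwtHaar1D computes is a 1D Haar wavelet decomposition: one level maps each pair (a, b) to an approximation (a+b)/√2 and a detail (a−b)/√2, and the sample compares its output against the stored gold file. A minimal sketch on made-up data (the helper names are hypothetical), with a perfect-reconstruction check:

```python
# One level of the 1D Haar discrete wavelet transform, pure Python.
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One Haar level: (approximations, details); len(signal) must be even."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)   # recovers the even sample
        out.append((a - d) / SQRT2)   # recovers the odd sample
    return out

signal = [9.0, 7.0, 3.0, 5.0, 6.0, 10.0, 2.0, 6.0]   # made-up data
approx, detail = haar_forward(signal)
reconstructed = haar_inverse(approx, detail)
err = max(abs(x, ) if False else abs(x - y) for x, y in zip(signal, reconstructed))
print("max reconstruction error =", err)
```

The 1/√2 scaling makes the transform orthonormal, so signal energy is preserved across levels, which is what makes the gold-file comparison stable.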
29. dxtc.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\dxtc.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
Running DXT Compression on 512 x 512 image...
16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
30. dxtc.exe 1/4
Checking accuracy...
Deviation at ( 9, 1): 0.791667 rms
Deviation at ( 99, 1): 1.041667 rms
Deviation at ( 12, 2): 0.937500 rms
Deviation at ( 90, 3): 0.166667 rms
Deviation at ( 38, 4): 1.916667 rms
Deviation at ( 34, 7): 1.687500 rms
Deviation at ( 57, 7): 0.458333 rms
Deviation at ( 100, 8): 2.416667 rms
Deviation at ( 30, 9): 2.375000 rms
Deviation at ( 31, 9): 0.770833 rms
Deviation at ( 58, 9): 0.791667 rms
Deviation at ( 29, 10): 0.020833 rms
Deviation at ( 79, 10): 1.833333 rms
Deviation at ( 13, 11): 1.041667 rms
Deviation at ( 4, 13): 8.562500 rms
Deviation at ( 28, 13): 0.562500 rms
Deviation at ( 90, 13): 0.708333 rms
Deviation at ( 25, 14): 0.520833 rms
Deviation at ( 69, 14): 0.770833 rms
Deviation at ( 87, 16): 0.708333 rms
Deviation at ( 90, 17): 1.041667 rms
Deviation at ( 24, 19): 0.916667 rms
Deviation at ( 25, 19): 0.625000 rms
Deviation at ( 26, 19): 1.041667 rms
Deviation at ( 55, 20): 4.791667 rms
Deviation at ( 20, 23): 1.541667 rms
Deviation at ( 99, 23): 3.312500 rms
Deviation at ( 45, 24): 18.104166 rms
Deviation at ( 8, 28): 0.895833 rms
31. dxtc.exe 2/4
Deviation at ( 21, 30): 1.562500 rms
Deviation at ( 115, 32): 24.104166 rms
Deviation at ( 2, 33): 0.854167 rms
Deviation at ( 102, 33): 2.250000 rms
Deviation at ( 50, 35): 26.958334 rms
Deviation at ( 68, 35): 11.937500 rms
Deviation at ( 115, 36): 0.458333 rms
Deviation at ( 12, 38): 2.166667 rms
Deviation at ( 40, 40): 0.270833 rms
Deviation at ( 86, 43): 0.604167 rms
Deviation at ( 116, 43): 0.125000 rms
Deviation at ( 43, 44): 2.250000 rms
Deviation at ( 54, 44): 4.791667 rms
Deviation at ( 46, 46): 2.875000 rms
Deviation at ( 116, 46): 0.604167 rms
Deviation at ( 4, 47): 0.708333 rms
Deviation at ( 117, 48): 0.937500 rms
Deviation at ( 23, 51): 3.520833 rms
Deviation at ( 11, 52): 0.041667 rms
Deviation at ( 67, 54): 5.687500 rms
Deviation at ( 26, 55): 0.854167 rms
Deviation at ( 21, 56): 5.000000 rms
Deviation at ( 24, 56): 0.562500 rms
Deviation at ( 30, 57): 0.937500 rms
Deviation at ( 21, 59): 2.541667 rms
Deviation at ( 120, 59): 0.104167 rms
Deviation at ( 112, 60): 1.125000 rms
Deviation at ( 77, 61): 1.083333 rms
32. dxtc.exe 3/4
Deviation at ( 114, 62): 4.958333 rms
Deviation at ( 78, 66): 0.541667 rms
Deviation at ( 106, 68): 0.375000 rms
Deviation at ( 16, 70): 3.104167 rms
Deviation at ( 10, 71): 0.937500 rms
Deviation at ( 108, 71): 0.354167 rms
Deviation at ( 0, 72): 0.854167 rms
Deviation at ( 118, 72): 5.562500 rms
Deviation at ( 11, 73): 0.541667 rms
Deviation at ( 68, 74): 1.937500 rms
Deviation at ( 70, 76): 1.791667 rms
Deviation at ( 124, 76): 3.354167 rms
Deviation at ( 103, 78): 0.375000 rms
Deviation at ( 127, 78): 0.541667 rms
Deviation at ( 108, 79): 0.083333 rms
Deviation at ( 120, 81): 0.541667 rms
Deviation at ( 43, 82): 24.979166 rms
Deviation at ( 67, 82): 3.125000 rms
Deviation at ( 78, 82): 2.437500 rms
Deviation at ( 123, 84): 0.541667 rms
Deviation at ( 127, 85): 0.187500 rms
Deviation at ( 122, 87): 0.083333 rms
Deviation at ( 124, 87): 0.541667 rms
Deviation at ( 127, 88): 0.229167 rms
Deviation at ( 93, 91): 0.666667 rms
Deviation at ( 115, 93): 0.083333 rms
Deviation at ( 69, 95): 1.875000 rms
Deviation at ( 106, 95): 1.125000 rms
33. dxtc.exe 4/4
Deviation at ( 107, 95): 3.708333 rms
Deviation at ( 13, 96): 1.354167 rms
Deviation at ( 115, 98): 0.187500 rms
Deviation at ( 118, 98): 0.187500 rms
Deviation at ( 116, 101): 0.187500 rms
Deviation at ( 78, 105): 0.541667 rms
Deviation at ( 67, 107): 0.708333 rms
Deviation at ( 74, 107): 0.375000 rms
Deviation at ( 65, 109): 0.770833 rms
Deviation at ( 89, 109): 0.708333 rms
Deviation at ( 118, 109): 3.854167 rms
Deviation at ( 67, 110): 1.083333 rms
Deviation at ( 88, 111): 0.208333 rms
Deviation at ( 64, 113): 0.708333 rms
Deviation at ( 84, 113): 0.333333 rms
Deviation at ( 88, 113): 0.187500 rms
Deviation at ( 84, 114): 1.666667 rms
Deviation at ( 66, 115): 0.770833 rms
Deviation at ( 19, 118): 5.270833 rms
Deviation at ( 76, 121): 0.104167 rms
Deviation at ( 70, 122): 0.708333 rms
Deviation at ( 91, 122): 0.208333 rms
Deviation at ( 71, 123): 0.854167 rms
Deviation at ( 75, 123): 0.854167 rms
Deviation at ( 61, 124): 0.937500 rms
Deviation at ( 91, 124): 0.270833 rms
RMS(reference, result) = 0.015488
Test passed
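The long list of per-block deviations above collapses into a single figure, RMS(reference, result). A generic sketch of how such an accuracy check can be structured, per-block RMS plus an overall image RMS on made-up pixel data; this is an illustration of the metric, not the sample's exact formula:

```python
# Per-block and whole-image RMS deviation between a reference and a result.

def block_rms(ref_block, res_block):
    """RMS of per-pixel differences within one block."""
    diffs = [(r - s) ** 2 for r, s in zip(ref_block, res_block)]
    return (sum(diffs) / len(diffs)) ** 0.5

def image_rms(ref, res):
    """RMS over the whole image, flattened to one list of pixel values."""
    n = len(ref)
    return (sum((r - s) ** 2 for r, s in zip(ref, res)) / n) ** 0.5

reference = [10.0, 12.0, 11.0, 13.0, 200.0, 198.0, 201.0, 199.0]  # made-up pixels
result    = [10.0, 12.5, 11.0, 12.5, 200.0, 199.0, 201.0, 198.0]
print(block_rms(reference[:4], result[:4]))   # deviation of the first "block"
print(image_rms(reference, result))           # overall figure, like RMS(reference, result)
```

Reporting only blocks whose RMS exceeds zero, as the sample does, keeps the log focused on where the lossy DXT compression actually changed pixels.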
34. Summary
On the GTX 560 Ti, some samples do not work:
→ some MUST be run on a device supporting CUDA compute capability 3.0;
→ some require GPU devices with compute SM 3.5 or higher.
This evaluation is to be continued; recorded here for future reference.