This document summarizes the results of running the CUDA 5.0 samples on a system with a GTX 560 Ti GPU. Most samples ran successfully, but some failed because the GTX 560 Ti supports only compute capability 2.1, while those samples require capability 3.0 or higher, or use CUDA Dynamic Parallelism, which requires capability 3.5 or higher. Overall, the evaluation provides performance data for many samples and identifies the requirements of the samples that did not run.
3. Sample target path and files
• C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release
4. alignedTypes.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\alignedTypes.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
[GeForce GTX 560 Ti] has 8 MP(s) x 48 (Cores/MP) = 384 (Cores)
> Compute scaling value = 1.00
> Memory Size = 49999872
Allocating memory...
Generating host input data array...
Uploading input data to GPU memory...
Testing misaligned types...
uint8...
Avg. time: 2.563287 ms / Copy throughput: 18.166525 GB/s.
TEST OK
uint16...
Avg. time: 1.429239 ms / Copy throughput: 32.580981 GB/s.
TEST OK
RGBA8_misaligned...
Avg. time: 1.766606 ms / Copy throughput: 26.359026 GB/s.
TEST OK
LA32_misaligned...
Avg. time: 0.998594 ms / Copy throughput: 46.631585 GB/s.
TEST OK
RGB32_misaligned...
Avg. time: 1.273794 ms / Copy throughput: 36.556941 GB/s.
TEST OK
RGBA32_misaligned...
Avg. time: 1.703606 ms / Copy throughput: 27.333794 GB/s.
TEST OK
5. alignedTypes.exe 2/2
Testing aligned types...
RGBA8...
Avg. time: 1.131558 ms / Copy throughput: 41.152104 GB/s.
TEST OK
I32...
Avg. time: 1.091073 ms / Copy throughput: 42.679095 GB/s.
TEST OK
LA32...
Avg. time: 0.952468 ms / Copy throughput: 48.889827 GB/s.
TEST OK
RGB32...
Avg. time: 1.431797 ms / Copy throughput: 32.522784 GB/s.
TEST OK
RGBA32...
Avg. time: 0.961305 ms / Copy throughput: 48.440401 GB/s.
TEST OK
RGBA32_2...
Avg. time: 1.340105 ms / Copy throughput: 34.748032 GB/s.
TEST OK
[alignedTypes] -> Test Results: 0 Failures
13. bilateralFilter.exe 1/2
Loading ../../../3_Imaging/bilateralFilter/data/nature_monte.bmp...
BMP width: 640
BMP height: 480
BMP file loaded successfully!
Loaded '../../../3_Imaging/bilateralFilter/data/nature_monte.bmp', 640 x 480 pixels
Found 1 CUDA Capable device(s) supporting CUDA
Device 0: "GeForce GTX 560 Ti"
CUDA Runtime Version : 5.0
CUDA Compute Capability : 2.1
Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
Setting active device to 0
Using device 0: GeForce GTX 560 Ti
Running Standard Demonstration with GLUT loop...
Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'e' and 'E' to change Euclidean delta
Press 'g' and 'G' to change Gaussian delta
Press 'a' or 'A' to change Animation mode ON/OFF
16. binomialOptions.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\binomialOptions.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Using single precision...
Generating input data...
Running GPU binomial tree...
Options count : 512
Time steps : 2048
binomialOptionsGPU() time: 29.790300 msec
Options per second : 17186.802203
Running CPU binomial tree...
Comparing the results...
GPU binomial vs. Black-Scholes
L1 norm: 1.323721E-004
CPU binomial vs. Black-Scholes
L1 norm: 1.045245E-004
CPU binomial vs. GPU binomial
L1 norm: 3.391858E-005
Shutting down...
Test passed
17. BlackScholes.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\BlackScholes.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 0.806277 msec
Effective memory bandwidth: 99.221508 GB/s
Gigaoptions per second : 9.922151
BlackScholes, Throughput = 9.9222 GOptions/s, Time = 0.00081 s, Size = 8000000 options, NumDevsUsed = 1,
Workgroup = 128
18. BlackScholes.exe 2/2
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.768024E-007
Max absolute error: 1.120567E-005
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
Test passed
19. boxFilter.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilter.exe Starting...
Loaded '../../../3_Imaging/boxFilter/data/lenaRGB.ppm', 1024 x 1024 pixels
Found 1 CUDA Capable device(s) supporting CUDA
Device 0: "GeForce GTX 560 Ti"
CUDA Runtime Version : 5.0
CUDA Compute Capability : 2.1
Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
Setting active device to 0
Running Standard Demonstration with GLUT loop...
Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'a' or 'A' to change animation ON/OFF
20. boxFilterNPP.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
cudaSetDevice GPU0 = GeForce GTX 560 Ti
NPP Library Version 5.0.35
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe using GPU <GeForce GTX 560 Ti> with 8 SM(s) with Compute 2.1
boxFilterNPP opened: <../../../common/data/Lena.pgm> successfully!
Saved image: ../../../common/data/Lena_boxFilter.pgm
21. cdpAdvancedQuicksort.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpAdvancedQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
22. cdpLUDecomposition.exe / Failure
Starting LU Decomposition (CUDA Dynamic Parallelism)
GPU device GeForce GTX 560 Ti has compute capabilities (SM 2.1)
cdpLUDecomposition requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
23. cdpQuadTree.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpQuadTree requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
24. cdpSimplePrint.exe / Failure
starting Simple Print (CUDA Dynamic Parallelism)
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpSimplePrint requires GPU devices with compute SM 3.5 or higher. Exiting...
25. cdpSimpleQuicksort.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpSimpleQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
27. Summary
On the GTX 560 Ti, some samples do not run:
→ some require CUDA compute capability 3.0 or higher;
→ the CUDA Dynamic Parallelism samples require GPU devices with compute SM 3.5 or higher.
This evaluation will be continued for future reference.