Scott Gray presented at the 2016 ICML conference, where he went over various ways of computing convolution in the "On-device Intelligence" workshop.
High-Performance GPU Programming for Deep Learning (Intel Nervana)
This session goes over many of the techniques we use at Nervana in GPU programming to achieve state-of-the-art performance for deep learning networks. The main focus will be on the customization of dense linear algebra kernels: Winograd 3x3 convolution, direct convolution, and small-tile GEMM (matrix multiply). In particular, we'll look at how we achieve high utilization at very small mini-batches, which is important for multi-GPU scaling and inference. In addition, we'll talk about where and how you can effectively leverage lower and mixed precision to further increase performance without loss of accuracy.
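Winograd's minimal filtering algorithm is what makes the 3x3 case attractive. The sketch below assumes a single channel and unit stride, and shows the 1-D building block F(2,3): two outputs of a 3-tap filter computed with four multiplications instead of six. The 2-D F(2x2,3x3) form used for 3x3 convolution nests this transform along both axes. It illustrates the arithmetic only; it is not Nervana's GPU kernel, and the function names are hypothetical.

```python
def direct_corr(d, g):
    """Reference: two outputs of a 3-tap correlation (6 multiplications)."""
    return tuple(sum(d[i + k] * g[k] for k in range(3)) for i in range(2))


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3) (4 multiplications).

    d: four input samples d0..d3; g: three filter taps g0..g2.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: in a real kernel this is precomputed once per filter.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform combined with the four element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform recovers the two correlation outputs.
    return (m1 + m2 + m3, m2 - m3 - m4)


print(winograd_f23([1, 2, 3, 4], [1, 0, -1]), direct_corr([1, 2, 3, 4], [1, 0, -1]))
```

In the multi-channel formulation, the element-wise products summed over input channels can be computed as a batch of small matrix multiplies, which is one reason GEMM efficiency matters for Winograd convolution.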
Efficient occlusion culling in dynamic scenes is an important topic for the game and real-time graphics communities because it accelerates rendering. We present a novel algorithm inspired by recent advances in depth culling for graphics hardware, but adapted and optimized for SIMD-capable CPUs. Our algorithm has very low memory overhead and is three times faster than previous work, while culling 98% of the triangles culled by a full-resolution depth buffer approach. It supports interleaving occluder rasterization and occlusion queries without penalty, making it easy…
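To make "depth culling" concrete, here is a minimal sketch of a generic conservative test against a coarse, per-tile max-depth buffer (in the spirit of hierarchical Z). It is not the algorithm from the abstract, which uses its own compact per-tile representation. Assumptions: depth grows with distance, the buffer is cleared to 1.0, bounding boxes are inclusive pixel coordinates, and all names are hypothetical.

```python
def build_coarse_max_depth(depth, tile):
    """Reduce a full-resolution depth buffer to one value per tile:
    the farthest occluder depth written in that tile."""
    h, w = len(depth), len(depth[0])
    coarse = []
    for ty in range(0, h, tile):
        row = []
        for tx in range(0, w, tile):
            row.append(max(depth[y][x]
                           for y in range(ty, min(ty + tile, h))
                           for x in range(tx, min(tx + tile, w))))
        coarse.append(row)
    return coarse


def is_occluded(coarse, tile, bbox, nearest_z):
    """Conservative query for an occludee with screen box (x0, y0, x1, y1)
    whose closest point has depth nearest_z. Returns True only if the box
    lies behind the farthest occluder depth in every tile it touches."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            if nearest_z <= coarse[ty][tx]:
                return False  # possibly visible in this tile
    return True


# Occluder covering the left half of a 4x4 screen at depth 0.3.
depth = [[0.3 if x < 2 else 1.0 for x in range(4)] for _ in range(4)]
coarse = build_coarse_max_depth(depth, tile=2)
print(is_occluded(coarse, 2, (0, 0, 1, 3), nearest_z=0.5))
```

Using the tile's farthest depth keeps the test conservative: an object is only rejected when it is provably hidden, at the cost of missing some occlusion at tile boundaries.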
Bindless Deferred Decals in The Surge 2 (Philip Hammer)
These are the slides for my talk at Digital Dragons 2019 in Krakow.
Update: The recording is now online on YouTube:
https://www.youtube.com/watch?v=e2wPMqWETj8
GDC16: Arbitrary amount of 3D data running on Gear VR by Vinh Truong (Umbra Software)
In this talk, programmer Vinh Truong explains how it is possible to render huge amounts of 3D data on low-capacity devices like smartphones simply by using Umbra's optimization technology.
This session was hosted by ARM at GDC16.
Fragging Rights: A Tale of a Pathological Storage Workload (Eric Sproul)
There is a lot to love about ZFS. The simple administration, strong integrity checks, copy-on-write performance, and block-transform features are an operator's dream. This is a story of how a filesystem workload that looked good on a whiteboard turned out to have a dark side when run on ZFS, and how a combination of ZFS improvements and rethinking the application got us out of trouble.
We maximise the performance of K-means by applying two types of parallelism:
- MIMD (Multiple Instruction Multiple Data)
- SIMD (Single Instruction Multiple Data)
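The two kinds of parallelism map onto K-means' assignment step fairly directly. The sketch below is a hypothetical illustration, not the authors' implementation: independent workers each label their own chunk of points (MIMD), and the inner distance loop over dimensions is the part a SIMD unit would vectorize. Python threads share the GIL, so this shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the closest centroid. The per-dimension arithmetic here is
    the data-parallel inner loop that SIMD instructions would vectorize."""
    best, best_d = 0, float("inf")
    for j, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = j, d
    return best


def kmeans(points, centroids, iters=10, workers=4):
    k, dim = len(centroids), len(points[0])
    centroids = [list(c) for c in centroids]
    size = max(1, (len(points) + workers - 1) // workers)
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            # MIMD: each worker assigns its own chunk independently.
            parts = pool.map(lambda ch: [nearest(p, centroids) for p in ch], chunks)
            labels = [lab for part in parts for lab in part]
            # Update step: per-cluster sums, a reduction over all points.
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            for p, lab in zip(points, labels):
                counts[lab] += 1
                for d in range(dim):
                    sums[lab][d] += p[d]
            for j in range(k):
                if counts[j]:
                    centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(pts, [(0, 0), (10, 10)], iters=3, workers=2))
```

The assignment step needs no communication between workers, while the update step is a reduction; that asymmetry is why the assignment step is usually the first part to be parallelized.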
University of Virginia
cs4414: Operating Systems
http://rust-class.org
The Internet
Benchmarking: Customer vs. Developer
Cheating on Benchmarks
Networking
Latency and Bandwidth
Tracing Routes
Network Layers
For embedded notes and videos, see:
http://rust-class.org/class-13-the-internet.html
Parallel Implementation of K-Means Clustering on CUDA (prithan)
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time-consuming; to reduce this time, our project implements the K-Means clustering algorithm in parallel on CUDA using C. We present the implementation and a performance analysis of our approach to parallelizing K-Means clustering.
Scaling infrastructure is tricky. I will try to explain the methods I use when dealing with this issue, and demonstrate an approach that can be applied to almost any type of workload.
Deep learning is unlocking tremendous economic value across various market sectors. Individual data scientists can draw on several open-source frameworks and basic hardware resources during the initial investigative phase, but they quickly require significant hardware and software resources to build and deploy production models. Intel Nervana has built a competitive deep learning platform that makes it easy for data scientists to start in the iterative, investigatory phase and take models all the way to deployment. Nervana’s platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data. In this talk, we will give an overview of Nervana’s DL platform and provide hands-on experience using it to train and execute deep learning models.
Speaker: Will Constable
Join our Meetup Group: https://www.meetup.com/SV-Deep-Learning/
Introduction to deep learning @ Startup.ML by Andres Rodriguez (Intel Nervana)
Deep learning is unlocking tremendous economic value across various market sectors. Individual data scientists can draw on several open-source frameworks and basic hardware resources during the initial investigative phase, but they quickly require significant hardware and software resources to build and deploy production models. Intel offers a range of software and hardware to support diverse workloads and user needs. Intel Nervana delivers a competitive deep learning platform that makes it easy for data scientists to start in the iterative, investigatory phase and take models all the way to deployment. This platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data.
Urs Köster - Convolutional and Recurrent Neural Networks (Intel Nervana)
Speaker: Urs Köster, PhD
Urs will join us to dive deep into the field of Deep Learning and focus on Convolutional and Recurrent Neural Networks. The talk will be followed by a workshop highlighting neon™, an open-source, Python-based deep learning framework that has been built from the ground up for speed and ease of use.
End-to-end speech recognition in Neon presented by Anthony Ndirango and Tyler Lee
Modern automatic speech recognition systems incorporate a tremendous amount of expert knowledge and a wide array of machine learning techniques. The promise of deep learning is to strip away much of this complexity in favor of the flexibility of neural networks. We will describe our efforts to implement end-to-end speech recognition in neon by combining convolutional and recurrent neural networks into an acoustic model, followed by a graph-based decoding scheme. These models are trained to go directly from raw waveforms to transcribed speech without requiring any explicit forced alignment. We will also discuss the additional challenges that must be overcome to produce state-of-the-art results.
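The abstract does not name the training objective; a common alignment-free choice for such end-to-end models is Connectionist Temporal Classification (CTC), so the following is an assumption rather than a description of the talk. Assuming CTC-style per-frame symbol probabilities, the simplest decoder picks the most likely symbol per frame, merges repeats, and drops blanks. The talk's graph-based decoder is more sophisticated; this greedy version, with its hypothetical `_` blank symbol, is only a baseline.

```python
BLANK = "_"  # hypothetical blank symbol used in this sketch


def ctc_greedy_decode(frame_probs, alphabet):
    """frame_probs: one list of per-symbol probabilities per audio frame.
    alphabet: symbols matching the probability columns (includes BLANK)."""
    # Most likely symbol for each frame (best path).
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    out, prev = [], None
    for sym in best_path:
        # Merge repeated symbols, then drop blanks; a blank between two
        # identical symbols keeps them as separate characters.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


probs = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
         [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.3, 0.1, 0.6]]
print(ctc_greedy_decode(probs, ["_", "a", "b"]))
```

Because the model emits a distribution per frame and the decoder collapses the frame sequence, no frame-level forced alignment between audio and transcript is needed during training.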
clCaffe*: Unleashing the Power of Intel Graphics for Deep Learning Acceleration (Intel® Software)
In this presentation, you will hear how Intel graphics can accelerate deep learning applications. The method is simple and reproducible, with results of up to four times the original CPU performance. We introduce clCaffe*, an extension of the well-known Caffe* framework built on the OpenCL™ standard. OpenCL™ enables the primitives of the convolutional neural network (CNN) pipeline to run on a GPU (graphics processing unit), an FPGA (field-programmable gate array), or any other device with OpenCL support. Once set up, Caffe users can seamlessly switch to clCaffe to take advantage of Intel graphics acceleration. Compared with the CPU alone, Intel graphics delivers a 2.5x speedup for AlexNet classification and a 4.0x speedup for GoogleNet classification on 5th- or 6th-generation Intel® Core™ processors. Finally, we give a detailed analysis of clCaffe performance and identify the missing components in the Intel Graphics software stack that limit its deep learning performance.
Video Activity Recognition and NLP Q&A Model Example (Intel Nervana)
In this presentation, Sathish Nagappan will introduce the UCF-101 video activity recognition dataset and discuss how 3-D convolutions work. A demo will be presented on how to predict actions in video clips. Lastly, an NLP Q&A model example will be presented.
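A 3-D convolution slides its kernel over time as well as height and width, so each output mixes neighboring frames and captures motion. The sketch below is a naive single-channel "valid" 3-D cross-correlation (no padding, stride 1, which is what most deep learning frameworks compute under the name convolution). It is an illustration only, not the model or code from the talk, and the function name is hypothetical.

```python
def conv3d_valid(x, w):
    """x: T x H x W nested lists (one channel, e.g. a grayscale clip).
    w: kt x kh x kw kernel. Returns a (T-kt+1) x (H-kh+1) x (W-kw+1) volume."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(w), len(w[0]), len(w[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                # Weighted sum over a kt x kh x kw spatio-temporal window.
                row.append(sum(x[t + a][i + b][j + c] * w[a][b][c]
                               for a in range(kt)
                               for b in range(kh)
                               for c in range(kw)))
            plane.append(row)
        out.append(plane)
    return out


# A 2x1x1 kernel [-1, +1] over time responds to frame-to-frame change.
clip = [[[t] * 5 for _ in range(4)] for t in range(3)]  # frame t has value t
print(conv3d_valid(clip, [[[-1]], [[1]]])[0][0])
```

A kernel with temporal extent 1 reduces this to an ordinary per-frame 2-D convolution; extending the kernel along time is what lets the network respond to motion.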
Startup.ML: Using neon for NLP and Localization Applications (Intel Nervana)
Speaker: Arjun Bansal, co-founder of Nervana Systems
Arjun Bansal’s workshop focused on neon, an open-source, Python-based deep learning framework that has been built from the ground up for speed and ease of use. The workshop highlights how to use neon, build Recurrent Neural Networks to generate and analyze text, and build Convolutional Autoencoders to generate images and to localize objects. Arjun also demoed the integration of neon with the Nervana cloud (in private beta) for multi-GPU training of deep networks.
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En... (Intel Nervana)
Intel Nervana has built a competitive deep learning platform that makes it easy for data scientists to start in the iterative, investigatory phase and take models all the way to deployment. Nervana’s platform is designed for speed and scale, and serves as a catalyst for all types of organizations to benefit from the full potential of deep learning. Examples of supported applications include, but are not limited to, automotive speech interfaces, image search, language translation, agricultural robotics and genomics, financial document summarization, and finding anomalies in IoT data.
Introduction to Deep Learning with Will Constable (Intel Nervana)
Deep Residual Nets, Activity recognition in videos, and Q&A systems using neon and the Nervana Cloud
Will Constable will start with an introduction to the field of Deep Learning, neon and the Nervana Cloud. The presentation will be followed by an interactive workshop using neon. neon is an open-source, Python-based Deep Learning framework that has been built from the ground up for speed, scalability and ease of use.
Gary Paek from Intel presented this deck at the HPC User Forum in Tucson.
Learn more: https://software.intel.com/en-us/tags/18892 and http://hpcuserforum.com
Watch the video presentation: http://wp.me/p3RLHQ-fdt
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Urs Köster and Yinyin Liu present at ODSC West. Deep learning has had a major impact in the last three years. Imperfect interactions with machines, such as speech, natural language, or image processing, have been made robust by deep learning, and deep learning holds promise for finding usable structure in large datasets. The training process is lengthy and has proven difficult to scale due to the constraints of existing compute architectures, and there is a need for standardized tools for building and scaling deep learning solutions. Urs will outline some of these challenges and how fundamental changes to the organization of computation and communication can lead to large advances in capabilities. Urs will dive deep into the field of Deep Learning and focus on Convolutional and Recurrent Neural Networks. The talk will be followed by a workshop highlighting neon™, an open-source, Python-based deep learning framework that has been built from the ground up for speed and ease of use. This session is targeted at data scientists and researchers interested in taking deep learning to the next level of speed and scalability. The tutorial covers how to use neon™ to build and train Recurrent Neural Networks to generate text, and Convolutional Networks to perform image classification.
Anil Thomas dives deep into the field of Deep Learning and focuses on object recognition. The talk starts with a general overview of neon and Convolutional Neural Networks (CNNs), then applies neon to an object recognition Kaggle problem. It is followed by a workshop highlighting neon, an open-source, Python-based deep learning framework that has been built from the ground up for speed and ease of use.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/improving-power-efficiency-for-edge-inferencing-with-memory-management-optimizations-a-presentation-from-samsung/
Nathan Levy, Project Leader at Samsung, presents the “Improving Power Efficiency for Edge Inferencing with Memory Management Optimizations” tutorial at the September 2020 Embedded Vision Summit.
In the race for power efficiency in neural network processing, optimizing memory use to reduce data traffic is critical. Many processors have a small local memory (typically SRAM) used as a scratch pad to reduce the expensive data traffic to and from a large remote memory (e.g., DRAM). The specific structure of neural networks allows advanced techniques to optimize the use of this local memory.
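As a rough illustration of why scratch-pad management matters, the sketch below uses a deliberately simplified, hypothetical cost model (not the one from the talk) for one layer computed as a matrix multiply C = A·B (M×K by K×N). Each T×T output tile must hold a T×K strip of A, a K×T strip of B, and the T×T result in SRAM at once; the code picks the tile size that fits and minimizes DRAM traffic, counted in elements.

```python
def dram_traffic(M, N, K, T):
    """Elements moved between DRAM and SRAM for C = A @ B with T x T output
    tiles, assuming each tile reads its A rows and B columns once and
    writes its result once (no reuse across tiles in this simple model)."""
    mt = -(-M // T)  # number of tile rows (ceiling division)
    nt = -(-N // T)  # number of tile columns
    # Summed over all tiles: every tile row re-reads A once per tile column,
    # and every tile column re-reads B once per tile row.
    reads = K * (nt * M + mt * N)
    writes = M * N
    return reads + writes


def best_tile(M, N, K, sram_elems):
    """Largest-benefit tile size whose working set fits in the scratch pad."""
    best = None
    for T in range(1, min(M, N) + 1):
        footprint = 2 * T * K + T * T  # A strip + B strip + C tile
        if footprint > sram_elems:
            break  # footprint only grows with T
        traffic = dram_traffic(M, N, K, T)
        if best is None or traffic < best[1]:
            best = (T, traffic)
    return best


print(best_tile(64, 64, 64, sram_elems=12288))  # whole problem fits
print(best_tile(64, 64, 64, sram_elems=12287))  # must tile
```

Shrinking the scratch pad by a single element forces tiling here and raises DRAM traffic by roughly two thirds, which is the kind of architecture-dependent trade-off the talk says must be tailored per network and per processor.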
In this presentation, Levy describes the key aspects of memory management optimization for neural networks along with the trade-offs that must be managed in light of the processor architecture and the details of the network. In addition, he shows the importance of tailoring the memory management approach to the specific network, illustrated by analysis of a case study.
Rainbow Over the Windows: More Colors Than You Could Expect (Peter Hlavaty)
Operating systems keep evolving over time, and Microsoft Windows is no exception: it ships new designs, features, and code from time to time. Sometimes, however, it also ships considerably more code for complex subsystems residing in its kernel ... and at some later point it starts implementing new designs to prevent unnecessary access to them. But is that safe enough?
As security bulletins show, the win32k subsystem attracts a lot of attention. It may seem that, after the efforts of the many security researchers who have dug into this area, finding bugs here would become tough and almost fruitless. Unfortunately, this is not true: win32k is by nature backed by very complex logic and a large amount of code.
We will present our view of the Windows graphics subsystem, as well as an outline of our fuzzing strategies. We will introduce some unusual areas of win32k and its extensions, and show how they can break even locked-down environments.
Part of our talk will be dedicated to CVE-2016-0176, the bug we used for this year's Pwn2Own Edge sandbox bypass, from its discovery to its exploitation techniques, which can serve as an example of a universal DirectX escape that is independent of graphics vendors.
#6 PyData Warsaw: Deep learning for image segmentation (Matthew Opala)
Deep learning techniques have driven great progress in many computer vision tasks, such as image classification, object detection, and segmentation. Almost every month a new method is published that achieves state-of-the-art results on some common benchmark dataset. In addition, DL is being applied to new problems in CV.
In this talk, we're going to focus on the application of DL to the image segmentation task. We want to show the practical importance of this task for the fashion industry by presenting our case study, with the results we achieved using various approaches and methods.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/efficiently-map-ai-and-vision-applications-onto-multi-core-ai-processors-using-cevas-parallel-processing-framework-a-presentation-from-ceva/
Rami Drucker, Machine Learning Software Architect at CEVA, presents the “Efficiently Map AI and Vision Applications onto Multi-core AI Processors Using CEVA’s Parallel Processing Framework” tutorial at the May 2023 Embedded Vision Summit.
Next-generation AI and computer vision applications for autonomous vehicles, cameras, drones and robots require higher-than-ever computing power. Often, the most efficient way to deliver high performance (especially in cost- and power-constrained applications) is to use multi-core processors. But developers must then map their applications onto the multiple cores in an efficient manner, which can be difficult. To address this challenge and streamline application development, CEVA has introduced the Architecture Planner tool as a new element in CEVA’s comprehensive AI SDK.
In this talk, Drucker shows how the Architecture Planner tool analyzes the network model and the processor configuration (number of cores, memory sizes), then automatically maps the workload onto the multiple cores in an efficient manner. He explains key techniques used by the tool, including symmetrical and asymmetrical multi-processing, partition by sub-graphs, batch partitioning and pipeline partitioning.
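Of the techniques listed, pipeline partitioning is easy to state precisely: split a chain of layers into contiguous stages, one per core, so that the slowest stage (which sets the pipeline's throughput) is as fast as possible. The sketch below solves that with dynamic programming, assuming per-layer costs are known and ignoring memory limits and inter-core transfer costs. It is a hypothetical illustration, not how CEVA's Architecture Planner works.

```python
def pipeline_partition(costs, cores):
    """Split layers (costs in execution order) into at most `cores`
    contiguous stages, minimizing the cost of the slowest stage.
    Returns (bottleneck_cost, stages) where stages are lists of layer indices."""
    n = len(costs)
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)
    inf = float("inf")
    # best[k][i]: smallest bottleneck for the first i layers using k stages.
    best = [[inf] * (n + 1) for _ in range(cores + 1)]
    cut = [[0] * (n + 1) for _ in range(cores + 1)]
    best[0][0] = 0
    for k in range(1, cores + 1):
        for i in range(n + 1):
            for j in range(i + 1):  # last stage covers layers j .. i-1
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i], cut[k][i] = cand, j
    # Walk the recorded cut points backwards to recover the stages.
    stages, i = [], n
    for k in range(cores, 0, -1):
        j = cut[k][i]
        if j < i:
            stages.append(list(range(j, i)))
        i = j
    stages.reverse()
    return best[cores][n], stages


print(pipeline_partition([4, 2, 3, 5, 1, 1], cores=3))
```

Batch partitioning is the complementary strategy: every core runs the whole network on a slice of the batch, which balances perfectly but needs each core to hold all of the weights.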
Optimizing the Graphics Pipeline with Compute, GDC 2016 (Graham Wihlidal)
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
Inside Cassandra – C* is an interesting piece of software for many reasons, but it is especially interesting in its use of elegant data structures and algorithms. This talk will focus on the data structures and algorithms that make C* such a scalable and performant database. We will walk along the write, read and delete paths exploring the low-level details of how each of these operations work. We will also explore some of the background processes that maintain availability and performance. The goal of this talk is to gain a deeper understanding of C* by exploring the low-level details of its implementation.
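At a high level, Cassandra's write, read and delete paths follow the log-structured merge pattern: writes go to an append-only commit log and an in-memory memtable, full memtables are flushed to immutable sorted SSTables, reads merge from newest to oldest, and deletes write tombstones. The toy class below sketches only that pattern; it is hypothetical and omits partitioning, timestamps, Bloom filters, compaction and tombstone expiry.

```python
from bisect import bisect_left

_TOMBSTONE = object()  # marker written by deletes in this sketch


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only log of mutations, for crash replay
        self.memtable = {}    # recent writes held in memory
        self.sstables = []    # immutable sorted (keys, values) runs, oldest first
        self.memtable_limit = memtable_limit

    def _apply(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def put(self, key, value):
        self._apply(key, value)

    def delete(self, key):
        # A delete is just another write: the tombstone shadows older values.
        self._apply(key, _TOMBSTONE)

    def flush(self):
        items = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replay here

    def get(self, key):
        # Newest data wins: memtable first, then SSTables newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is _TOMBSTONE else value
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)  # binary search in the sorted run
            if i < len(keys) and keys[i] == key:
                value = values[i]
                return None if value is _TOMBSTONE else value
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed to the first SSTable
db.put("a", 3)
db.delete("b")   # flushed again; the tombstone hides the old "b"
print(db.get("a"), db.get("b"), len(db.sstables))
```

Because SSTables are never modified in place, old values and tombstones accumulate until background compaction merges runs, which is one of the background processes the talk refers to.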
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel (Anne Nicolas)
At a rate of almost 9 changes per hour (24/7), the Linux kernel is definitely a scary beast. Bugs are introduced on a daily basis and, through the use of multiple code analyzers, *some* of them are detected and fixed before they hit mainline. Over the last few years, Gustavo has been fixing such bugs and many different issues in every corner of the Linux kernel. Recently, he led the effort to globally enable -Wimplicit-fallthrough, which is enabled by default as of Linux v5.3. This presentation is a report on everything Gustavo has found and fixed in the kernel with the support of the Core Infrastructure Initiative.
Gustavo A.R. Silva
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to market, combined with traditionally slow and manual security checks, has left gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
1. An Analysis of Convolution for Inference
24 June 2016
Scott Gray
Nervana Systems
MAKING MACHINES SMARTER.™
2. Proprietary and confidential. Do not distribute. Nervana
Direct Convolution
• Compute with in-place slicing + GEMM
• Data layout considerations: C, H, W, N
• Minimize slicing logic
• Maximize contiguous access
• Leverage filter overlap
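The bullets above can be sketched in plain Python. For each filter tap (r, s), the input slice at the matching (h, w) is a C x N matrix, so that tap's contribution to an output pixel is one small GEMM of a K x C filter slice with that C x N slice. This is a minimal, unoptimized illustration that assumes CHWN input, CRSK filter, and KPQN output layouts (our choice for the sketch); `direct_conv_chwn` and `gemm` are hypothetical names, and a real GPU kernel would tile this work rather than loop in Python.

```python
# Sketch only: CHWN input, CRSK filter, KPQN output are assumed layouts.

def gemm(A, B):
    """Plain matrix multiply: A is M x K, B is K x N, result is M x N."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def direct_conv_chwn(I, F, pad=0, stride=1):
    """Direct convolution (cross-correlation) as a sum of small GEMMs.

    I[c][h][w][n]: input, minibatch N innermost (CHWN).
    F[c][r][s][k]: filter, output channels K innermost (CRSK).
    Returns O[k][p][q][n].
    """
    C, H, W, N = len(I), len(I[0]), len(I[0][0]), len(I[0][0][0])
    R, S, K = len(F[0]), len(F[0][0]), len(F[0][0][0])
    P = -(-(H - R + 1 + 2 * pad) // stride)  # integer ceiling division
    Q = -(-(W - S + 1 + 2 * pad) // stride)
    O = [[[[0] * N for _ in range(Q)] for _ in range(P)] for _ in range(K)]
    for r in range(R):
        for s in range(S):
            # K x C filter slice for tap (r, s), reused for every output pixel
            Ft = [[F[c][r][s][k] for c in range(C)] for k in range(K)]
            for p in range(P):
                h = r + p * stride - pad
                if not 0 <= h < H:
                    continue  # tap falls in the zero padding
                for q in range(Q):
                    w = s + q * stride - pad
                    if not 0 <= w < W:
                        continue
                    # C x N input slice; in CHWN each row is N contiguous values
                    X = [I[c][h][w] for c in range(C)]
                    Y = gemm(Ft, X)  # K x N partial result
                    for k in range(K):
                        for n in range(N):
                            O[k][p][q][n] += Y[k][n]
    return O


if __name__ == "__main__":
    I = [[[[1] * 2 for _ in range(3)] for _ in range(3)] for _ in range(2)]  # C=2, H=W=3, N=2
    F = [[[[1] for _ in range(2)] for _ in range(2)] for _ in range(2)]      # C=2, R=S=2, K=1
    O = direct_conv_chwn(I, F, pad=1)
    print(len(O[0]), O[0][0][0][0], O[0][1][1][1])  # 4 2 8
```

Keeping N innermost means every slice read is a run of N contiguous values, which is the "maximize contiguous access" point; at small N those runs get short, which is what the following slides address.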
3. Small N direct convolution: Without Superblocking
fprop
Q = ceil((W - S + 1 + 2 * pad) / stride)
w_i = s_k + q_j * stride - pad
Fig from V. Dumoulin,
https://github.com/vdumoulin/conv_arithmetic
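A 1D sketch of the fprop index arithmetic on this slide, assuming zero padding and cross-correlation (no filter flip), as most deep learning frameworks define convolution. `out_size` and `conv1d_fprop` are illustrative names; the ceiling division is written as `-(-x // stride)` to stay in integer arithmetic.

```python
def out_size(W, S, pad=0, stride=1):
    """Q = ceil((W - S + 1 + 2*pad) / stride) via integer ceiling division."""
    return -(-(W - S + 1 + 2 * pad) // stride)


def conv1d_fprop(x, f, pad=0, stride=1):
    """1D cross-correlation using the index map w_i = s_k + q_j*stride - pad."""
    W, S = len(x), len(f)
    y = []
    for q in range(out_size(W, S, pad, stride)):
        acc = 0
        for s in range(S):
            w = s + q * stride - pad
            if 0 <= w < W:  # indices outside [0, W) read the implicit zero padding
                acc += f[s] * x[w]
        y.append(acc)
    return y


print(conv1d_fprop([1, 2, 3, 4, 5], [1, 0, -1]))                   # [-2, -2, -2]
print(conv1d_fprop([1, 2, 3, 4, 5], [1, 0, -1], pad=1, stride=2))  # [-2, -2, 4]
```

For W - S + 2*pad >= 0 the ceiling form equals the more common floor((W - S + 2*pad) / stride) + 1.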
4. Small N direct convolution: With Superblocking
fprop
Q = ceil((W - S + 1 + 2 * pad) / stride)
w_i = s_k + q_j * stride - pad
5. Small N direct convolution: Bprop for deconv
bprop
pad' = S - pad - 1
w_i = (q_j - pad' + s_k) / stride (with the filter flipped; a term contributes only where the division is exact)
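A 1D sketch of the bprop mapping, assuming the forward pass is the strided cross-correlation from the fprop slides. Each bprop output position q_j walks the flipped filter, and a delta element w_i contributes only when (q_j - pad' + s_k) divides evenly by stride; for stride > 1 the skipped positions correspond to the zeros a deconvolution inserts between input elements. `conv1d_bprop` is a hypothetical helper.

```python
def conv1d_bprop(dy, f, W, pad=0, stride=1):
    """Gradient w.r.t. a length-W input of a strided 1D cross-correlation."""
    S, Q = len(f), len(dy)
    pad_t = S - pad - 1                  # pad' from the slide
    dx = []
    for qj in range(W):                  # position in the bprop output (input gradient)
        acc = 0
        for sk in range(S):              # tap of the flipped filter
            num = qj - pad_t + sk
            if num % stride != 0:        # not exact: lands on an inserted zero
                continue
            wi = num // stride           # position in dy
            if 0 <= wi < Q:
                acc += f[S - 1 - sk] * dy[wi]
        dx.append(acc)
    return dx


print(conv1d_bprop([1, 2, 3], [1, 0, -1], W=5, pad=1, stride=2))  # [0, 1, 0, 1, 0]
```

A quick sanity check is the adjoint identity: for the fprop output y of x = [1, 2, 3, 4, 5] with the same filter, pad, and stride, the dot product of y with dy equals the dot product of x with dx.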
6. Small N direct convolution: Dilated Filters
Dilated
S' = (S - 1) * rate + 1
Q = ceil((W - S' + 1 + 2 * pad) / stride)
w_i = s_k * rate + q_j * stride - pad
Fig from F. Yu, V. Koltun
http://arxiv.org/abs/1511.07122v3
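The dilated formulas only change the tap spacing and the effective filter width. A minimal 1D sketch, assuming zero padding and cross-correlation; `dilated_size` and `conv1d_dilated` are illustrative names, and rate=1 reduces to the ordinary fprop.

```python
def dilated_size(S, rate):
    """Effective filter width S' = (S - 1) * rate + 1."""
    return (S - 1) * rate + 1


def conv1d_dilated(x, f, pad=0, stride=1, rate=1):
    """1D dilated cross-correlation with w_i = s_k*rate + q_j*stride - pad."""
    W, S = len(x), len(f)
    Q = -(-(W - dilated_size(S, rate) + 1 + 2 * pad) // stride)  # ceiling division
    y = []
    for q in range(Q):
        acc = 0
        for s in range(S):
            w = s * rate + q * stride - pad  # taps are spaced `rate` apart
            if 0 <= w < W:
                acc += f[s] * x[w]
        y.append(acc)
    return y


print(dilated_size(3, 2))                                          # 5
print(conv1d_dilated([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], rate=2))  # [-4, -4, -4]
```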
7. Convolution with Algorithmic Speedups
• FFT and Winograd have same basic computational flow
• FFT tiles typically need to be much bigger
• Winograd history: Toom and Cook, then Lavin
8. Winograd: input transform
Input Feature Map
4x4 stride 2
• Input transform
• 2D Winograd is a nested product of 1D transforms
• Transforms can be simplified to remove zeros
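For F(2x2,3x3) the input transform is V = B^T d B on each 4x4 tile. The sketch below uses the B^T matrix from Lavin and Gray's Winograd paper (an assumption about which variant the slide shows) and illustrates both bullets: the 2D transform is the 1D transform applied to every column and then every row, and the 1D transform reduces to four additions/subtractions once the zero entries are dropped. All helper names are hypothetical, and the tile extractor assumes no padding and an even output size.

```python
# F(2x2, 3x3) input transform; B^T as in Lavin & Gray (assumed variant).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]


def bt_1d(d):
    """B^T d for a length-4 vector with the zero terms removed (adds only)."""
    return [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]


def nested_2d(t1d, X):
    """Apply a 1D transform to every column, then every row: T X T^T."""
    cols = [t1d([row[j] for row in X]) for j in range(len(X[0]))]
    T = [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]
    return [t1d(row) for row in T]


def input_transform(d):
    return nested_2d(bt_1d, d)


def input_transform_dense(d):
    """Reference: explicit B^T d B with the full matrices, zeros included."""
    mm = lambda A, B: [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                        for j in range(len(B[0]))] for i in range(len(A))]
    B = [list(col) for col in zip(*BT)]
    return mm(mm(BT, d), B)


def extract_tiles(image):
    """Overlapping 4x4 tiles at stride 2 (no padding; H-2 and W-2 even)."""
    H, W = len(image), len(image[0])
    return [[[row[x:x + 4] for row in image[y:y + 4]]
             for x in range(0, W - 2, 2)]
            for y in range(0, H - 2, 2)]


d = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
print(input_transform(d) == input_transform_dense(d))  # True
```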
9. Winograd: filter transform
• Filter transform
• Same as input but with different coefficients
• Transform each feature map independently
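The filter transform U = G g G^T follows the same nested pattern with the 4x3 matrix G, whose rows are [1, 0, 0], [1/2, 1/2, 1/2], [1/2, -1/2, 1/2], [0, 0, 1] in Lavin and Gray's F(2x2,3x3) (assumed here). This sketch transforms every (k, c) 3x3 kernel on its own, matching the last bullet; names are hypothetical.

```python
def g_1d(g):
    """G g for F(2, 3): a length-3 filter row or column -> 4 values."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def nested_2d(t1d, X):
    """Apply a 1D transform to every column, then every row: T X T^T."""
    cols = [t1d([row[j] for row in X]) for j in range(len(X[0]))]
    T = [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]
    return [t1d(row) for row in T]


def filter_transform(filters):
    """filters[k][c] is a 3x3 kernel; each (k, c) pair is transformed independently."""
    return [[nested_2d(g_1d, g) for g in per_k] for per_k in filters]


U = filter_transform([[[[1, 1, 1]] * 3]])  # K = C = 1, all-ones 3x3 kernel
print(U[0][0])
# [[1, 1.5, 0.5, 1], [1.5, 2.25, 0.75, 1.5], [0.5, 0.75, 0.25, 0.5], [1, 1.5, 0.5, 1]]
```

Since filters are fixed at inference time, this transform can be computed once and stored, so only the input and output transforms run per image.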
10. Winograd: batched GEMM
• Point-wise multiplication
• Posed as a batched GEMM operation
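Why the point-wise step becomes a batched GEMM: at each of the 16 positions (xi, nu) of a 4x4 transformed tile, the output channel k for tile t is the sum over input channels c of U[k][c][xi][nu] * V[c][t][xi][nu]. That reduction over C is a K x C by C x T matrix product, one per position. The sketch below shows the repacking and the 16 GEMMs; function names and the tile index t (tiles times minibatch) are assumptions for illustration.

```python
def gemm(A, B):
    """A is M x K, B is K x N (nested lists); returns M x N."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def pack_filters(U):
    """U[k][c] is a 4x4 transformed filter -> 16 K x C matrices, b = 4*xi + nu."""
    K, C = len(U), len(U[0])
    return [[[U[k][c][xi][nu] for c in range(C)] for k in range(K)]
            for xi in range(4) for nu in range(4)]


def pack_tiles(V):
    """V[c][t] is a 4x4 transformed input tile -> 16 C x T matrices."""
    C, T = len(V), len(V[0])
    return [[[V[c][t][xi][nu] for t in range(T)] for c in range(C)]
            for xi in range(4) for nu in range(4)]


def batched_gemm(Us, Vs):
    """M[b] = Us[b] @ Vs[b]; summing point-wise products over C is a GEMM."""
    return [gemm(Ub, Vb) for Ub, Vb in zip(Us, Vs)]


print(batched_gemm([[[1, 2]], [[3, 4]]], [[[5], [6]], [[7], [8]]]))  # [[[17]], [[53]]]
```

On a GPU the 16 products share shapes, so they map naturally onto a single batched GEMM launch.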
11. Winograd: output transform
Output Feature Map
• Output transform
• Same as input and filter
• Transform back to pixel space to obtain a 2x2 output tile
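Putting the steps together for a single channel: Y = A^T [(G g G^T) ⊙ (B^T d B)] A turns one 4x4 input tile and one 3x3 filter into a 2x2 output tile using 16 multiplications instead of 36. The sketch assumes Lavin and Gray's F(2x2,3x3) matrices and checks the result against direct cross-correlation; all names are hypothetical.

```python
def bt_1d(d):  # B^T d, zeros removed
    return [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]


def g_1d(g):   # G g
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def at_1d(m):  # A^T m: 4 values -> 2 outputs
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def nested_2d(t1d, X):
    """Apply a 1D transform to every column, then every row: T X T^T."""
    cols = [t1d([row[j] for row in X]) for j in range(len(X[0]))]
    T = [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]
    return [t1d(row) for row in T]


def winograd_f2x2_3x3(d, g):
    """One 4x4 input tile and one 3x3 filter -> one 2x2 output tile."""
    V = nested_2d(bt_1d, d)
    U = nested_2d(g_1d, g)
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # 16 multiplies
    return nested_2d(at_1d, M)


def direct_2x2(d, g):
    """Reference: direct 3x3 cross-correlation over the same tile (36 multiplies)."""
    return [[sum(d[i + r][j + s] * g[r][s] for r in range(3) for s in range(3))
             for j in range(2)] for i in range(2)]


d = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(winograd_f2x2_3x3(d, g))  # [[-8.0, -8.0], [-8.0, -8.0]]
print(direct_2x2(d, g))         # [[-8, -8], [-8, -8]]
```

With several channels, the point-wise product is replaced by the batched GEMM over C before the output transform.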
14. Multiplier Transistor Efficiency
Algorithm    Bits  Speedup  Multiplier transistors  Performance / transistor
Direct       8     1.0      3000                    1.0
F(2x2,3x3)   9     2.25     3750                    1.8
F(4x4,3x3)   12    4.0      6000                    2.0
Transistor counts from Wikipedia.
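The speedup and performance-per-transistor columns follow from simple arithmetic: direct convolution needs m²r² multiplies per m x m output tile with an r x r filter, Winograd F(m x m, r x r) needs (m + r - 1)², and performance per transistor is the speedup divided by the multiplier's size relative to the 8-bit direct baseline of 3000 transistors. A small sketch, with hypothetical function names:

```python
def winograd_speedup(m, r):
    """Multiply reduction of F(m x m, r x r) vs direct: m^2 r^2 / (m + r - 1)^2."""
    return (m * m * r * r) / (m + r - 1) ** 2


def perf_per_transistor(speedup, transistors, baseline=3000):
    """Speedup divided by multiplier size relative to the 8-bit direct baseline."""
    return speedup / (transistors / baseline)


for name, m, t in [("F(2x2,3x3)", 2, 3750), ("F(4x4,3x3)", 4, 6000)]:
    s = winograd_speedup(m, 3)
    print(name, s, round(perf_per_transistor(s, t), 2))
# F(2x2,3x3) 2.25 1.8
# F(4x4,3x3) 4.0 2.0
```

The larger tile multiplies less but needs wider operands (the transforms grow the dynamic range), so its multiplier gains are partly eaten by transistor cost.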
15. Logarithmic quantization
D. Miyashita, EH. Lee, B. Murmann
Convolutional Neural Networks using Logarithmic Data Representation
http://arxiv.org/abs/1603.01025v2
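The core idea of the cited logarithmic representation is to store each value as a signed power of two, so a multiply becomes a shift. A minimal sketch of that idea, assuming rounding in the log2 domain and a clipped exponent range; the parameter names `max_exp` and `bits` and the flush-to-zero rule for tiny values are our assumptions, not necessarily the paper's exact scheme.

```python
import math


def log_quantize(x, max_exp=0, bits=3):
    """Round x to a signed power of two (sketch; parameter names are ours).

    Exponents are clipped to [max_exp - 2**bits + 1, max_exp]; values whose
    rounded exponent falls below that range are flushed to zero.
    """
    if x == 0:
        return 0.0
    e = round(math.log2(abs(x)))  # nearest exponent in the log domain
    lo = max_exp - 2 ** bits + 1
    if e < lo:
        return 0.0
    e = min(e, max_exp)           # saturate large magnitudes
    return math.copysign(2.0 ** e, x)


print([log_quantize(v, max_exp=2) for v in (0.3, -3.0, 100.0, 0.001, 0.0)])
# [0.25, -4.0, 4.0, 0.0, 0.0]
```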
16. Performance: VGG fp32 on GTX1080
[Chart: VGG totals. Effective TFLOPS (0 to 25) vs. batch size (64, 32, 16, 8, 4, 2, 1). Series: Neon Direct, Neon F(2x2,3x3), Neon F(4x4,3x3), cuDNN FFT.]
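"Effective TFLOPS" in charts like these is commonly computed by counting the FLOPs a direct convolution would need and dividing by the measured time, regardless of which algorithm ran; that is why Winograd and FFT results can exceed the GPU's fp32 peak. We assume that convention here. The layer shape below is a hypothetical example for illustration, not a figure taken from the charts.

```python
def direct_conv_flops(N, C, K, P, Q, R, S):
    """FLOPs of a direct convolution: one multiply plus one add per MAC."""
    return 2 * N * C * K * P * Q * R * S


def effective_tflops(flops, seconds):
    """Direct-equivalent FLOPs divided by measured runtime, in TFLOPS."""
    return flops / seconds / 1e12


# Hypothetical 3x3 layer: 512 -> 512 channels, 28x28 output, batch size 1
print(direct_conv_flops(N=1, C=512, K=512, P=28, Q=28, R=3, S=3))  # 3699376128
```

At batch size 1 such a layer is only a few GFLOP of work, so kernel efficiency at small N decides the achievable rate.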
17. Peak Performance: VGG fp32 on GTX1080
[Chart: VGG layer 4.2. Effective TFLOPS (0 to 25) vs. batch size (64, 32, 16, 8, 4, 2, 1). Series: Neon Direct, Neon F(2x2,3x3), Neon F(4x4,3x3), cuDNN FFT.]