- AVX-512 is a new SIMD instruction set introduced by Intel in 2017 that supports 512-bit registers and many new instructions. It can provide performance benefits for multimedia workloads.
- FFmpeg now supports AVX-512 via function pointers that detect CPU capabilities. Existing projects like dav1d have added support with gains of 10-20% for AV1 decoding.
- New instructions like vpermb, variable shifts, and vpternlogd allow more efficient implementations of tasks like byte shuffling and packing that were previously difficult. The FFmpeg v210 encoder saw nearly a 2x speedup on Ice Lake versus AVX2 with these instructions.
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecPeter Bakas
Talk on Netflix Keystone by Peter Bakas at SF Data Engineering Meetup on 2/23/2016.
Topics covered:
- Architectural design and principles for Keystone
- Technologies that Keystone is leveraging
- Best practices
http://www.meetup.com/SF-Data-Engineering/events/228293610/
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecPeter Bakas
Talk on Netflix Keystone by Peter Bakas at SF Data Engineering Meetup on 2/23/2016.
Topics covered:
- Architectural design and principles for Keystone
- Technologies that Keystone is leveraging
- Best practices
http://www.meetup.com/SF-Data-Engineering/events/228293610/
5G cellular communications opens an opportunity for a wider realm in networking, one which can encompass the Internet (including IOT). With D2D, Device-to-Device communications, devices can cluster themselves into high power computing platforms.
We show the practicality of mapping RDMA over wireless. Think wireless InfiniBand or wireless OmniPath.
The ensuing number of cluster can be exponentially larger than the number of devices. Clusters can run side-by-side as multi instance software.
See how protocol-free networking becomes realistic as D2D clusters facilitate networking code becoming loadable, user-space software.
Intermediate: 5G Applications Architecture - A look at Application Functions ...3G4G
In this tutorial we look at the 5G Applications architecture. We discuss 5G applications, application functions and application servers and how they fit together in a 5G Service Based Architecture
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Propelling 5G forward: a closer look at 3GPP Release-16Qualcomm Research
This presentation summarizes the 3GPP 5G NR Release 16 projects, including eMBB enhancements, unlicensed, sidelink, IAB, TSN, eURLLC, private networks, C-V2X, and more...
3GPP Packet Core Towards 5G Communication SystemsOfinno
This presentation provides an overview of 3GPP packet core and 5G systems. Some enabler features are outlined, such as network slicing. This presentation was prepared for the 20th Annual International Conference on Next Generation Internet and Related Technologies Net-Centric 2017 that was held at George Mason University.
Beginners: 5G Terminology (Updated - Feb 2019)3G4G
An updated short presentation and video looking at 5G terminology that is being used in 3GPP standards and specifications.
Terms such as NG-RAN, NR, ng-eNB, en-gNB, RIT, SRIT, Option 3, etc. will be discussed
This video / presentation looks at how RRC states have changed from 3G to 4G and not to 5G NR. Why was the new RRC_INACTIVE state introduced; how will it reduce the latency which will help URLLC; what are the different NAS states in 4G EPC and 5G Core and how do they map to RRC and UE states. Finally we look at some basic signalling example to understand the RRC Resume and Inactive state.
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Tutorial at IEEE IM 2019.
The tutorial will provide a comprehensive coverage of the Network Automation domain starting with the scope and definitions, introducing the challenges and then developing the different approaches to realize complete future network automation solutions. A special focus will be put on the newly created ETSI ISG ZSM "Zero Touch Network and Service Management" and the standardization landscape.
Overview 5G NR Radio Protocols by Intel Eiko Seidel
Very nice overview of the 5G Radio Interface protocol as defined by 3GPP in NR Rel.15. The document was submitted to the 3GPP workshop on ITU submission in Brussels on Oct 24, 2018.
Machine to Machine (M2M) Communication is an integral part of IoT. While it's being widely discussed today, this subject has over a half a century history of evolution. This session is about the how M2M communication evolved, method & techniques, applications and the opportunities made possible.
Introducing ultra-precise time for server-hosted applicationsADVA
OSA SoftSync™ offers a straightforward and cost-effective way to add our Oscilloquartz PTP capabilities to open servers and network devices. Now network operators and enterprises can deliver robust and highly precise synchronization as a software application. Find out how you can easily migrate to precise and resilient packet timing that can run on any Linux machine with our new OSA SoftSync™.
Advanced: Control and User Plane Separation of EPC nodes (CUPS)3G4G
This presentation looks at Control and User Plane Separation of EPC nodes (CUPS) which was completed by 3GPP as part of Release 14 specifications and is set to be a key core network feature for many operators.
CUPS provides the architecture enhancements for the separation of functionality in the Evolved Packet Core’s SGW, PGW and TDF. This enables flexible network deployment and operation, by distributed or centralized deployment and the independent scaling between control plane and user plane functions - while not affecting the functionality of the existing nodes subject to this split.
Beginners: Different Types of RAN Architectures - Distributed, Centralized & ...3G4G
In this basic tutorial we look at different types of RAN architectures that are always being discussed. We start with the Distributed RAN (D-RAN) and then look at Centralized and Cloud RAN (both referred to as C-RAN) architectures. We also quickly look at RAN functional splits for 5G and then tie this all together.
We also look at how Samsung and Nokia discuss these architectures in the context of 5G.
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
Open RAN Page: https://www.3g4g.co.uk/OpenRAN/
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
5G cellular communications opens an opportunity for a wider realm in networking, one which can encompass the Internet (including IOT). With D2D, Device-to-Device communications, devices can cluster themselves into high power computing platforms.
We show the practicality of mapping RDMA over wireless. Think wireless InfiniBand or wireless OmniPath.
The ensuing number of cluster can be exponentially larger than the number of devices. Clusters can run side-by-side as multi instance software.
See how protocol-free networking becomes realistic as D2D clusters facilitate networking code becoming loadable, user-space software.
Intermediate: 5G Applications Architecture - A look at Application Functions ...3G4G
In this tutorial we look at the 5G Applications architecture. We discuss 5G applications, application functions and application servers and how they fit together in a 5G Service Based Architecture
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Propelling 5G forward: a closer look at 3GPP Release-16Qualcomm Research
This presentation summarizes the 3GPP 5G NR Release 16 projects, including eMBB enhancements, unlicensed, sidelink, IAB, TSN, eURLLC, private networks, C-V2X, and more...
3GPP Packet Core Towards 5G Communication SystemsOfinno
This presentation provides an overview of 3GPP packet core and 5G systems. Some enabler features are outlined, such as network slicing. This presentation was prepared for the 20th Annual International Conference on Next Generation Internet and Related Technologies Net-Centric 2017 that was held at George Mason University.
Beginners: 5G Terminology (Updated - Feb 2019)3G4G
An updated short presentation and video looking at 5G terminology that is being used in 3GPP standards and specifications.
Terms such as NG-RAN, NR, ng-eNB, en-gNB, RIT, SRIT, Option 3, etc. will be discussed
This video / presentation looks at how RRC states have changed from 3G to 4G and not to 5G NR. Why was the new RRC_INACTIVE state introduced; how will it reduce the latency which will help URLLC; what are the different NAS states in 4G EPC and 5G Core and how do they map to RRC and UE states. Finally we look at some basic signalling example to understand the RRC Resume and Inactive state.
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Tutorial at IEEE IM 2019.
The tutorial will provide a comprehensive coverage of the Network Automation domain starting with the scope and definitions, introducing the challenges and then developing the different approaches to realize complete future network automation solutions. A special focus will be put on the newly created ETSI ISG ZSM "Zero Touch Network and Service Management" and the standardization landscape.
Overview 5G NR Radio Protocols by Intel Eiko Seidel
Very nice overview of the 5G Radio Interface protocol as defined by 3GPP in NR Rel.15. The document was submitted to the 3GPP workshop on ITU submission in Brussels on Oct 24, 2018.
Machine to Machine (M2M) Communication is an integral part of IoT. While it's being widely discussed today, this subject has over a half a century history of evolution. This session is about the how M2M communication evolved, method & techniques, applications and the opportunities made possible.
Introducing ultra-precise time for server-hosted applicationsADVA
OSA SoftSync™ offers a straightforward and cost-effective way to add our Oscilloquartz PTP capabilities to open servers and network devices. Now network operators and enterprises can deliver robust and highly precise synchronization as a software application. Find out how you can easily migrate to precise and resilient packet timing that can run on any Linux machine with our new OSA SoftSync™.
Advanced: Control and User Plane Separation of EPC nodes (CUPS)3G4G
This presentation looks at Control and User Plane Separation of EPC nodes (CUPS) which was completed by 3GPP as part of Release 14 specifications and is set to be a key core network feature for many operators.
CUPS provides the architecture enhancements for the separation of functionality in the Evolved Packet Core’s SGW, PGW and TDF. This enables flexible network deployment and operation, by distributed or centralized deployment and the independent scaling between control plane and user plane functions - while not affecting the functionality of the existing nodes subject to this split.
Beginners: Different Types of RAN Architectures - Distributed, Centralized & ...3G4G
In this basic tutorial we look at different types of RAN architectures that are always being discussed. We start with the Distributed RAN (D-RAN) and then look at Centralized and Cloud RAN (both referred to as C-RAN) architectures. We also quickly look at RAN functional splits for 5G and then tie this all together.
We also look at how Samsung and Nokia discuss these architectures in the context of 5G.
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
Open RAN Page: https://www.3g4g.co.uk/OpenRAN/
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Java on arm theory, applications, and workloads [dev5048]Aleksei Voitylov
Although ARM processors are almost always viewed as having been designed for the embedded market, several vendors are making a bet and building server CPUs that contend with Intel in cloud deployments. With the presence of the Java ARM port and a wide variety of applications in the Java ecosystem able to run on ARM CPUs, the real question becomes which workloads are best suited to the ARM servers niche and which metrics can be optimized for using ARM servers. This presentation explores the status of Java and the Java ecosystem on ARM, together with the Java ARM port features and performance of specific workloads. Some focus is on the recent changes in the Java ARM port, which the speaker’s company contributes to.
Advanced High-Performance Computing Features of the Open Power ISAGanesan Narayanasamy
Power ISA processors have a long history of offering superior features for HPC applications. Well known examples include POWER3, used in the ASCI White supercomputer, various PowerPC processors used in the Blue Gene family of massively parallel computers, and POWER9, present in the leading supercomputers of today, Summit and Sierra. Open Power ISA has enabled open access to many of these features. IBM's most recent contribution to Open Power ISA, in the form of Power ISA Version 3.1, includes the Matrix-Multiply Assist (MMA) instructions. The MMA instructions are designed to deliver additional performance both for classical high-performance computing, in the space of scientific and technical computing, and for the increasingly important space of business analytics. In addition, the Open Memory Interface (OMI), also developed by IBM, opens new levels of memory bandwidth and capacity for the most demanding applications. Our goal is to raise awareness of and interest in these new features, which we believe can lead to further research in processor architecture and programming environments. Some of the most promising application areas include graph algorithms, classical machine learning and deep learning
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaumeurobsdcon
Abstract
The MINIX 3 microkernel has been used as a base to reimplement NetBSD. To application programs, MINIX 3 looks like NetBSD, with the NetBSD headers, libraries, package manager, etc. Thousands of NetBSD packages run on it on the x86 and ARM Cortex V8 (BeagleBones). Inside, however, it is a completely different architecture, with a tiny microkernel and independent servers for memory management, the file system, and each device driver. This architecture has many valuable properties which will be described in the talk, including better security and the ability to recover from many component crashes without running applications even noticing. Updating to a new version of the operating system while it is running and without a reboot is on the roadmap for the future.
Running Applications on the NetBSD Rump Kernel by Justin Cormack eurobsdcon
Abstract
The NetBSD rump kernel has been developed for some years now, allowing NetBSD kernel drivers to be used unmodified in many environments, for example as userspace code. However it is only since last year that it has become possible to easily run unmodified applications on the rump kernel, initially with the rump kernel on Xen port, and then with the rumprun tools to run them in userspace on Linux, FreeBSD and NetBSD. This talk will look at how this is achieved, and look at use cases, including kernel driver development, and lightweight process virtualization.
Speaker bio
Justin Cormack has been a Unix user, developer and sysadmin since the early 1990s. He is based in London and works on open source cloud applications, Lua, and the NetBSD rump kernel project. He has been a NetBSD developer since early 2014.
Various processor architectures are described in this presentation. It could be useful for people working for h/w selection and processor identification.
Baby Demuxed's First Assembly Language FunctionKieran Kunhya
Hand written assembly language functions are the bedrock of any real-world codec implementation. It's the reason any online video encodes at a reasonable speed or decodes without looking like a stuttery mess. Yet knowledge of writing assembly language functions is slowly disappearing.
This presentation will explain how assembly language functions are important in video codecs and try to teach the audience how to write their first x86 assembly language function. This might sound hard, but high-schoolers have literally written the functions you use every single day.
Stable Feed and Lower Costs with Use of 5G and Satellite Stable Feed and Lowe...Kieran Kunhya
- New and innovative way to contribute and distribute high quality video content from anywhere in the world
- Combining satellite with 5G to deliver stable feed at from the UAE to South America
- What content providers can learn to address “walled-garden” cellular bonding solutions’ lack of flexibility and quality for high-quality sports transmissions
IBC 2022 IP Showcase - Timestamps in ST 2110: What They Mean and How to Measu...Kieran Kunhya
Timestamps in ST 2110 are exceptionally important but many ST 2110 tutorials lack the time to go into detail about how they work, how they relate to PTP, and, for example, the differences between RTP timestamps and packet arrival times. This presentation will aim to fill that gap and allow engineers to diagnose problems with timestamps based on examples from real world facilities.
How you shouldn't just look at IP technologies in broadcast but also look at how off-the-shelf IT equipment can be used. Presented at NAB BEITC Engage! 2017
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. What is Assembly Language?
• A low-level (close to the hardware)
programming language
• Intel x86 assembly language is
(somewhat) backwards compatible to
the 8008 CPU from 1972!
• Languages like C are compiled into
Assembly Language
• Mnemonics are human readable
versions of machine code
3. What is AVX-512?
• New SIMD (Single Instruction Multiple Data) Instruction Set for Intel since
2017, and more recently AMD CPUs
• 512-bit register sizes (zmm)
• Many new instructions
• Opmasks
• Comparison instructions
• Many things not so interesting in multimedia (e.g cryptography, neural
networks)
• Lots of fancy words, but high schoolers have written assembly in FFmpeg!
4. Why is this relevant now?
• AVX-512 has been around since 2017. Why is it relevant
now?
• Old Skylake (server) systems had large performance
throttling when using larger registers. AVX-512 remained
unused in multimedia
• Could still use new instructions with smaller registers
• Can be beneficial in some cases (see later)
• Ice Lake (10th and 11th Gen Intel) first to have no throttling
5. How to get started?
• Intel have removed AVX-512 support
in consumer processors (from 12th
Gen)
• Available in AMD Zen 4
• Still exists in Intel Server CPUs (Xeon).
Available from cloud providers
• Easiest way is to buy an Intel NUC
(11th Gen)
6. Existing work in multimedia using AVX-512
• dav1d project added AVX-512 support
• AV1 decoding, particularly beneficial owing to large block sizes
• 10-20% faster *overall* decoding
• Classic FFmpeg/x264 approach to assembly
• No intrinsics, nor inline assembly
• Detect CPU capabilities and set function pointers to appropriate
functions
• Messy Venn Diagram of capabilities, but two in practice we care
about
7. CPU_FLAGS in FFmpeg
int av_get_cpu_flags(void);
#define AV_CPU_FLAG_AVX512 0x100000 ///<
AVX-512 functions: requires OS support even if
YMM/ZMM registers aren't used
#define AV_CPU_FLAG_AVX512ICL 0x200000 ///<
F/CD/BW/DQ/VL/VNNI/IFMA/VBMI/VBMI2/VPOPCNTDQ/BIT
ALG/GFNI/VAES/VPCLMULQDQ
8. Lanes
• Older AVX ymm registers are split
into lanes
• Instructions (mainly) operate in lanes
• Can be tricky to move data between
lanes
• Limitation on AVX2 code
16-byte Lane 1 16-byte Lane 0
256-bit ymm Register (old)
9. K-mask registers
• A new set of registers k0-k7 that allow the destination
register to remain unchanged or set to zero
• For example, addition, but only some values:
• paddw zmm1{k1}, zmm2, zmm3
• kmovX instructions to manipulate k-mask registers
10. vpermb
• Byte shuffles (permute) are the one
of the most important instructions
in multimedia
• Similar to existing pshufb
instruction but cross-lane
• Need to use k-masks with vpermb
to zero out a byte
• e.g for zigzag scan
• Also, VPERMT2B, permute from
two registers
A B C D E F G H
E A 0 C C B H D
11. Variable shifts
• vpsrlvw/vpsrlvd/vpsrlvq – variable right shift (logical)
• vpsllvw/vpsllvd/vpsllvq – variable left shift (logical)
• (letter soup, especially arithmetic shifts!)
• Historically had to use multiple instructions and various trickery
to achieve right shifts. Had many limitations.
• Faster than multiply for left shifts
12. vpternlogd
• The kitchen sink of instructions!
• Allows a programmable truth table to be
implemented per bit input in each register
• Can replace up to eight instructions!
• e.g vpternlogd zmm0, zmm1, zmm2,
0xca
• zmm0 = zmm0 ? zmm1 : zmm2; (for each bit)
• http://0x80.pl/articles/avx512-ternary-
functions.html
13. Example: v210enc (1)
• .loop:
• movu ym1, [yq + 2*widthq]
• vinserti32x4 m1, [uq + 1*widthq], 2
• vinserti32x4 m1, [vq + 1*widthq], 3
• vpermb m1, m2, m1 ; uyvx yuyx vyux yvyx
• CLIPUB m1, m4, m5
•
pmaddubsw m0, m1, m3 ; shift high and low samples of each dword and mask out other bits
• pslld m1, 4 ; shift center sample of each dword
• vpternlogd m0, m1, m6, 0xd8 ; C?B:A ; merge and mask out bad bits from B
•
movu [dstq], m0
• […]
• jl .loop
• Takes 3x 8-bit samples, extends to 10-bit and packs into 32-bits
14. Example: v210enc (2)
• Benchmarks (decicyles)
• Skylake
• v210_planar_pack_8_c: 2373.5
• […]
• v210_planar_pack_8_avx2: 194.0
• v210_planar_pack_8_avx512: 174.0
• vpternlogd on a shorter ymm register
• Ice Lake
• v210_planar_pack_8_c: 2743.6
• v210_planar_pack_8_avx2: 246.6
• v210_planar_pack_8_avx512: 238.6
• v210_planar_pack_8_avx512icl: 122.1
• vpermb and zmm nearly twice as fast as avx512 ymm, and more than twenty times
faster than C!
15. What AVX-512 code next?
• Anything involving line/frame based processing
• e.g filters, scalers etc.
• Comparisons
• Vpternlogd in many places (e.g 3-way boolean)
• Also change variable shifts in many places
• Intel manual is very verbose (useful in some cases)
• https://www.officedaytime.com/simd512e/