This document discusses SIMD (Single Instruction Multiple Data) instructions both outside and inside Oracle 12c. It provides an overview of SIMD instructions on Intel architectures, how they can improve performance, and how Oracle 12c leverages SIMD registers and instructions for in-memory columnar storage and filtering. The document also discusses how to trace SIMD instruction usage inside Oracle using tools like gdb and systemtap.
SIMD extensions have been used in vector programming since the early 1970s. They are CPU specific, not Oracle specific. Nevertheless, Oracle uses them, especially in the In-Memory option, to perform operations against In-Memory Compression Units. Oracle exploits SIMD extensions to achieve data parallelism without concurrency.
The presentation first covers the data structures (vectors), CPU registers and SIMD instructions, and how they are used outside Oracle. It shows how combining data structures and instructions can significantly reduce the number of operations. Code samples written in C are presented and executed to demonstrate this.
It then focuses on SIMD instructions inside Oracle and how Oracle uses them: the specific libraries and the Oracle kernel components that use SIMD instructions.
The GNU debugger (gdb) helps detect the internal functions that rely on SIMD instructions. It also helps establish the link between SIMD instructions and Oracle operations such as filters and aggregates. The presentation then shows how SIMD instructions contribute to performance improvements.
Finally, it concludes on why Oracle In-Memory performance should improve in future versions.
The presentation helps users understand, through a bottom-up analysis, two things: how Oracle uses generic (non-Oracle-specific) programming techniques to improve performance with the In-Memory option, and how the columnar format benefits from them.
Ukoug15: SIMD outside and inside Oracle 12c (12.1.0.2)
2. ABOUT ME
Oracle Consultant since 2001
Former developer (C, Java, Perl, PL/SQL)
Blogger since 2004
http://laurent.leturgez.free.fr (in French and discontinued)
http://laurent-leturgez.com
Twitter : @lleturgez
OCM 11g
3. Agenda
SIMD Instructions, outside Oracle 12c
What is a SIMD instruction?
Will my application use SIMD?
Raw Performance
SIMD Instructions, inside Oracle 12c
How SIMD instructions are used inside Oracle 12c
Tracing SIMD in Oracle 12c
4. Caveats
Most of the topics come from
My own research
My past life as a developer
Some of the topics are about internals, so:
Analysis and conclusions may be incomplete
Future versions of Oracle may change the features
Tests have been done with Oracle 12.1.0.2, Oracle Enterprise Linux 7.1, VMware Fusion 7 (and VirtualBox)
5. Before we start …
Some fundamentals (from Dennis Yurichev’s book)
CPU register: […] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with a high-level PL (programming language) and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
Instruction: A primitive CPU command. The simplest examples include: moving data between registers, working with memory and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA).
Assembly language: Mnemonic code and some extensions like macros which are intended to make a programmer’s life easier.
http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf
6. Agenda
SIMD Instructions, outside Oracle 12c
What is a SIMD instruction ?
Will my application use SIMD ?
Raw Performance
SIMD Instructions, inside Oracle 12c
How SIMD instructions are used inside Oracle 12c
Tracing SIMD in Oracle 12c
7. SIMD instructions … outside Oracle 12c
SIMD stands for Single Instruction Multiple Data
Processes multiple data elements
In one CPU instruction
Based on
Specific registers
Specific CPU instructions and instruction sets
Not Oracle specific
CPU architecture specific
Intel
IBM (AltiVec)
SPARC (VIS)
This presentation is mainly about the Intel architecture
8. SIMD instructions … outside Oracle 12c
What is a SIMD register?
It’s a CPU register
Wider than traditional general-purpose registers (RDI, RSI, R8, R9, etc.)
128 up to 512 bits wide (64 bits for the older MMX registers)
Contains several data elements packed side by side
9. SIMD instructions … outside Oracle 12c
Scalar operation
an array of 4 integers {1,2,3,4}
add 1 to each value
[Animation: CPU registers Reg1, Reg2 and Reg3 between RAM (In) and RAM (Out). Each value is loaded from RAM into Reg1, the constant 1 sits in Reg2, their sum goes into Reg3 and is saved back to RAM. After four rounds the output is {2,3,4,5}.]
LOAD ADD SAVE
4 LOAD
4 ADD
4 SAVE
10. SIMD instructions … outside Oracle 12c
SIMD operation
an array of 4 integers {1,2,3,4}
add 1 to each value
[Animation: the four values are loaded from RAM into SIMD Reg1 at once, SIMD Reg2 holds {1,1,1,1}, a single ADD puts {2,3,4,5} into SIMD Reg3, which is saved back to RAM in one step.]
LOAD ADD SAVE
11. SIMD instructions … outside Oracle 12c
Instruction set: registers (names); processors; notes
MMX: 8 x 64-bit registers (MM0 to MM7); Pentium II
SSE: 8 x 128-bit registers (XMM0 to XMM7); Pentium III; only four 32-bit single-precision floating-point numbers
SSE2/SSE3/SSSE3/SSE4: 16 x 128-bit registers (XMM0 to XMM15); Pentium 4 to Nehalem; usage expanded to two 64-bit double-precision numbers, four 32-bit integers, or up to sixteen 8-bit bytes
AVX/AVX2: 16 x 256-bit registers (YMM0 to YMM15); Sandy Bridge to Haswell; three-operand (non-destructive) instructions, C = A + B rather than A = A + B; relaxed alignment requirements
AVX3 or AVX-512: 32 x 512-bit registers (ZMM0 to ZMM31); Skylake
14. Will my application use SIMD registers and instructions?
It depends on:
Hardware
Check the processor datasheets to see which instruction set extensions are supported
http://ark.intel.com/#@Processors
Hypervisor
Some (old) hypervisors do not support modern extensions
VirtualBox versions < 5.0 don’t support SSE4, AVX and AVX2
Hyper-V on Windows Server 2008 R2 SP1 needs a patch for specific processors to support AVX
15. It depends on the Operating System
AVX (256 bits) is supported from
Linux kernel >= 2.6.30
Red Hat EL5: 2.6.18
Oracle EL5 w/UEK: 2.6.32
AVX needs XSAVE support enabled in the kernel (do not boot with the noxsave parameter)
Solaris 10 update 10 and Solaris 11
Windows Server 2008 R2 SP1
16. It depends on the compiler
GCC
> 4.6 for AVX support
Use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2 …)
Intel C/C++ Compiler (ICC)
> 11.1 for AVX support and > 13.0 for AVX2 support
Use of specific switches (-xsse4.2, -xavx, -xcore-avx2 …)
Beware of optimization switches (-O1, -O2, -O3): they can change whether loops are vectorized
More … disassemble the binary (if you are allowed to) and look for
SIMD registers
SIMD assembler instructions
18. Based on a C program
CPU used: Haswell microarchitecture (Core i7-4960HQ), AVX/AVX2 enabled
3 tests: no SIMD, SSE4, AVX
Input: one array containing 1 million values
Goal: add 1 to each value; the pass over the million values is repeated 4k, 8k, 16k and 32k times
CPU time (s) = f(#rows processed)
“Quick and dirty” sample code available here:
https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
Raw performance
19. Raw performance
RAW Performance (CPU) for SIMD Instructions: CPU time (s) by number of rows processed
Rows processed: 4096 M | 8192 M | 16384 M | 32768 M
NO SIMD: 10.35 | 20.46 | 42.35 | 85.64
SSE4 (XMM registers): 3.3 | 6.81 | 13.73 | 25.58
AVX (YMM registers): 1.96 | 3.51 | 7.23 | 15.15
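The chart's numbers translate into speedups as follows (worked from the values above, largest run shown):

```latex
S = \frac{T_{\text{no SIMD}}}{T_{\text{SIMD}}}, \qquad
S_{\text{SSE4}}(32768\,\text{M}) = \frac{85.64}{25.58} \approx 3.3, \qquad
S_{\text{AVX}}(32768\,\text{M}) = \frac{85.64}{15.15} \approx 5.7
```

Across the four sizes, SSE4 gives roughly 3.0 to 3.3 and AVX roughly 5.3 to 5.9 over no SIMD, and AVX is about 1.7 to 1.9 times faster than SSE4. If the values are 32-bit (an assumption; the deck does not say), the ideal lane counts would be 4 for XMM and 8 for YMM; the measured figures stay below those bounds, plausibly because of loop overhead and cache/memory bandwidth (an interpretation, not measured here).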
21. SIMD instructions … inside Oracle 12c
In-Memory data structure
In-Memory Compression Unit (IMCU)
IMCU is the unit of column store allocation
Target size is 1M rows (controlled by _inmemory_imcu_target_rows)
One IMCU can contain more than one column
Each column in one IMCU is a column unit (CU)
22. SIMD instructions … inside Oracle 12c
In-Memory column store storage indexes
For each column unit, min and max values are maintained in a storage index
Storage indexes provide CU pruning
Information about CUs is available in GV$IM_COL_CU (undocumented, see Bug ID 19361690)
(Figure: IMCU pruning)
23. SIMD instructions … inside Oracle 12c
The way your data is sorted matters for the best IMCU pruning
24. SIMD instructions … inside Oracle 12c
SIMD extensions are used with In-Memory storage indexes for efficient filtering
1. IM storage indexes do IMCU pruning
2. SIMD instructions apply filter predicates efficiently
(Figure: IMCU pruning, then filtering with SIMD on a Prod-id column containing 10, 10, 14, 14, 10)
25. SIMD instructions … inside Oracle 12c
Oracle 12c uses specific libraries for SIMD (and compression)
Located in $ORACLE_HOME/lib
libshpksse4212.so for SSE4.2 extensions
Compiled with ICC v12 with the specific -xsse4.2 switch
libshpkavx12.so for AVX extensions
Compiled with ICC v12 with the specific -xavx switch
libshpkavx212.so for AVX2 extensions
Not fully implemented yet (only 8 functions implemented)
No ICC AVX2 switch used because ICC v12 doesn’t support AVX2
Thanks Tanel Pöder
26. SIMD instructions … inside Oracle 12c
Oracle SIMD-related functions
Located in the kdzk kernel module (HPK)
Part of the Advanced Compression library (ADVCMP)
Easily tracked with systemtap
27. SIMD instructions … inside Oracle 12c
How does Oracle use SIMD extensions?
It depends on many parameters
OS level: /proc/cpuinfo
AVX and AVX2 support
SSE4 support only
28. SIMD instructions … inside Oracle 12c
Which library am I using?
pmap
AVX support
SSE4 support
29. SIMD instructions … inside Oracle 12c
Which compiler options have been used?
Read the .comment section of the ELF file
Read the corresponding compiler documentation
[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so |
> egrep -i 'intel|gcc' | egrep 'xavx|mavx'
[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0 Build 20120731
…/…
-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
30. SIMD instructions … inside Oracle 12c
How are SIMD registers used by Oracle?
GDB
To get the call stack (backtrace)
To set breakpoints on interesting functions
To view register contents (general-purpose and SIMD)
“info registers” for general-purpose registers
“info all-registers” for all registers (SIMD registers included)
(gdb) print $ymmX.<format>
Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128
31. SIMD instructions … inside Oracle 12c
In red: register contents that have been modified
In blue: the upper part (128 bits) of the SIMD registers is empty
32. SIMD instructions … inside Oracle 12c
Oracle IM can use AVX or SSE4 extensions for SIMD operations
When AVX is used
It uses only 128 bits out of the 256-bit wide registers
• AVX adds new register state through the 256-bit wide YMM register file
• Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
• Without this, only AVX 128-bit is supported
33. SIMD instructions … inside Oracle 12c
The culprit
Oracle 12.1.0.2 is supported from EL5 onwards
The Red Hat EL5 kernel is 2.6.18, and this flag (xsave) is supported from 2.6.30 kernels onwards
For compatibility reasons, Oracle has to compile its code on 2.6.18 kernels
35. Tracing SIMD in Oracle 12c
Oradebug has 2 components related to IM
36. Tracing SIMD in Oracle 12c
Interesting components to trace for SIMD and/or IMCU pruning are:
IM_optimizer
Gives information about CBO calculations related to IM
ADVCMP_DECOMP.*
ADVCMP_DECOMP_HPK: SIMD functions
ADVCMP_DECOMP_PCODE: Portable Code Machine (usually comparison functions and results)
37. Tracing SIMD in Oracle 12c
IM_optimizer
Information available in the trace file:
IMCU pruning ratio
CU decompression costing (per IMCU)
Predicate evaluation costing (per row)
The statement has to be parsed to get the results
38. Tracing SIMD in Oracle 12c
select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;
39. Tracing SIMD in Oracle 12c
This information is available in the CBO trace file (10053 or SQL_costing event)
40. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Information is available in the trace file (for each IMCU processed):
Library and function used
Number of rows and counting algorithm
Processing rate (comparison, and decompression if relevant)
But nothing on the results of the processing
41. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Gives information about SIMD function usage and filtering (after IMCU pruning)
Example: in-memory table with NO MEMCOMPRESS or MEMCOMPRESS FOR DML
42. Tracing SIMD in Oracle 12c
ADVCMP_DECOMP
ADVCMP_DECOMP_HPK
Example: in-memory compressed table
SIMD instructions are used only in the kdzk_eq_dict functions
43. Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
NO MEMCOMPRESS / MEMCOMPRESS FOR DML
kdzk*dynp* functions (e.g. kdzk_eq_dynp_16bit, kdzk_le_dynp_32bit etc.)
FOR QUERY LOW / QUERY HIGH
Dictionary encoding (LZW?): kdzk_*dict* functions (e.g. kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
Run-length encoding: kdzk_burst_rle* functions (e.g. kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
Bit-packing compression: kdzk*fixed* functions (e.g. kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
44. Tracing SIMD in Oracle 12c
My thoughts about compression/decompression
FOR CAPACITY LOW
FOR QUERY LOW + additional proprietary compression (OZIP)
Functions: ozip_decode_dict*, kdzk_ozip_decode* (e.g. kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
FOR CAPACITY HIGH
FOR QUERY HIGH + heavyweight compression algorithm
Compression/decompression method depends on:
Datatype
Column Compression Unit size
Column contents
Actual size depends on row size and compression factor
Updated by a background process
Triggered by IMC0
W00x: processes that populate the IM column store
Contains a list of rowids
Depends on how data are sorted inside the extents, because loading data into IMCUs reads table extents sequentially
More than 1400 functions are implemented in the AVX and SSE4.2 libraries
-xavx (as opposed to -mavx) has specific optimizations
HPK: High Performance Compression?
/proc/cpuinfo gives information that depends on the hardware, the kernel and its options, and the hypervisor (if any)
For other operating systems, use tools that execute the CPUID instruction and read the EAX, EBX, ECX and EDX registers
ELF: Executable and Linkable Format
Decompression costing: columns used in filter predicates + columns in the SELECT list
Predicate evaluation costing: beware, values are cumulative
Costs generated by columns in the SELECT clause are not reported in the 10053 event trace file, only those of the columns in the filter predicate