The iPhone has a surprisingly powerful engine under that shiny hood when it comes to floating-point computations. This surprises a lot of programmers because, by default, things can slow down a lot whenever any floating-point numbers are involved. This session will explain the secrets to unlocking maximum performance for floating-point calculations, from the mysteries of Thumb mode to harnessing the full power of the forgotten vector floating point unit. Stay away from this session if the thought of reading or even (gasp!) writing assembly code scares you.
Cranking Floating Point Performance To 11 On The iPhone | Noel Llopis
This document discusses optimizing floating point performance on ARM CPUs. It notes that while ARM CPUs do not support floating point math natively, the VFP unit can be used to perform floating point operations. Turning off Thumb mode is recommended because it eliminates the overhead of switching in and out of Thumb mode for floating point instructions. Reading ARM assembly is important in high performance programming for understanding how best to utilize the VFP unit. A vfpmath library already implements common floating point functions in optimized ARM assembly.
Claudia Doppioslash - Time Travel for game development with Elm | Codemotion
Elm is a purely Functional Reactive Programming language, which happens to have the best current implementation of a Time Travelling Debugger (see Bret Victor's Inventing on Principle [https://vimeo.com/36579366] from 12:25) and is ideally suited for games and UIs. We'll see how FRP, a strong yet simple(r than Haskell) type system and the interactive programming workflow make developing a small game much more pleasant compared to the mainstream game engines experience.
A quick tutorial on what debuggers are and how to use them. We present a debugging example using GDB. At the end of this tutorial, you will be able to work your way through a crash and analyze the cause of the error responsible for the crash.
Slides from my workshop at Hack.LU 2010 in Luxembourg. This workshop introduced the basic concepts of Return Oriented Programming with some hands-on exercises.
This document discusses using Arduino with Ruby. It provides an overview of Arduino hardware and software, introduces RAD (Ruby Arduino Development) for writing Ruby code to control Arduino boards, and presents several example projects that combine Arduino and Ruby including Ruby on Bells, Barduino, and a Flying Robot. Code examples are provided for blinking an LED, fading an LED, reading an analog sensor, and using servos from Ruby. Additional sensors and shields discussed include Sharp IR sensors, ultrasonic rangefinders, and WiFi shields.
This document discusses the history of extensibility in Perl, from early techniques using import subroutines and prototypes, to modern approaches like Devel::Declare, the keyword API, and Moops. Moops provides an easy and extensible way to define new syntax using Keyword::Simple, and was created to improve on earlier modules like MooseX::Declare by using a simpler design focused on extensibility. The document concludes by showing how Moops can be used to define a custom "setup" module that injects imports and extends the syntax, providing a cleaner way to share commonly used functions and roles.
The document discusses the advantages of 64-bit ARMv8-A architecture for Android. It describes how Android Lollipop provides support for both 32-bit and 64-bit applications. Native and ART applications can see performance gains by taking advantage of the ARMv8-A architecture's modern instruction set and use of more registers. The document encourages developers to explore 64-bit development and provides additional resources.
Architecture and Implementation of the ARM Cortex-A8 Microprocessor | Aneesh Raveendran
The document discusses the architecture and implementation of the ARM Cortex-A8 microprocessor. It introduces the Cortex-A8 as ARM's first applications microprocessor that delivers high performance and power efficiency for mobile and consumer applications. Key features include the Thumb-2 instruction set, NEON media processing, TrustZone security, and an integrated L2 cache. The Cortex-A8 achieves further performance gains through a dual-issue pipeline and deeper pipeline than prior ARM processors. It employs a combination of synthesized, structured, and custom implementation techniques to optimize for aggressive power, performance and area targets.
The document discusses the ARM Cortex A15 processor. It was designed by ARM and began production in late 2011 for market in late 2012. Key features include NEON for SIMD operations, a VFPv4 floating point unit, Thumb-2 instruction encoding, and TrustZone security. Applications include smartphones, computing devices, and digital home entertainment systems.
This document discusses moving NEON optimizations to 64-bit ARM architectures. Some key points:
- NEON is an ARM instruction set extension that allows single-instruction multiple data (SIMD) processing. It has more registers and capabilities in AArch64, including double precision floating point.
- Migrating NEON code to AArch64 usually only requires minor changes to assembly code due to compatibility in C/intrinsics code and clearer register mappings. Existing NEON documentation still applies.
- Open source libraries and compilers support NEON optimizations, providing performance boosts such as 3-4x faster video codecs. The Android NDK fully supports 64-bit development.
- Examples show optimized
This document discusses NEON intrinsics and how to use them to optimize code for ARM processors that support SIMD instructions. It provides an overview of NEON, describes the data types and some common instructions, and gives examples of using intrinsics for tasks like color space conversion. Performance tests show intrinsics code can be 5-7 times faster than plain C and on par with hand-written assembly. Guidelines are provided for writing efficient NEON intrinsics code.
Discuss challenges of implementing imaging pipelines on mobile chipsets with ARM Mali T604 GPU and Qualcomm Adreno 3xx GPUs.
Presented at Bay Area multimedia meetup (http://www.meetup.com/Bay-Area-Multimedia-Meetup-Group) on Dec. 19, 2013
The document discusses Qualcomm Snapdragon, a family of mobile system on chips (SoCs) designed by Qualcomm. It describes the evolution of Snapdragon CPUs from Scorpion to Krait and their features. It also discusses the Adreno GPU, Hexagon DSP, and other components integrated into Snapdragon SoCs. The document then provides details about specific Snapdragon families like S4, 800 series, and 810. It also includes information about ARM architecture and its instruction set.
The document describes Qualcomm's Snapdragon 800 processor and its specifications. The Snapdragon 800 uses a 28nm HPm process and includes a quad-core Krait 400 CPU up to 2.3GHz, Adreno 330 GPU, LTE and 3G/2G modem support, support for 4K video playback and 55MP cameras. It has advanced features for power management, audio, graphics, and connectivity.
The document provides details about Qualcomm Snapdragon processors across various generations from S1 to 800 series. It describes the key specifications of each including the semiconductor technology, CPU, GPU, memory support and other features. The later generations include enhancements such as improved CPU and GPU performance, support for higher display resolutions and memory speeds, and newer connectivity standards.
This document discusses lessons learned from using OpenCV for embedded vision applications. It notes that while OpenCV works well out of the box for desktop applications, embedded platforms present additional challenges like different processors, interfaces, and unpredictable performance. It recommends prototyping on desktop for faster development, then optimizing algorithms and porting to embedded hardware. Specific optimizations discussed include using ARMv8 processors, vendor-optimized OpenCV packages, and custom NEON-accelerated functions. An example product from Itseez that runs computer vision algorithms in real-time on ARM using these techniques is also presented.
The document provides an overview of embedded systems and ARM processors. It discusses key aspects of ARM processors including the pipeline, memory management features like cache, TCM, MMU and TLB. It also summarizes the AMBA specification and differences between operating in ARM and Thumb states. The document is intended as lecture material for an embedded systems course covering ARM architecture.
Smartphone architecture generally differs from common desktop architectures. It is constrained by power, size, and manufacturing cost, with the goal of providing the best user experience at minimum cost. As a result, modern microprocessors are designed around three main components: an application processor that executes the end user's applications, a modem that handles baseband radio activity, and peripheral devices for interacting with the end user.
Parallelism
Multicores:
The Cortex A7 MPCore processor implements the ARMv7-A architecture and has one to four processors in a single multiprocessor device. The following figure shows an example configuration with four processors [3].
In this paper, we discuss the architecture of the application processor of the Apple iPhone. Specifically, the Apple iPhone uses the ARM Cortex generation of processors as its core. The following sections discuss this architecture in terms of Instruction Set Architecture, Memory Hierarchy, and Parallelism.
This document discusses cooling load, which is the thermal energy that must be removed from a space to maintain comfort conditions. It outlines various components that contribute to cooling load, including heat gains from enclosure elements, internal loads, and outdoor air. Key terms are defined, such as cooling load temperature difference (CLTD) and cooling load factor (CLF), which are used to account for time delays in radiation and conduction gains. Methods for calculating cooling loads from walls, roofs, glazing, lighting, people and other internal sources are presented.
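As a rough illustration of how the CLTD and CLF terms are typically applied, the sketch below uses the standard conduction form q = U × A × CLTD and scales an internal heat gain by a CLF. The function names and all numeric values are hypothetical, chosen only for the example, and SI units are assumed.

```python
def conduction_cooling_load(u_value, area, cltd):
    """Cooling load from conduction through a wall or roof: q = U * A * CLTD.

    u_value: overall heat transfer coefficient in W/(m^2*K)
    area:    surface area in m^2
    cltd:    cooling load temperature difference in K
    """
    return u_value * area * cltd


def internal_cooling_load(heat_gain, clf):
    """Cooling load from an internal source: q = heat gain * CLF.

    heat_gain: instantaneous heat gain in W (for example, lighting power)
    clf:       dimensionless cooling load factor for the hour of interest
    """
    return heat_gain * clf


# Hypothetical inputs, not taken from the document.
wall_load = conduction_cooling_load(u_value=0.5, area=20.0, cltd=12.0)
lighting_load = internal_cooling_load(heat_gain=400.0, clf=0.85)
print(f"wall: {wall_load:.1f} W, lighting: {lighting_load:.1f} W")
```

In practice, the CLTD and CLF values would be looked up from tables for the construction type, orientation, and hour of day rather than picked by hand.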
By Neil Roberts.
GPUs often provide half-float 16-bit registers for floating point calculations. Using these instead of full-precision 32-bit registers can often provide a significant performance benefit, particularly on embedded GPUs. The method used to expose these registers to applications in OpenGLES is that variables can be marked as mediump, meaning that the driver is allowed to use a lower precision for any operations involving these variables. The GLES spec allows for the lower precision to be optional so it is always valid to use a higher precision. Mesa currently implements the spec effectively by just ignoring the precision markers and always using full precision.
This talk will present ongoing work at Igalia to implement a lowering pass to convert mediump operations to 16-bit float operations. The work is targeting the Freedreno driver but the resulting lowering pass may be applicable to other drivers too.
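To get a feel for what dropping from 32-bit to 16-bit floats costs in precision, here is a minimal sketch that rounds values through both widths using Python's standard `struct` module. It assumes the 16-bit format is IEEE 754 half precision, which a given GPU's mediump implementation may or may not match; the helper names are made up for the example.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_single(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Compare the rounding error at each width for a few sample values.
for x in (0.1, 1.0 / 3.0, 1000.1):
    single_error = abs(to_single(x) - x)
    half_error = abs(to_half(x) - x)
    print(f"{x!r}: half={to_half(x)!r} "
          f"single_error={single_error:.3e} half_error={half_error:.3e}")
```

Half precision keeps only 10 explicit mantissa bits, so values near 1000 are already rounded to the nearest 0.5. That is often acceptable for colors and similar shader quantities, which is why the precision hint is worth honoring.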
(c) X.Org Developer's Conference (XDC) 2019
October 2-4 - Montréal, Canada
https://xdc2019.x.org/
Know your platform. 7 things every scala developer should know about jvm | Pawel Szulc
The document discusses the importance for Scala developers to understand the basics of the Java Virtual Machine (JVM) platform that Scala code runs on. It provides examples of Java bytecode produced from simple Scala code snippets to demonstrate how code is executed by the JVM. Key points made include that the JVM is a stack-based virtual machine that compiles source code to bytecode instructions, and that understanding the level below the code helps developers write more efficient, robust and performant code.
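The idea of a stack-based virtual machine can be sketched in a few lines. The toy interpreter below is not the JVM and does not use real JVM opcodes; the `push`, `add`, and `mul` instruction names are hypothetical and serve only to show how operands flow through an operand stack.

```python
def run(program):
    """Execute a list of (opcode, *operands) tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            # Place a constant on top of the stack.
            stack.append(args[0])
        elif op == "add":
            # Pop two operands and push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "mul":
            # Pop two operands and push their product.
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    # The result of the expression is whatever is left on top.
    return stack.pop()


# The expression 1 + 2 * 3, compiled by hand into stack instructions.
program = [("push", 1), ("push", 2), ("push", 3), ("mul",), ("add",)]
print(run(program))
```

Real bytecode works on the same principle: instructions name no registers, and each one implicitly consumes and produces values on the stack.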
Hands-on VeriFast with STM32 microcontroller @ Osaka | Kiwamu Okabe
The document discusses setting up a development environment for the ChibiOS/RT real-time operating system and VeriFast model checker on Windows and macOS systems. It provides instructions for installing necessary tools like GCC ARM, make, CMake, libUSB, ST-Link, ChibiOS/RT, and VeriFast from sources. It also explains downloading a custom ChibiOS/RT source code that is compatible with VeriFast verification.
This document discusses code and memory optimization techniques for software engineers developing AAA game titles. It begins with an introduction to the speaker and provides an overview of hardware architecture including CPU registers, caches, and memory access times. The bulk of the document focuses on optimizing for data caches through techniques like improving data layout, prefetching, and utilizing cache lines efficiently. It also discusses optimizing branches through removing branches, computing both paths, and splitting data to avoid branches. Resources for further reading are provided.
This document summarizes a presentation given at RubyConf 2015 about building self-balancing robots using Ruby and various hardware platforms.
The presentation covered:
1. Using the LEGO Mindstorms EV3 with mruby-ev3rt to build an inverted pendulum robot that balances itself.
2. A DIY approach using a Raspberry Pi, gyroscope sensor, DC motors, and motor driver to implement the same balancing behavior from scratch.
3. The balancing algorithm, which calculates motor power needed to balance based on angle, angular velocity, velocity, and position measurements from the gyroscope.
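One common way to combine those four measurements is a weighted sum (linear state feedback) clamped to the motor's range. The sketch below assumes that form; the summary does not say which control law the presentation used, and the gain values, the `State` type, and the power limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    angle: float             # tilt from upright
    angular_velocity: float  # rate of change of tilt
    position: float          # wheel position
    velocity: float          # wheel velocity


# Hypothetical gains; a real robot needs these tuned by experiment.
GAINS = State(angle=25.0, angular_velocity=1.5, position=0.5, velocity=1.0)


def motor_power(state, gains=GAINS, limit=100.0):
    """Return motor power as a clamped weighted sum of the four measurements."""
    raw = (gains.angle * state.angle
           + gains.angular_velocity * state.angular_velocity
           + gains.position * state.position
           + gains.velocity * state.velocity)
    # Clamp to the range the motor driver accepts.
    return max(-limit, min(limit, raw))


# A robot tilting slightly forward gets a forward push to catch itself.
print(motor_power(State(angle=1.0, angular_velocity=0.0,
                        position=0.0, velocity=0.0)))
```

Running this in a fixed-rate loop, reading the sensors and writing the result to the motors each iteration, gives the basic shape of an inverted pendulum controller.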
This document provides an overview of memory management concepts in Java and C++, including pointers, the heap, stack, activation records, classes and objects. It discusses how objects are allocated in memory in both languages, with objects in Java being referenced by pointers on the stack and allocated on the heap, while in C++ objects can be allocated on the stack or heap. The document also covers issues like memory leaks in C++ if objects are not deleted from the heap, and garbage collection handling object deletion in Java. Methods and calling conventions are compared between the two languages.
This document provides an overview of Windows user-mode debugging concepts like processes, threads, stack frames, and the WinDbg debugging tool. It discusses how to set up WinDbg and analyze crashes through examples like examining stack frames, debugging a simple crash, and commands commonly used in WinDbg. The document concludes with demonstrating how to analyze an IMA service crash using a memory dump.
Hands-on VeriFast with STM32 microcontroller @ Nagoya | Kiwamu Okabe
This document describes setting up a development environment for working with the ChibiOS/RT real-time operating system and STM32 microcontrollers using the VeriFast verification tool on Windows or macOS systems. It provides instructions for installing necessary software packages like Git, GCC, CMake and VeriFast as well as downloading customized ChibiOS/RT source code for building sample applications and verifying them using VeriFast.
The document discusses various features of the Vim text editor, including modes (normal, insert, visual), text objects, syntax highlighting, encoding, key mappings, tab pages, and folds. It provides examples of motions and operations in normal mode, editing text in insert mode, selecting regions in visual mode, and syntax definitions. It also covers setting the encoding, defining common key mappings, using tab pages, and folding code with different fold methods.
Winnyp is an anonymous P2P filesharing software based on Winny. It uses its own encryption key generation algorithm that is more complex than Winny's algorithm, making it more difficult to analyze. The report details Winnyp's internal workings, including how it initializes and patches itself, generates encryption keys through multiple algorithms, and specifies the version of connected nodes by using different encryption keys. It also describes how Winnyp sends packets with dummy data and receives packets to communicate with both Winny and other Winnyp nodes.
This document provides an overview of CSS and JavaScript concepts. It discusses CSS transitions, transforms, grid properties, and using media queries with CSS grid. For JavaScript, it covers data types, operators, strings, arrays, objects, functions, and loops. It also provides examples of transform properties, grid column/row definitions, spanning, min-max properties, and template areas in CSS grid.
Binary Obfuscation from the Top Down: Obfuscation Executables without Writing... | frank2
Binary obfuscation is a mysterious ritual employed by malware authors and software vendors alike that no one really seems to talk about. It's almost like a secret society. Interestingly, you don't have to write a program to obfuscate the binary-- you can also write high-level code that obfuscates at compile-time, rather than afterward.
Go Go Gadget! - An Intro to Return Oriented Programming (ROP) | Miguel Arroyo
ROP (return-oriented programming) is a technique that allows executing malicious code on systems with non-executable stacks by chaining short instruction sequences ("gadgets") already present in memory. The document provides an overview of ROP, including its origins as a generalization of ret2libc attacks. It describes how ROP chains gadgets by controlling the instruction pointer to execute desired sequences ending in return instructions. Finally, it walks through a simple ROP exploit on x86 as a demonstration.
Node has revolutionized modern runtimes. Their async by default strategy boasts 3x the throughput of Java. And yet, the language runs 5x slower than C++ (when JS is interpreted).
This talk is an advanced intro into the world of Node where we take a closer look under the hood. What's the event loop? Why are there multiple compilers for JS in Node/V8? How many threads are actually used in Node and for what purpose? We'll answer these questions and more as we go over libuv, v8, the node core library, npm, and more.
If you're developing with Node, want to start, or are just curious about how it works, please check it out!
RubyConf Portugal 2014 - Why ruby must go!Gautam Rege
The document discusses the Go programming language and how it differs from Ruby. It provides examples of Go code demonstrating type declarations, embedded types, exported variables and functions, and variable redeclaration. It also discusses some concepts in Go like interfaces, channels, and concurrency that are different from Ruby. The document suggests that Go teaches programmers awareness about variables, types, and errors that can improve Ruby code.
Vim Script allows for programming Vim's interface through scripting. It discusses variables, functions, conditional statements, loops, built-in functions, autocommands, commands, and the runtime directory structure for plugins. The document provides an overview of Vim Script programming with examples.
Similar to Cranking Floating Point Performance Up To 11 (20)
Cranking Floating Point Performance Up To 11
1. Cranking Floating Point Performance Up To 11
Noel Llopis
Snappy Touch
http://twitter.com/snappytouch
noel@snappytouch.com
http://gamesfromwithin.com
13. Floating point numbers
• Representation of rational numbers
• 1.2345, -0.8374, 2.0000, 14388439.34, etc
• Following IEEE 754 format
• Single precision: 32 bits
• Double precision: 64 bits
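To make the 32-bit single-precision layout concrete, here is a minimal sketch that splits a float into its IEEE 754 fields (1 sign bit, 8 exponent bits, 23 mantissa bits). The names FloatFields and float_fields are made up for this illustration and are not from the talk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a single-precision float. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits; normal numbers have an implicit leading 1 */
} FloatFields;

static FloatFields float_fields(float value) {
    uint32_t bits;
    FloatFields f;

    /* This sketch assumes float is the 32-bit IEEE 754 single format. */
    assert(sizeof(float) == sizeof(uint32_t));

    /* memcpy copies the raw bits without breaking aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, 1.0f has sign 0, a stored exponent of 127 (an unbiased exponent of 0) and a mantissa of 0.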
19. Why floating point performance?
• Most games use floating point numbers for most of their calculations
• Positions, velocities, physics, etc.
• Maybe not so much for regular apps
41. Fixed-point arithmetic
• Sometimes integer arithmetic doesn’t cut it
• You need to represent rational numbers
• Can use a fixed-point library.
• Performs rational arithmetic with integer values at a reduced range/resolution.
• Not so great...
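As a rough illustration of what a fixed-point library does, here is a minimal sketch of a 16.16 format: 16 integer bits and 16 fractional bits stored in a 32-bit integer. The type fixed16 and every function name below are assumptions made up for this example, not the API of any particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: value = raw / 65536. */
typedef int32_t fixed16;
#define FIXED_ONE 65536

static fixed16 fixed_from_int(int v) {
    /* Reduced range: only 16 bits are left for the integer part. */
    assert(v >= -32768 && v <= 32767);
    return (fixed16)(v * FIXED_ONE);
}

static fixed16 fixed_from_float(float f) {
    /* Reduced resolution: anything smaller than 1/65536 truncates to 0. */
    return (fixed16)(f * 65536.0f);
}

static float fixed_to_float(fixed16 a) {
    return (float)a / 65536.0f;
}

static fixed16 fixed_mul(fixed16 a, fixed16 b) {
    /* Widen to 64 bits so the intermediate product cannot overflow,
       then divide by the scale factor to return to 16.16. */
    return (fixed16)(((int64_t)a * (int64_t)b) / FIXED_ONE);
}
```

Every multiply needs a widening step and a rescale, and small values such as 0.00001 collapse to zero. That is the reduced range and resolution trade-off the slide mentions.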
58. Thumb Mode
• CPU has a special Thumb mode.
• Less memory, maybe better performance.
• No floating point support.
• Every time there’s an fp operation, it switches out of Thumb, does the fp operation, and switches back.
66. Thumb Mode
• Turning off Thumb mode increased performance in Flower Garden by over 2x
• Heavy usage of floating point operations though
• Most games will probably benefit from turning it off (especially 3D games)
81. ARM assembly
• Reading assembly is a very important skill for high-performance programming
• Writing is more specialized. Most people don’t need to.
107. vfpmath library
• Already done a lot of work for you
• http://code.google.com/p/vfpmathlibrary
• Vector/matrix math
• Might not be exactly what you need, but it’s a great starting point
109. Assembly in gcc
• Only use it when targeting the device
#include <TargetConditionals.h>
#if (TARGET_IPHONE_SIMULATOR == 0) && (TARGET_OS_IPHONE == 1)
#define USE_VFP
#endif
119. Assembly in gcc
int src = 19;
int dest = 0;
asm volatile (
    "add r10, %1, #42\n\t"
    "add %0, r10, #33\n\t"
    : "=r" (dest)
    : "r" (src)
    : "r10"
);
• volatile prevents “optimizations”
• The clobber register list names the registers the asm block overwrites (here r10)
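The asm block above can be wrapped in a function with a plain C fallback, so the same file builds for the device and for the simulator. This is a sketch under assumptions: the function names are hypothetical, and the guard uses the compiler's __arm__ and __thumb__ macros instead of the USE_VFP define from the earlier slide so that it has no dependency on Apple headers.

```c
#include <assert.h>

/* Hypothetical wrapper around the slide's asm block: returns src + 42 + 33. */
static int add_constants(int src) {
    int dest = 0;
#if defined(__arm__) && !defined(__thumb__)
    /* 32-bit ARM mode only. __asm__ is the spelling that also works
       under strict -std=c99. */
    __asm__ volatile (
        "add r10, %1, #42\n\t"   /* r10  = src + 42 */
        "add %0, r10, #33\n\t"   /* dest = r10 + 33 */
        : "=r" (dest)            /* output operand %0 */
        : "r" (src)              /* input operand %1 */
        : "r10"                  /* clobber list: r10 is overwritten as scratch */
    );
#else
    /* Simulator or any non-ARM build: same arithmetic in plain C. */
    dest = src + 42 + 33;
#endif
    return dest;
}

/* Sanity check using the slide's input of 19: 19 + 42 + 33 = 94. */
static void add_constants_self_check(void) {
    assert(add_constants(19) == 94);
}
```

Keeping a C fallback next to every asm path also gives you a reference implementation to compare results against.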