Scaling CPU Experiences: An Introduction to the Entity Component SystemUnity Technologies
Project: https://github.com/unitytechnologies/spaceshooterecs
If you've seen an Entity Component System (ECS) demo and thought, "how did they do that?" then this talk is for you. Unity's ECS (combined with the C# Job System and the Burst Compiler) provides an easy and safe way to unlock the power of modern CPUs. Come see first-hand how to implement the ECS and the C# Job System in a project. You'll leave with a better understanding of the concepts, terminology, application, and massive benefits of these new additions to Unity.
Speakers:
Mike Geig (Global Head of Evangelism Content, Unity)
Julien Fischer (Developer Marketing Manager, Intel)
https://unite.unity.com/2018/berlin
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
ECS (Part 1/3) - Introduction to Data-Oriented DesignPhuong Hoang Vu
ECS (Part 1/3) - Introduction to Data-Oriented Design.
Some small experiments to verify CPU Cache problems, SOA vs AOS and some other problems from OOP approach.
Then basic things about DOD.
Scaling CPU Experiences: An Introduction to the Entity Component SystemUnity Technologies
Project: https://github.com/unitytechnologies/spaceshooterecs
If you've seen an Entity Component System (ECS) demo and thought, "how did they do that?" then this talk is for you. Unity's ECS (combined with the C# Job System and the Burst Compiler) provides an easy and safe way to unlock the power of modern CPUs. Come see first-hand how to implement the ECS and the C# Job System in a project. You'll leave with a better understanding of the concepts, terminology, application, and massive benefits of these new additions to Unity.
Speakers:
Mike Geig (Global Head of Evangelism Content, Unity)
Julien Fischer (Developer Marketing Manager, Intel)
https://unite.unity.com/2018/berlin
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
ECS (Part 1/3) - Introduction to Data-Oriented DesignPhuong Hoang Vu
ECS (Part 1/3) - Introduction to Data-Oriented Design.
Some small experiments to verify CPU Cache problems, SOA vs AOS and some other problems from OOP approach.
Then basic things about DOD.
NHN NEXT 게임 서버 프로그래밍 강의 자료입니다. 최소한의 필요한 이론 내용은 질문 위주로 구성되어 있고 (답은 학생들 개별로 고민해와서 피드백 받는 방식) 해당 내용에 맞는 실습(구현) 과제가 포함되어 있습니다.
참고로, 서버 아키텍처에 관한 과목은 따로 있어서 본 강의에는 포함되어 있지 않습니다.
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
[IGC 2017] 펄어비스 민경인 - Mmorpg를 위한 voxel 기반 네비게이션 라이브러리 개발기강 민우
펄어비스의 MMORPG, 검은사막에 적용되어있는 AI 네비게이션 기능은 VOXEL 기반으로 자체 개발한 엔진을 이용해 구현되어 있습니다. 기존의 대다수 상용 라이브러리들이 네비 메쉬라고 하는 이동가능한 평면을 표현하는 폴리곤 기반의 데이터를 이용해 길찾기를 수행해주는 것에 비해 근간이 다릅니다. 이 강연에서는 검은사막의 네비게이션 엔진을 구현하고, 서버 / 클라이언트에 적용하면서 얻게된 노하우와 적용된 결과물들을 소개합니다.
Building a World in the Clouds: MMO Architecture on AWS (MBL304) | AWS re:Inv...Amazon Web Services
Can you really build the infrastructure required to bring a massively multiplayer online game (MMO) to life in the cloud? This session discusses the evolution of Red 5 Studios' FireFall—a free-to-play MMO. FireFall runs entirely on the AWS platform and allows players from around the world to play together in the cloud. The session covers some of the design decisions made over the last two years—the things that worked well and not so well. The session also presents some of the solutions Red 5 implemented to ease the transition from dedicated data center hardware to virtual servers in AWS.
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights to implementation details, which leverage GPU''s compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between scenegraph and rendering backend, as well as efficient data structures inside the renderer.
Video here: http://on-demand.gputechconf.com/gtc/2013/video/S3032-Advanced-Scenegraph-Rendering-Pipeline.mp4
Game engines have long been in the forefront of taking advantage of the ever
increasing parallel compute power of both CPUs and GPUs. This talk is about how the
parallel compute is utilized in practice on multiple platforms today in the Frostbite game
engine and how we think the parallel programming models, hardware and software in
the industry should look like in the next 5 years to help us make the best games possible.
NHN NEXT 게임 서버 프로그래밍 강의 자료입니다. 최소한의 필요한 이론 내용은 질문 위주로 구성되어 있고 (답은 학생들 개별로 고민해와서 피드백 받는 방식) 해당 내용에 맞는 실습(구현) 과제가 포함되어 있습니다.
참고로, 서버 아키텍처에 관한 과목은 따로 있어서 본 강의에는 포함되어 있지 않습니다.
#GDC15 Great Management of Technical LeadsMike Acton
Congratulations! You're a lead. Now what? In general, whatever skills you've demonstrated that got you to this point aren't the same things you'll be doing from here on out (or at least not as much.) This talk is an entry-level description of expectations for any technical gamedev lead. See also: http://gamasutra.com/blogs/MikeActon/20141112/229942/Lead_Quick_Start_Guide.php
NHN NEXT 게임 서버 프로그래밍 강의 자료입니다. 최소한의 필요한 이론 내용은 질문 위주로 구성되어 있고 (답은 학생들 개별로 고민해와서 피드백 받는 방식) 해당 내용에 맞는 실습(구현) 과제가 포함되어 있습니다.
참고로, 서버 아키텍처에 관한 과목은 따로 있어서 본 강의에는 포함되어 있지 않습니다.
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
[IGC 2017] 펄어비스 민경인 - Mmorpg를 위한 voxel 기반 네비게이션 라이브러리 개발기강 민우
펄어비스의 MMORPG, 검은사막에 적용되어있는 AI 네비게이션 기능은 VOXEL 기반으로 자체 개발한 엔진을 이용해 구현되어 있습니다. 기존의 대다수 상용 라이브러리들이 네비 메쉬라고 하는 이동가능한 평면을 표현하는 폴리곤 기반의 데이터를 이용해 길찾기를 수행해주는 것에 비해 근간이 다릅니다. 이 강연에서는 검은사막의 네비게이션 엔진을 구현하고, 서버 / 클라이언트에 적용하면서 얻게된 노하우와 적용된 결과물들을 소개합니다.
Building a World in the Clouds: MMO Architecture on AWS (MBL304) | AWS re:Inv...Amazon Web Services
Can you really build the infrastructure required to bring a massively multiplayer online game (MMO) to life in the cloud? This session discusses the evolution of Red 5 Studios' FireFall—a free-to-play MMO. FireFall runs entirely on the AWS platform and allows players from around the world to play together in the cloud. The session covers some of the design decisions made over the last two years—the things that worked well and not so well. The session also presents some of the solutions Red 5 implemented to ease the transition from dedicated data center hardware to virtual servers in AWS.
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights to implementation details, which leverage GPU''s compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between scenegraph and rendering backend, as well as efficient data structures inside the renderer.
Video here: http://on-demand.gputechconf.com/gtc/2013/video/S3032-Advanced-Scenegraph-Rendering-Pipeline.mp4
Game engines have long been in the forefront of taking advantage of the ever
increasing parallel compute power of both CPUs and GPUs. This talk is about how the
parallel compute is utilized in practice on multiple platforms today in the Frostbite game
engine and how we think the parallel programming models, hardware and software in
the industry should look like in the next 5 years to help us make the best games possible.
NHN NEXT 게임 서버 프로그래밍 강의 자료입니다. 최소한의 필요한 이론 내용은 질문 위주로 구성되어 있고 (답은 학생들 개별로 고민해와서 피드백 받는 방식) 해당 내용에 맞는 실습(구현) 과제가 포함되어 있습니다.
참고로, 서버 아키텍처에 관한 과목은 따로 있어서 본 강의에는 포함되어 있지 않습니다.
#GDC15 Great Management of Technical LeadsMike Acton
Congratulations! You're a lead. Now what? In general, whatever skills you've demonstrated that got you to this point aren't the same things you'll be doing from here on out (or at least not as much.) This talk is an entry-level description of expectations for any technical gamedev lead. See also: http://gamasutra.com/blogs/MikeActon/20141112/229942/Lead_Quick_Start_Guide.php
http://dev.paxsite.com/schedule/panel/rebooting-the-insomniac-tools-new-tech-for-a-new-ip-and-a-new-generation
The explosion of web-based enterprise applications, fully-web enabled mobile devices (like the iPhone and iPad) and browser-based gaming sent a clear signal that the era of native applications living in a sandbox was over. Not just for games, but also for the tools we use to develop games. We felt this was an opportunity to make a strategic change that would position us much better and net us valuable experience for the inevitable changes to operating systems, development tools and user expectations. This is the story of our move to AAA game development tools as a webapp. We will share our most significant choices and the costs of those choices. We will suggest alternatives where we feel that better results can still be achieved. And we will share details of the technical architecture of the new tools suite.
Scene Graphs & Component Based Game EnginesBryan Duggan
A presentation I made at the Fermented Poly meetup in Dublin about Scene Graphs & Component Based Game Engines. Lots of examples from my own game engine BGE - where almost everything is a component. Get the code and the course notes here: https://github.com/skooter500/BGE
Jonathan Garrett (one of our Lead Engine Programmers) recently introduced our new realtime cinematics system to the team that a bunch of Core engine programmers have been working hard on for a while now. If you've been following our previous notes on our tools, you'll know that our tools are webapps (except for the 3D scene view which is natively running our game/engine). We definitely took a lot of the lessons we've learned in creating previous HTML5/Javascript tools which interact with the game and our servers and applied them to this new tool. Cinematics in particular are much more complicated than anyone ever thinks to develop, and as such this is easily one of our most sophisticated tools. While the presentation slides here are short on implementation details, I hope you'll still enjoy this inside look into one of out web-based tools built for large-scale productions.
From HTML5 Developers Conference (22 May 2014)
Making (console) games in the browser
Insomniac Games has been making AAA console games with an in-house suite of browser-based tools for the last few years. This talk will be a whirlwind review of the architectural choices, lessons learned and and other tidbits picked up along the way in supporting multiple large production teams and titles. Talk will include notes on server design, mixing native code and javascript, asset databases and real boots-on-the-ground production trade-offs.
Deploying a Low-Latency Multiplayer Game Globally: Loadout Amazon Web Services
This is a deep-dive straight into the guts of running a low-latency multiplayer game, such as a first-person shooter, on a global scale. We dive into architectures that enable you to split apart your back-end APIs from your game servers, and Auto Scale them independently. See how to run game servers in multiple AWS regions such as China and Frankfurt, and integrate them with your central game stack. We’ll even demo this in action, using AWS CloudFormation and Chef to deploy Unreal Engine game servers.
During the continuous mORMot refactoring, some core part of the framework was rewritten. In this session, we propose a journey to a refactoring of a single loop. It will take us from a naïve but working approach, to a 10 times faster Pascal rewrite, and then introduce how SSE2 and AVX2 assembly could boost the process even further – to reach more than 30 times improvement! No previous knowledge of assembly is needed: we will try to introduce how modern CPUs work, and will have some fun with algorithms and SIMD parallelism.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Presentació a càrrec de Cristian Gomollon, tècnic d'Aplicacions al CSUC, duta a terme a la "5a Jornada de formació sobre l'ús del servei de càlcul" celebrada el 16 de març de 2021 en format virtual.
Quantifying the Performance of Garbage Collection vs. Explicit Memory ManagementEmery Berger
This talk answers an age-old question: is garbage collection faster/slower/the same speed as malloc/free? We introduce oracular memory management, an approach that lets us measure unaltered Java programs as if they used malloc and free. The result: a good GC can match the performance of a good allocator, but it takes 5X more space. If physical memory is tight, however, conventional garbage collectors suffer an order-of-magnitude performance penalty.
How to make fewer errors at the stage of code writing. Part N3.PVS-Studio
This is the third article where I will tell you about a couple of new programming methods that can help you make your code simpler and safer. You may read the previous two posts here [1] and here [2]. This time we will take samples from the Qt project.
Доклад рассказывает об устройстве и опыте применения инструментов динамического тестирования C/C++ программ — AddressSanitizer, ThreadSanitizer и MemorySanitizer. Инструменты находят такие ошибки, как использование памяти после освобождения, обращения за границы массивов и объектов, гонки в многопоточных программах и использования неинициализированной памяти.
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This is talk is about *the* Moore's Law, the bull that the other "Laws" ride; and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
Presentació a càrrec de Cristian
Gomollon, tècnic d'Aplicacions al CSUC, duta a terme a la "4a Jornada de formació sobre l'ús del servei de càlcul" celebrada el 17 de març de 2021 en format virtual.
These days fast code needs to operate in harmony with its environment. At the deepest level this means working well with hardware: RAM, disks and SSDs. A unifying theme is treating memory access patterns in a uniform and predictable way that is sympathetic to the underlying hardware. For example writing to and reading from RAM and Hard Disks can be significantly sped up by operating sequentially on the device, rather than randomly accessing the data. In this talk we’ll cover why access patterns are important, what kind of speed gain you can get and how you can write simple high level code which works well with these kind of patterns.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
9. To make best use of a tool (e.g. compiler)
• Solve the part of the problem the tool can’t help with.
• Prepare the area that the tool can help with.
• Solve details missed by using the tool as intended.
12. Approach #1
• Estimate resources available
• Triage based on cost and value estimates
• Collect data
• Adapt as cost and/or value predictions change
• AKA Engineering
13. Approach #2
• Don’t worry about resources
• Desperately scramble to fix when running out.
• Generously described as irrational
• Desperate scramble != “optimization”
34. Bottleneck: most insufficient for the demand
• Q: Is everything always all about memory access / cache?
• A: No. It’s about whatever the most scarce resources are.
• E.g. Disk access seeks; GPU draw counts; etc.
53. 12 bytes x count(32) = 384 = 64 x 6
4 bytes x count(32) = 128 = 64 x 2
54. 12 bytes x count(32) = 384 = 64 x 6
4 bytes x count(32) = 128 = 64 x 2
(6/32) = ~5.33 loop/cache line
55. 12 bytes x count(32) = 384 = 64 x 6
4 bytes x count(32) = 128 = 64 x 2
Sqrt + math = ~40 x 5.33 = 213.33 cycles/cache line
(6/32) = ~5.33 loop/cache line
56. 12 bytes x count(32) = 384 = 64 x 6
4 bytes x count(32) = 128 = 64 x 2
Sqrt + math = ~40 x 5.33 = 213.33 cycles/cache line
(6/32) = ~5.33 loop/cache line
+ streaming prefetch bonus
57. 12 bytes x count(32) = 384 = 64 x 6
4 bytes x count(32) = 128 = 64 x 2
Sqrt + math = ~40 x 5.33 = 213.33 cycles/cache line
(6/32) = ~5.33 loop/cache line
+ streaming prefetch bonus
Using cache line to capacity* =
Est. 10x speedup
* Used. Still not necessarily as
efficiently as possible
60. “The high-level design of the code generator”
• http://llvm.org/docs/CodeGenerator.html
• Instruction Selection
• Map data/code to instructions that exist
• Scheduling and Formation
• Give opportunity to schedule / estimate against data access latency
• SSA-based Machine Code Optimizations
• Manual SSA form
• Register Allocation
• Copy to/from locals in register size/types
• Prolog/Epilog Code Insertion
• Late Machine Code Optimizations
• Pay careful attention to inlining
• Code Emission
• Verify asm output regularly
61. “Compilers are good at applying
mediocre optimizations
hundreds of times.”
- @deplinenoise
62. Three easy tips to help compiler
• #1 Analyze one value
• #2 Hoist all loop-invariant reads and branches
• #3 Remove redundant transform redundancy
63. #1 Analyze one value
• Find any one value (or implicit value) you’re interested in, anywhere.
• Printf it out over time. (Helps if you tag it so you can find it.)
• Grep your tty and paste into excel. (Sometimes a quick and dirty
script is useful to convert to a more complex piece of data.)
• See what you see. You’re bound to be surprised by something.
66. Approximately 0% of the sound
source components have a
source. (i.e. not playing yet)
67. Approximately 0% of the sound
source components have a
source. (i.e. not playing yet)
Switch to evaluating from
source->component not
component->source
20K vs. 200 cache lines
68. #2 Hoist all loop-invariant reads and branches
• Don’t re-read member values or re-call functions when you already
have the data.
• Hoist all loop-invariant reads and branches. Even super-obvious ones
that should already be in registers (member fields especially.)
83. Ghost reads and writes
Don’t re-read member values or re-call functions when
you already have the data.
Hoist all loop-invariant reads and branches. Even super-
obvious ones that should already be in registers.
85. :)
A bit of unnecessary branching, but more-or-less equivalent.
86. #3 Remove redundant transform redundancy
• Often, especially with small functions, there is not enough context to
know when and under what conditions a transform will happen.
• It’s easy to fall into the over-generalization trap
• It’s only real cases, with real data, we’re concerned with.
92. wchar_t* -> char* -> wchar_t*
it takes the char* we created and converts it
right back to a whar*
93.
94.
95. What’s happening?
• We convert from char* to wchar* to store it in m_DeleteFiles
• …So we can convert it from wchar* to char* to give to FileDelete
• …So we can convert it from char* to whcar* to give it to Windows
DeleteFile.