The document discusses Automatic Feedback-Directed Optimization (AutoFDO), which feeds a profile sampled from real runs of a program back into the compiler so that it can optimize the program at compile time. It uses a bubble sort program written in C as the running example: the program is profiled with the perf tool (through the ocperf.py wrapper), the AutoFDO tools convert the profile data, and the program is recompiled with GCC's -fauto-profile flag to improve performance. AutoFDO has been deployed in projects such as CPython, Firefox, Chrome, Clear Linux, and Google's datacenters; at Google, improvements of 5-10+% are consistently observed. Additional resources on AutoFDO are provided.
2. Only C, no C++ in this presentation: rationale
● Linux creator Linus Torvalds’ comments on C++ in his keynote at last week's Embedded Linux Conference 2017:
● https://youtu.be/NLQZzEvavGs?list=PLbzoR-pLrL6pISWAq-1cXP4_UZAyRtesk&t=1343
3. Bubble sort
“In computer graphics bubble sort is popular ... in almost-sorted arrays … with just linear complexity (2n)” – Wikipedia
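#include <stdint.h>

typedef uint64_t u64;   /* the slide's u64: unsigned 64-bit integer */

void bubble_sort(u64 *a, int n) {
    u64 i, temp, swap_flag = 1;

    if (n < 2)          /* nothing to sort; also avoids (u64)n wrapping for n < 0 */
        return;
    while (swap_flag) {             /* repeat passes until one makes no swap */
        swap_flag = 0;
        for (i = 1; i < (u64)n; i++) {
            if (a[i] < a[i - 1]) {  /* neighbours out of order? */
                /* swap */
                temp = a[i];
                a[i] = a[i - 1];
                a[i - 1] = temp;
                swap_flag = 1;
            }
        }
    }
}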
Condition predictable?
Function inline?
Loop unroll?
Minimize branches
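The "linear complexity (2n)" in the quote can be checked by counting comparisons. The sketch below is a hypothetical instrumented copy of the slide's function (the name `bubble_sort_count` is ours, not the talk's). On already-sorted input it makes one pass of n - 1 comparisons; with a single pair of neighbours out of order, the first pass fixes it and a second, swap-free pass is needed to confirm the order, giving 2(n - 1). A reversed array, by contrast, needs n passes and n(n - 1) comparisons. On almost-sorted input the `if` is rarely true, which is also what makes its condition easy to predict.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical instrumented copy of bubble_sort(): same algorithm,
 * but it returns the number of element comparisons performed. */
u64 bubble_sort_count(u64 *a, int n)
{
    u64 i, temp, swap_flag = 1, comparisons = 0;

    if (n < 2)
        return 0;
    while (swap_flag) {             /* one iteration = one full pass */
        swap_flag = 0;
        for (i = 1; i < (u64)n; i++) {
            comparisons++;          /* count every a[i] < a[i - 1] test */
            if (a[i] < a[i - 1]) {
                temp = a[i];
                a[i] = a[i - 1];
                a[i - 1] = temp;
                swap_flag = 1;
            }
        }
    }
    return comparisons;
}
```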
4. Common practice
● gcc -g -O3 -o sort sort.c
● ./sort 30000
What did the compiler do?
– gcc -O3 -S -fverbose-asm -o sort.S sort.c
– objdump -d -S sort
– perf record ./sort 3000; perf annotate
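As one answer to the "Minimize branches" question, the data-dependent swap can be written as two selects. This is a minimal sketch, assuming GCC; `bubble_sort_select` is a hypothetical name, and whether GCC actually emits conditional moves (cmov on x86) instead of a compare-and-branch is the compiler's decision, which the `objdump -d -S` command above can confirm.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical variant of bubble_sort(): the if/swap is replaced by
 * selects that GCC may (but need not) lower to conditional moves. */
void bubble_sort_select(u64 *a, int n)
{
    u64 i, swap_flag = 1;

    if (n < 2)
        return;
    while (swap_flag) {
        swap_flag = 0;
        for (i = 1; i < (u64)n; i++) {
            u64 lo = a[i - 1], hi = a[i];
            u64 out_of_order = hi < lo;         /* 0 or 1 */
            a[i - 1] = out_of_order ? hi : lo;  /* smaller of the pair */
            a[i]     = out_of_order ? lo : hi;  /* larger of the pair */
            swap_flag |= out_of_order;          /* remember any swap */
        }
    }
}
```

The trade-off: every comparison now performs two stores, and on almost-sorted input the original branch is highly predictable, so the branchy version may well be faster. Measure both before choosing.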
8. Less common practice: software-based instrumentation
● gcc -g -O3 -fprofile-generate -o sort sort.c
● ./sort 3000
● gcc -g -O3 -fprofile-use -o sort sort.c
● ./sort 30000
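Part of what `-fprofile-use` reads back from the training run (`./sort 3000`) is how often each branch is taken. For comparison, that kind of information can also be supplied by hand with GCC's `__builtin_expect`. A minimal sketch, assuming the input is almost sorted so the swap is rarely taken; the `unlikely` macro and `bubble_sort_hinted` are illustrative names, not from the talk. GCC's documentation for `__builtin_expect` itself advises preferring actual profile feedback, since programmers are notoriously bad at predicting how their programs behave.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Illustrative hint macro: the condition is expected to be false.
 * This assumption only holds for almost-sorted input. */
#define unlikely(x) __builtin_expect(!!(x), 0)

void bubble_sort_hinted(u64 *a, int n)
{
    u64 i, temp, swap_flag = 1;

    if (n < 2)
        return;
    while (swap_flag) {
        swap_flag = 0;
        for (i = 1; i < (u64)n; i++) {
            if (unlikely(a[i] < a[i - 1])) {  /* hinted as rarely taken */
                temp = a[i];
                a[i] = a[i - 1];
                a[i - 1] = temp;
                swap_flag = 1;
            }
        }
    }
}
```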
9. What did the compiler do differently?
│ b6: lea 0x8(%rcx),%rdx
│ for (i = 1; i < n; i++) { // 6
│ cmp %rdx,%r8
│ ↓ je 2e0
│ test %rax,%rax
│ ↓ je 24b
│ cmp $0x1,%rax
│ ↓ je 1a6
│ cmp $0x2,%rax
│ ↓ je 18a
│ cmp $0x3,%rax
│ ↓ je 16e
│ cmp $0x4,%rax
│ ↓ je 152
│ cmp $0x5,%rax
│ ↓ je 136
│ cmp $0x6,%rax
│ ↓ je 11a
│ if (a[i] < a[i - 1]) { // 7
│ mov 0x8(%rcx),%r9
│ mov -0x8(%rdx),%r10
│ cmp %r10,%r9
│ ↓ jae 116
│ a[i] = a[i - 1]; // 9
│ mov %r10,0x8(%rcx)
│ swap_flag = 1; // 11
│ mov $0x1,%edi
│ a[i - 1] = temp; // 10
│ mov %r9,-0x8(%rdx)
│116: add $0x8,%rdx
│ if (a[i] < a[i - 1]) { // 7
│11a: mov (%rdx),%r11
│ mov -0x8(%rdx),%r12
│ cmp %r12,%r11
│ ↓ jae 132
│ a[i] = a[i - 1]; // 9
│ mov %r12,(%rdx)
│ a[i - 1] = temp; // 10
│ mov %r11,-0x8(%rdx)
– It unrolled the inner loop
10. Loop unwinding
● The goal of loop unwinding is to increase a program's speed by:
– reducing or eliminating instructions that control the loop, such as pointer arithmetic and "end of loop" tests on each iteration;
– reducing branch penalties; as well as
– hiding latencies including the delay in reading data from memory.
● To eliminate this computational overhead, loops can be rewritten as a repeated sequence of similar independent statements.
– Wikipedia
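The rewrite described in the quote is easiest to see on a loop simpler than bubble sort. A minimal sketch with illustrative function names: `sum_unrolled4` does the same work as `sum_rolled`, but pays the increment and end-of-loop test once per four elements, and its four independent accumulators avoid one long chain of dependent additions. A remainder loop handles lengths that are not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one add, one increment and one end-of-loop test per element. */
uint64_t sum_rolled(const uint64_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop unrolled by 4 with independent accumulators. */
uint64_t sum_unrolled4(const uint64_t *a, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {    /* loop control once per 4 elements */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)              /* remainder: 0 to 3 elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Unsigned addition wraps modulo 2^64 and is therefore associative, so regrouping the sum into four accumulators cannot change the result. At -O3 GCC may unroll or vectorize `sum_rolled` on its own; the hand-written form mainly shows what such a transformation looks like.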
11. Software-based instrumentation: deployments
● Ahem, I don’t really know (need a survey?)
● I do know git supports building itself this way:
– Full profile
– Fast profile
● Hands up if any of the projects you have worked on use it!
16. least^WGoogle common practice 2: AutoFDO via runtime process attachment
gcc -g -O3 -o sort sort.c
./sort 300000 &
~/git/pmu-tools/ocperf.py record -b -e br_inst_retired.near_taken:pp -p <PID>
kill %1
~/git/autofdo-andikleen/create_gcov -debug_dump -logtostderr --binary=./sort --profile=perf.data --gcov=./sort.gcov -gcov_version=1
~/git/autofdo-andikleen/dump_gcov -gcov_version=1 ./sort.gcov
gcc -g -O3 -fauto-profile=sort.gcov -o sort sort.c
17. Deployed
● CPython (rumour: 5% off the interpreter loop)
● Firefox
● Google datacenters (“over 50% of cycles spent are optimized with FDO”)
● Chrome, ChromeOS
● Clear Linux
● GitHub: kevinquinnyo/php7-wp-build-docker: Builds latest stable php releases in docker container, optimizes the build for wordpress with GCC AutoFDO and builds …
18. Extra tidbits
● Coverage files (.gcov, etc.) are CPU-architecture-independent: generate once, use on x86, Arm, Power
● AutoFDO supports LLVM (with different coverage files)
● 5-10+% improvement consistently observed at Google; most of the gain comes within 3-5-7 iterations, with little sample data
● 6-month-old (“stale”) coverage files still deliver at least ½ of the original performance benefit
19. Additional resources
● Tutorial:
– https://gcc.gnu.org/wiki/AutoFDO/Tutorial
● Where to get create_gcov:
– https://github.com/google/autofdo
● Where to get ocperf.py:
– git://github.com/andikleen/pmu-tools.git
● Dehao Chen’s presentation at the GCC Cauldron conference:
– https://www.youtube.com/watch?v=26SrOC6MXWg
● A co-worker’s presentation at the Embedded Linux Conference:
– https://www.youtube.com/watch?v=S2Q1OJuZoX4
● Experience from a large CERN project (5-13% improvement):
– https://indico.cern.ch/event/587970/contributions/2369824/attachments/1374948/2087355/slides.pdf
● Me:
– kim.phillips@arm.com
20. Excerpt from git’s INSTALL file:
If you're willing to trade off (much) longer build time for a later faster git you can also do a profile feedback build...

This will run the complete test suite as training workload and then rebuild git with the generated profile feedback. This results in a git which is a few percent faster on CPU intensive workloads. This may be a good tradeoff for distribution packagers.

Alternatively you can run profile feedback only with the git benchmark suite. This runs significantly faster than the full test suite, but has less coverage...

As a caveat: a profile-optimized build takes a *lot* longer since the git tree must be built twice, and in order for the profiling measurements to work properly, ccache must be disabled and the test suite has to be run using only a single CPU. In addition, the profile feedback build stage currently generates a lot of additional compiler warnings.