(Check my blog @ http://www.marioalmeida.eu/ )
In this presentation I present the performance metrics and results of running the parsec benchmark with the raytrace application on Upc's boada server
Are you a Java developer wondering what it means to have your application running in the cloud. This session will provide a peek into how the JVM is adapting to running in the cloud and what Java developers need to be aware to ensure they get the most of running in the cloud.
The session will pick an example spring application and tune it stage by stage at the end of which we have an application that is fully optimized and takes advantage of every aspect of the running in a cloud
Following the previous talk on kernel locks it is now the time to talk about lock-free data structures. We will cover the implementation and usage of the RCU mechanism in the kernel.
Speaker:
Mark Veltzer - CTO of Hinbit and a senior instructor at John Bryce. Mark is also a member of the Free Source Foundation and contributes to many free projects.
https://github.com/veltzer
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...Nicolas Navet
M. Grenier, L. Havet, N. Navet, "Pushing the limits of CAN - Scheduling frames with offsets provides a major performance boost", Proc. of the 4th European Congress Embedded Real Time Software (ERTS 2008), Toulouse, France, January 29 - February 1, 2008.
Are you a Java developer wondering what it means to have your application running in the cloud. This session will provide a peek into how the JVM is adapting to running in the cloud and what Java developers need to be aware to ensure they get the most of running in the cloud.
The session will pick an example spring application and tune it stage by stage at the end of which we have an application that is fully optimized and takes advantage of every aspect of the running in a cloud
Following the previous talk on kernel locks it is now the time to talk about lock-free data structures. We will cover the implementation and usage of the RCU mechanism in the kernel.
Speaker:
Mark Veltzer - CTO of Hinbit and a senior instructor at John Bryce. Mark is also a member of the Free Source Foundation and contributes to many free projects.
https://github.com/veltzer
Pushing the limits of CAN - Scheduling frames with offsets provides a major p...Nicolas Navet
M. Grenier, L. Havet, N. Navet, "Pushing the limits of CAN - Scheduling frames with offsets provides a major performance boost", Proc. of the 4th European Congress Embedded Real Time Software (ERTS 2008), Toulouse, France, January 29 - February 1, 2008.
Kernel Recipes 2016 - entry_*.S: A carefree stroll through kernel entry codeAnne Nicolas
I have always wondered what happens when we enter the kernel from userspace: what preparations does the hardware meet when the userspace to kernel space switch instructions are executed and back, and what does the kernel do when it executes a system call. There are also a bunch of things it does before it executes the actual syscall so I try to look at those too.
This talk is an attempt to demystify some of the aspects of the cryptic x86 entry code in arch/x86/entry/ written in assembly and how does that all fit with software-visible architecture of x86, what hardware features are being used and how.
With the hope to get more people excited about this funky piece of the kernel and maybe have the same fun we’re having.
Borislav Petkov, SUSE
XPDS13: On Paravirualizing TCP - Congestion Control on Xen VMs - Luwei Cheng,...The Linux Foundation
While datacenters are increasingly adopting VMs to provide elastic cloud services, they still rely on traditional TCP for congestion control. In this talk, I will first show that VM scheduling delays can heavily contaminate RTTs sensed by VM senders, preventing TCP from correctly learning the physical network condition. Focusing on the incast problem, which is commonly seen in large-scale distributed data processing such as MapReduce and web search, I find that the solutions that have been developed for *physical* clusters fall short in a Xen *virtual* cluster. Second, I will provide a concrete understanding of the problem, and reveal that the situations that when the sending VM is preempted versus when the receiving VM is preempted, are different. Third, I will introduce my recent attempts on paravirtualizing TCP to overcome the negative effect caused by VM scheduling delays.
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)Anne Nicolas
The PREEMPT_RT patch turns Linux into a hard Real-Time designed operating system. But it takes more than just a kernel to make sure you can meet all your requirements. This talk explains all aspects of the system that is being used for a mission critical project that must be considered. Creating a Real-Time environment is difficult and there is no simple solution to make sure that your system is capable to fulfill its needs. One must be vigilant with all aspects of the system to make sure there are no surprises. This talk will discuss most of the “gotchas” that come with putting together a Real-Time system.
You don’t need to be a developer to enjoy this talk. If you are curious to know how your computer is an unpredictable mess you should definitely come to this talk.
Steven Rostedt - Red Hat
Slides used for an internal training. This explains how to generate Flame Graphs using Java Flight Recorder dumps. There is also an example to use Linux "perf_events" to generate a Java Mixed-Mode Flame Graph.
In this talk Jiří Pírko discusses the design and evolution of the VLAN implementation in Linux, the challenges and pitfalls as well as hardware acceleration and alternative implementations.
Jiří Pírko is a major contributor to kernel networking and the creator of libteam for link aggregation.
Kernel Recipes 2016 - entry_*.S: A carefree stroll through kernel entry codeAnne Nicolas
I have always wondered what happens when we enter the kernel from userspace: what preparations does the hardware meet when the userspace to kernel space switch instructions are executed and back, and what does the kernel do when it executes a system call. There are also a bunch of things it does before it executes the actual syscall so I try to look at those too.
This talk is an attempt to demystify some of the aspects of the cryptic x86 entry code in arch/x86/entry/ written in assembly and how does that all fit with software-visible architecture of x86, what hardware features are being used and how.
With the hope to get more people excited about this funky piece of the kernel and maybe have the same fun we’re having.
Borislav Petkov, SUSE
XPDS13: On Paravirualizing TCP - Congestion Control on Xen VMs - Luwei Cheng,...The Linux Foundation
While datacenters are increasingly adopting VMs to provide elastic cloud services, they still rely on traditional TCP for congestion control. In this talk, I will first show that VM scheduling delays can heavily contaminate RTTs sensed by VM senders, preventing TCP from correctly learning the physical network condition. Focusing on the incast problem, which is commonly seen in large-scale distributed data processing such as MapReduce and web search, I find that the solutions that have been developed for *physical* clusters fall short in a Xen *virtual* cluster. Second, I will provide a concrete understanding of the problem, and reveal that the situations that when the sending VM is preempted versus when the receiving VM is preempted, are different. Third, I will introduce my recent attempts on paravirtualizing TCP to overcome the negative effect caused by VM scheduling delays.
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)Anne Nicolas
The PREEMPT_RT patch turns Linux into a hard Real-Time designed operating system. But it takes more than just a kernel to make sure you can meet all your requirements. This talk explains all aspects of the system that is being used for a mission critical project that must be considered. Creating a Real-Time environment is difficult and there is no simple solution to make sure that your system is capable to fulfill its needs. One must be vigilant with all aspects of the system to make sure there are no surprises. This talk will discuss most of the “gotchas” that come with putting together a Real-Time system.
You don’t need to be a developer to enjoy this talk. If you are curious to know how your computer is an unpredictable mess you should definitely come to this talk.
Steven Rostedt - Red Hat
Slides used for an internal training. This explains how to generate Flame Graphs using Java Flight Recorder dumps. There is also an example to use Linux "perf_events" to generate a Java Mixed-Mode Flame Graph.
In this talk Jiří Pírko discusses the design and evolution of the VLAN implementation in Linux, the challenges and pitfalls as well as hardware acceleration and alternative implementations.
Jiří Pírko is a major contributor to kernel networking and the creator of libteam for link aggregation.
High Availability of Services in Wide-Area Shared Computing NetworksMário Almeida
(Check my blog @ http://www.marioalmeida.eu/ )
Highly available distributed systems have been widely used and have proven to be resistant to a wide range of faults. Although these kind of services are easy to access, they require an investment that developers might not always be willing to make. We present an overview of Wide-Area shared computing networks as well as methods to provide high availability of services in such networks. We make some references to highly available systems that are being used and studied at the moment this paper was written (2012).
Overview of the high-availability of YARN. How to make it highly available and the possibility of using NDB MySQL Cluster in order to store the state of the Resource Manager and the Application Masters without having to depend on different architectures such as Zookeeper and HDFS.
Missing references, will add soon!!
During one of my personal projects I decided to study the internals of Android and the potential of altering the Dalvik VM (e.g. Xposed framework and Cydia) and application behaviour. Not going into detail about runtime hooking of constructors and classes like these two tools provide, I also explored the possibility of reverse engineering and modifying existing applications.
In the web you can find multiple tutorials on Android reverse engineering of applications but not many that do it with real applications that are often subject to obfuscation or with complex execution flows. So in order to learn I decided to pick a common application such as Skype and do the following:
decompile it
study contents and completely remove some functionality (e.g. ads)
change some resources (not described in presentation bellow)
recompile, sign and install.
Used tools include :
apktool – for (de)compiling android applications
jarsigner – for signing android applications
xposed – for intercepting runtime execution flow (will make public in future)
The following presentation describes the steps taken in order to completely remove the ads from skype. This includes any computation or data plan usage the ads consume. Please note the disclaimer of the presentation as this information is for educational purposes only.
Check my website : www.marioalmeida.eu
cachegrand: A Take on High Performance CachingScyllaDB
cachegrand is what happens when you throw in a mix a SIMD-accelerated hashtable — capable of performing parallel GET operations without locks or busy-wait loops (e.g. atomic operations) — with fibers, io_uring, your own I/O library, your own memory allocator, and an in-memory & on-disk time series database!
Written in C, built from scratch, natively modular - currently working on Redis compatibility — it's a platform that can deliver very high QPS with low latencies for caching and data streaming with the door open to supporting business logic in Rust & WebAssembly down the line.
This session will focus on developing techniques and OS components used highlighting how they can provide an extra boost to your platforms, no matter the programming language.
1) Design and Implementation of Multicore Processors
2) Coherence and Consistency
3) Power and Temperature
4) Interconnects
5) Multicore Caches
6) Security
7) Real world examples
Seven years ago at LCA, Van Jacobsen introduced the concept of net channels but since then the concept of user mode networking has not hit the mainstream. There are several different user mode networking environments: Intel DPDK, BSD netmap, and Solarflare OpenOnload. Each of these provides higher performance than standard Linux kernel networking; but also creates new problems. This talk will explore the issues created by user space networking including performance, internal architecture, security and licensing.
We demonstrated how at Criteo we have introduced on our Mesos clusters:
* network isolation between our containers
* a network bandwidth custom resource patching all our frameworks (marathon and aurora).
This talk has been presented at MesosCon18 in SF.
Ariel Waizel discusses the Data Plane Development Kit (DPDK), an API for developing fast packet processing code in user space.
* Who needs this library? Why bypass the kernel?
* How does it work?
* How good is it? What are the benchmarks?
* Pros and cons
Ariel worked on kernel development at the IDF, Ben Gurion University, and several companies. He is interested in networking, security, machine learning, and basically everything except UI development. Currently a Solution Architect at ConteXtream (an HPE company), which specializes in SDN solutions for the telecom industry.
In this deck from the Argonne Training Program on Extreme-Scale Computing 2019, Scott Parker from Argonne presents: Theta and the Future of Accelerator Programming.
Designed in collaboration with Intel and Cray, Theta is a 6.92-petaflops (Linpack) system based on the second-generation Intel Xeon Phi processor and Cray’s high-performance computing software stack. Capable of nearly 10 quadrillion calculations per second, Theta will enable researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.
Theta’s unique architectural features represent a new and exciting era in simulation science capabilities,” said ALCF Director of Science Katherine Riley. “These same capabilities will also support data-driven and machine-learning problems, which are increasingly becoming significant drivers of large-scale scientific computing.”
Watch the video: https://wp.me/p3RLHQ-lkl
Learn more: https://www.alcf.anl.gov/news/argonnes-theta-supercomputer-goes-online
and
https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Hadoop World 2011: Hadoop and Performance - Todd Lipcon & Yanpei Chen, ClouderaCloudera, Inc.
Performance is a thing that you can never have too much of. But performance is a nebulous concept in Hadoop. Unlike databases, there is no equivalent in Hadoop to TPC, and different use cases experience performance differently. This talk will discuss advances on how Hadoop performance is measured and will also talk about recent and future advances in performance in different areas of the Hadoop stack.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
8. Paraver
● Detailed quantitative analysis of a program
performance.
● Concurrent comparative analysis of several
traces.
● Support for mixed message passing and
shared memory.
● Building of derived metrics.
6
9. Configuration (1/4)
Boada server:
● Dual CPU Six Core with Hyperthreading.
● Kills applications after a few minutes.
● 24 GB of RAM.
Boada server:
● Used cpulimit to limit the cpu usage up to four cores.
7
10. Configuration (2/4)
Installed and/or configured:
● Parsec 2.1 with raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (in my laptop).
● CpuLimit
● Minor configurations on .bashrc.
● Multiple scripts to clean, build and run.
8
15. Raytrace
Code
For every pixel in the image
calculate trajectory of ray striking pixel
find closest intersection point of ray with scene
geometry
calculate contribution of all lights at intersection point
recursively trace specularly reflected ray
end for
12
16. Raytrace
Inputs
● simsmall - 1 million polygons (480x270)
● simmedium - 1 million poly (960x540)
● simlarge - 1 million poly (1920x1080)
● native - 10 million poly (1920x1080)
13
21. Raytrace
Cache and instructions
High number of cache misses Very low number of cache misses
There were no significative
diferences of IPC between
threads.
18
22. Raytrace
Execution time (1/3)
These are average times from
multiple executions of the parallel
code only and without extrae
overhead.
There was a high average
deviation of 0.3 seconds in the
experiments.
Bigger inputs were more accurate.
19
23. Raytrace
Execution time (2/3)
There was a smaller average
deviation of 0.03 seconds.
With 64 threads it runs almost
three times faster!
20
24. Raytrace
Execution time (3/3)
There was a even smaller average
deviation of 0.02 seconds.
With 64 threads it runs almost
three times faster!
21
25. Raytrace
Configuration comparison
In the case of the limited
configuration, although
perfomance doesn't seem
to degrade, the execution
time seems to stabilize for
more than 8 threads.
22
28. Conclusions (1/3)
● The system seemed to perform worse for a
number of threads multiple of the total
number of physical cores.
● The program has a good load balancing.
● Fine-granular parallelism.
24
29. Conclusions (2/3)
● Although it wasn't possible to verify,
increasing the input should cause higher
cache misses, because of the big working
sets that won't fit on the memory.
● Memory bandwidth should be the main issue
for good speedups.
● Boada killed almost all the native input
executions. 25
30. Conclusions (3/3)
● Paraver simplifies the process of analyzing
an application performance.
● Better knowledge of the systems
architecture would be needed in order
further analyse the performance of the
application.
26