1. Intro to SRE role
2. SRE vs DevOps vs SDE
3. How to prepare for SRE interviews?
4. What specific skills to acquire for working as an SRE?
5. How should we start our career as an SRE straight out of college?
6. Study materials that can help
Title: Sista: Improving Cog’s JIT performance
Speaker: Clément Béra
Thu, August 21, 9:45am – 10:30am
Video Part 1
https://www.youtube.com/watch?v=X4E_FoLysJg
Video Part 2
https://www.youtube.com/watch?v=gZOk3qojoVE
Description
Abstract: Although recent improvements to the Cog VM's performance have made it one of the fastest available Smalltalk virtual machines, the overhead compared to optimized C code remains significant. Efficient industrial object-oriented virtual machines, such as the V8 JavaScript engine for Google Chrome and Oracle's Java HotSpot, can reach the performance of optimized C code on many benchmarks thanks to adaptive optimizations performed by their JIT compilers. The VM then becomes cleverer: after executing the same portion of code many times, it stops execution, looks at what it is doing, and recompiles the critical portions into faster code based on the current environment and previous executions.
Bio: Clément Béra and Eliot Miranda have been working together on Cog's JIT performance for the last year. Clément Béra is a young engineer who has been working in the Pharo team for the past two years. Eliot Miranda is a Smalltalk VM expert who, among other things, implemented Cog's JIT and the Spur Memory Manager for Cog.
Martin Spier and Rex Black presented on leveraging HP Performance Center at Expedia. Rex introduced himself as a performance consultant and Martin as a performance engineer at Expedia. They discussed how performance engineering aims to answer questions about an application's performance. Performance Center was highlighted as a tool that allows sharing resources across teams to improve efficiency and enable distributed testing. Expedia leverages Performance Center's centralized management and reusable test artifacts to test applications early and often across their global, agile teams.
This document summarizes lessons learned from deploying Puppet code globally at high speed. The key changes were moving from SVN to Git for version control, parallelizing deployments using MCollective instead of SSH loops, using MCollective policies instead of sudo, and switching to a pull model over push. These changes allowed deployments to be reduced from 4 minutes to 4 seconds. Environments were used to separate code for different teams and stages. A custom MCollective agent was created to deploy Git branches as Puppet environments. Cron jobs were used to pull updates to environments. Overall this approach improved the speed, consistency, and security of global Puppet deployments.
My talk from the Bay Area PuppetCamp about deploying Puppet code to a global network of Puppet masters as quickly as possible.
Covers the design and implementation of the TIM Group (and now Yelp) puppetupdate MCollective agent: https://github.com/Yelp/puppetupdate/
Parallel and Asynchronous Programming - ITProDevConnections 2012 (English)
Panagiotis Kanavos
This document discusses parallel and asynchronous programming. It begins by explaining how processors are getting smaller while networks are getting worse, requiring more efficient parallel programming approaches. It then covers different parallel programming models in .NET like data parallelism using PLINQ, task parallelism using TPL, asynchronous programming with async/await, and concurrent collections. It also discusses challenges like cancellation, progress reporting, and synchronization, and how modern .NET addresses these.
Building trust within the organization, first steps towards DevOps
Guido Serra
This document discusses building trust within an organization through a DevOps approach. It introduces the role of a DevOps person to deliver features, mediate between devs and ops, and address non-functional requirements. It outlines steps taken such as listening to stakeholders, gathering requirements, and prioritizing non-functional needs. Tools are proposed for logging, metrics, and testing to provide transparency and shared understanding across teams. Results seen include improved support, proactive issue fixing, and better product performance through data and testing collaboration.
This document outlines an agenda for a coding kata workshop focusing on test-driven development and pair programming. The workshop includes introductions to code katas, test-driven development techniques like the three rules of TDD and the red-green-refactor process. It also covers structuring unit tests, principles of simple design, and roles in pair programming. The evening will involve practicing these concepts through a pizza-building kata exercise across three sessions.
This document discusses various strategies for backing up MongoDB data to keep it safe. It recommends:
1. Using mongodump for simple backups that can restore quickly but may be inconsistent.
2. Setting up replication for high availability, but also using mongodump for backups and testing restore processes.
3. Taking snapshots of the data files for consistent backups, but this requires downtime and gaps can occur between snapshots.
4. Using the oplog for incremental, continuous backups to avoid gaps without downtime using tools like the Wordnik Admin Tools. Testing backups is strongly recommended.
Parallel and Asynchronous Programming - ITProDevConnections 2012 (Greek)
Panagiotis Kanavos
This document discusses parallel and asynchronous programming using the Task Parallel Library (TPL) in .NET. It covers how processors are getting smaller so parallelism is important. It provides examples of using TPL for data parallelism by partitioning work over collections and task parallelism by breaking work into steps. It also discusses asynchronous programming with async/await and how TPL handles cancellation, progress reporting, and synchronization contexts.
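The data-parallel pattern mentioned above (partition a collection, process the partitions concurrently, combine the partial results) can be sketched in Python. This is only an analogy for the idea, not the .NET TPL API from the slides; the helper names `chunked`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """Work applied independently to one partition."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    """Partition the input, map the work over the partitions, reduce."""
    chunks = chunked(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

The sketch shows the structure only: in standard CPython, threads do not speed up CPU-bound work like this, so a process pool would be the usual choice when real parallel speedup is the goal.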
Outsmarting Merge Edge Cases in Component Based Design
Perforce
This document discusses edge cases and challenges that can occur when merging code changes between component-based software development streams. It outlines several types of complex merge scenarios, such as renames that cross stream views and "shadowed deletes" not caught by integration tools. The key lessons are to consider the big picture problem rather than symptoms, have a simple managed workflow, and continuously test upgrades. An ideal solution would involve source control at the file object level rather than filenames to more easily handle renames and component changes.
Process Scheduling Algorithms | Interviews | Operating system
Shivam Mitra
IO Bound Process vs CPU Bound process
Types of scheduling queues and schedulers
Preemptive vs Nonpreemptive scheduling
Role of Dispatcher
Context Switch
Scheduling criteria
Scheduling algorithms (FCFS, SJF, SRTF, Priority, Round Robin)
Multilevel Queue Scheduling
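As an illustration of one of the algorithms listed above, here is a minimal Round Robin simulation in Python. It is a sketch under simplifying assumptions that are not taken from the slides: all processes arrive at time 0, context-switch cost is ignored, and the function name `round_robin` and the sample burst times are hypothetical.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Simulate Round Robin for processes that all arrive at time 0.

    bursts: dict mapping process name -> CPU burst time.
    Returns a dict mapping process name -> completion time.
    """
    remaining = dict(bursts)
    queue = deque(bursts)  # ready queue, in insertion order
    clock = 0
    completion = {}
    while queue:
        name = queue.popleft()
        run = min(quantum, remaining[name])  # run for one time slice at most
        clock += run
        remaining[name] -= run
        if remaining[name] > 0:
            queue.append(name)  # preempted: back to the tail of the queue
        else:
            completion[name] = clock
    return completion


if __name__ == "__main__":
    bursts = {"P1": 5, "P2": 3, "P3": 1}
    completion = round_robin(bursts, quantum=2)
    # With arrival at time 0, waiting time = completion time - burst time.
    waiting = {p: completion[p] - bursts[p] for p in bursts}
    print(completion, waiting)
```

With a quantum of 2, the short job P3 finishes at time 5 instead of waiting for P1 and P2 to run to completion, which is the responsiveness trade-off Round Robin makes against FCFS.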
This document provides an overview of using the OllyDbg debugger to analyze malware. It discusses OllyDbg's history and interface, how to load and debug malware using OllyDbg, setting breakpoints, tracing code execution, patching code, and analyzing shellcode. The key points are that OllyDbg is an effective tool for debugging malware, it allows setting different breakpoint types, tracing helps record execution, and shellcode can be directly analyzed by pasting it into OllyDbg memory.
This document discusses parallel programming in .NET and provides an overview of the Task Parallel Library (TPL) and Parallel LINQ (PLINQ). It notes that multicore processors have existed for years but many developers are still writing single-threaded programs. The TPL scales concurrency dynamically across cores and handles partitioning work. PLINQ can improve performance of some queries by parallelizing across segments. Tasks represent asynchronous operations more efficiently than threads. The document provides examples of implicit and explicit task creation and running tasks in parallel using Parallel.Invoke or Task.Run.
Олександр Хотемський: "Serverless architecture and its application in automat..."
Dakiry
This document discusses using serverless architecture for test automation. It begins with an overview of serverless computing and its benefits like instant scaling, low deployment complexity, and pay-per-use model. It then explores how serverless is well-suited for test automation, noting that tests can run in isolation, achieve high parallelism for fast execution, and benefit from the stability of clean runtime environments. The document also acknowledges challenges like monitoring at scale, cold start delays, and vendor lock-in. It provides examples of implementing API, UI, unit, and load tests on serverless and recommends starting with a simple setup for API tests before progressing to more complex scenarios.
This document discusses how a solo developer can implement DevOps practices to improve their workflow and reduce technical debt. Some key practices covered include automating deployments through scripts to replicate environments, monitoring uptime and errors, and using configuration frameworks like Ansible, Chef, or Puppet to further automate infrastructure management. The document emphasizes that starting small with shell scripts can help solo developers replace manual tasks and free up time to focus on their clients' business needs rather than technical issues.
The document discusses Keras, a high-level neural network API written in Python that can integrate with TensorFlow, Theano, and CNTK. Keras allows for fast prototyping of neural networks with convolutional and recurrent layers and supports common activation functions and loss functions. It can be used to easily turn models into products that run on devices, browsers, and platforms like iOS, Android, Google Cloud, and Raspberry Pi. Keras uses a simple pipeline of defining a network, compiling it, fitting it to data, evaluating it, and making predictions.
Ginsbourg.com presentation of open source performance validation
Perfecto Mobile
Apache JMeter is an open source load testing tool that was designed to test web applications but has since expanded to other domains. It is a pure Java application that allows testing of websites, databases, LDAP, JMS, mail protocols and more. It provides graphs and reports on response times, throughput, and other metrics. Additional plugins exist like the Google JMeter plugin that adds more functionality for load testing and monitoring server performance.
The document discusses various anti-disassembly techniques used by malware authors to obscure disassembly and prevent automated analysis. These include using jump instructions to trick linear disassemblers into the wrong offset, abusing return pointers and structured exception handlers, and misleading analysis of stack frames. Flow-oriented disassembly is more robust but can still be confused by techniques like impossible disassembly combinations and obscuring true flow control. Manual cleanup in a tool like IDA Pro is often needed to recover the correct disassembly.
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
Using OS performance tools and basic alternatives to troubleshoot production database issues
The document discusses using Linux performance tools like pidstat, ps, and tracing tools like perf, systemtap, and dtrace to troubleshoot complex database problems that may involve issues at the operating system, hardware, or network level. It provides examples of using these tools to diagnose specific issues like memory fragmentation, I/O problems, and network congestion and presents a methodology around reproducing issues, analyzing tool output, identifying root causes, and developing solutions.
This document discusses using Celery to handle asynchronous tasks in Django. It covers what Celery is, why it's useful to not block the user, speed and scale considerations, common use cases like logging and email, choosing a message queue like RabbitMQ, setting up queues vs hosts, tools and commands, and setting up a Celery daemon on Ubuntu. It also provides links to the Yipit Django team's blog which discusses more about using Celery.
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008)
Tim Bunce
Slides of my talk on Devel::NYTProf and optimizing Perl code at the Italian Perl Workshop (IPW09). It covers the new features in NYTProf v3 and a new section outlining a multi-phase approach to optimizing your Perl code.
30 mins long plus 10 mins of questions. Best viewed fullscreen.
Profiling and Optimizing for Xeon Phi with Allinea MAP
Intel IT Center
Allinea's tools like MAP and DDT provide a unified development environment for profiling, debugging, and optimizing code running on Xeon Phi processors. MAP can show developers which loops are and aren't vectorized, and collects full profiling metrics directly on Xeon Phi cards. This helps identify optimization opportunities and ensures code is taking best advantage of the new architecture. The unified interface with DDT keeps developers productive whether profiling or debugging issues like OpenMP errors.
This document provides guidance for giving a great tech talk. It is divided into three parts: 80% preparation, 20% execution, and the audience outside the lecture hall. For preparation, the document emphasizes choosing an engaging topic, knowing your audience and timeslot, using few slides with clear visuals and code examples, and rehearsing. For execution, it discusses effective speaking techniques like eye contact and body language. It also outlines seven habits of ineffective presenters to avoid, such as being chained to your chair or going over time. The document concludes by addressing questions, sharing slides, and curating talks for maximum impact.
Slides for a college course at City College San Francisco. Based on "Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software", by Michael Sikorski and Andrew Honig; ISBN-10: 1593272901.
Instructor: Sam Bowne
Class website: https://samsclass.info/126/126_S17.shtml
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Tibo Beijen
Slides of the presentation about Kubernetes practices and learnings at NU.nl.
This presentation was the first of two at the Dutch Kubernetes meetup held at the Sanoma Netherlands offices on Sept. 5th, 2019.
Preparing Codes for Intel Knights Landing (KNL)
AllineaSoftware
The document discusses preparing codes to take advantage of the Intel Knights Landing (KNL) processor. It outlines three main steps:
1. Use Allinea Performance Reports to analyze memory access and identify optimization opportunities like improving cache usage or increasing thread count.
2. If memory access is identified as a bottleneck, improve cache usage through techniques like blocking or increase thread count to hide latency. The KNL processor includes high bandwidth memory (HBM) that codes should leverage.
3. Identify loops dominating execution time using Allinea MAP and apply vectorization either through compiler flags or manual techniques. Vectorization is key to exploiting the KNL's processing capabilities.
Multicore processors are becoming prevalent due to the limitations of increasing single core clock speeds. This presents challenges for software to effectively utilize multiple cores. Functional programming is one option that avoids shared state and parallel access issues, but requires a significant mindset shift. Refactoring existing code using tools is another option to incrementally introduce parallelism. Hybrid approaches combining paradigms may also help transition. Key application areas currently benefiting include servers, scientific computing, and packet processing. However, significant existing code is not easily parallelized and performance gains have yet to be fully realized.
2.4 Optimizing your Visual COBOL Applications
Micro Focus
This document discusses various techniques for optimizing Visual COBOL applications, including locating bottlenecks, tuning file access and configuration, optimizing database access, and structuring applications for better performance. It provides information on profiling tools, guidelines for file handling configuration options like access permissions and indexing, recommendations for database drivers and OpenESQL directives, best practices for program structure and modularity, and tips for working with data types and arithmetic operations. The overall goal is helping developers create applications that perform efficiently through various optimization and tuning strategies.
Tommi Reiman discusses optimizing Clojure performance and abstractions. He shares lessons learned from optimizing middleware performance and JSON serialization. Data-driven approaches can enable high performance while maintaining abstraction. Reitit is a new routing library that aims to have the fastest performance through techniques like compiled routing data. Middleware can also benefit from data-driven approaches without runtime penalties. Overall performance should be considered but not obsessively, as many apps do not require extreme optimization.
This document discusses parallel computing. It provides examples of tasks that can be solved faster through parallel processing by dividing the work among multiple processors. The key benefits of parallel computing are speeding up tasks and solving problems too large for a single processor. It also discusses limits of parallel computing such as load balancing and Amdahl's law, which places theoretical limits on speedup from additional processors.
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server - Serdar Basegmez
This document summarizes the troubleshooting process used to identify and resolve a performance issue impacting a mission critical Domino database. Initial analysis found the database compact was not completing and the server was experiencing high swap space usage and memory pressure. Further investigation revealed several issues with the database design and scheduled tasks. A multi-step process was then used to optimize the operating system, Domino configuration, and address a problem with a custom application that was filling memory. Collaborative debugging between administrators and developers was able to replicate the issue and identify the specific code causing the performance problem.
The document discusses performance analysis of the BOUT++ code. It notes that improving HPC performance has economic and scientific benefits. The goals of performance analysis are to identify bottlenecks and suggest improvements to optimize code performance. Profilers are used to measure performance and identify issues such as poor scaling, load imbalance, communication overhead, and memory bandwidth sensitivity. Analysis of BOUT++ shows good scaling up to 8,192 cores but decreased performance at higher concurrencies potentially due to increased computational work in ghost cells as the grid points per processor decrease with increasing concurrency.
Java performance - not so scary after all - Holly Cummins
No one likes slow applications, but sometimes it's hard to know where to start when trying to fix a performance problem. This talk will cover a range of tools and techniques which can be used to track down and fix performance issues.
Topics covered:
Why performance really really matters
What's the garbage collector doing? (And why you should care.)
But why is the garbage collector doing all that, anyway? How to find out what's in your heap.
Are you waiting around on locks?
Is your application running the code it should be?
Pulling it all together
This document discusses parallel matrix multiplication. It describes how to break the problem down into independent inner product operations that can be computed concurrently across multiple processors. Specifically, it presents a parallel algorithm that:
1) Divides the matrix multiplication work into inner product tasks that can be computed independently in parallel;
2) Assigns each inner product task to a separate processor using a round-robin approach; and
3) Waits for all processors to complete their tasks before outputting the final result matrix.
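The three steps above can be sketched in C++ with OpenMP. This is a minimal illustration, not the code from that document: the `Matrix` alias, the function name and the flat task index are assumptions made for the example. It relies on `schedule(static, 1)`, which deals loop iterations to threads in round-robin order, and on the implicit barrier at the end of a parallel loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Multiply a (n x k) by b (k x m). Each (row, col) inner product is one
// independent task. Names and the flat task index are illustrative.
Matrix multiply_round_robin(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    const std::size_t n = a.size(), k = b.size(), m = b[0].size();
    Matrix c(n, std::vector<double>(m, 0.0));
    const long tasks = static_cast<long>(n * m);

    // Step 2: schedule(static, 1) hands task t to thread t % team_size,
    // i.e. round-robin. Step 3: the implicit barrier at the end of the
    // loop waits for every thread before c is returned.
    #pragma omp parallel for schedule(static, 1)
    for (long t = 0; t < tasks; ++t) {
        const std::size_t row = static_cast<std::size_t>(t) / m;
        const std::size_t col = static_cast<std::size_t>(t) % m;
        double sum = 0.0;
        for (std::size_t i = 0; i < k; ++i) sum += a[row][i] * b[i][col];
        c[row][col] = sum;  // each task writes its own element: no locking
    }
    return c;
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs serially with the same result.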
Performance optimization techniques for Java code - Attila Balazs
The presentation covers the basics of performance optimizations for real-world Java code. It starts with a theoretical overview of the concepts followed by several live demos showing how performance bottlenecks can be diagnosed and eliminated. The demos include some non-trivial multi-threaded examples inspired by real-world applications.
Performance tuning the Spring Pet Clinic sample application - Julien Dubois
#1 Putting the application in production - The presenter moved the Spring Pet Clinic sample application to a production environment by removing debugging logs and using a real database.
#2 Creating a JMeter test - The presenter used JMeter to simulate 500 users loading the application and found it resulted in many errors and crashes with only 250 requests per second.
#3 Profiling - Profiling with VisualVM and JProfiler identified a memory leak in the Dandelion library and the use of HTTP sessions as initial performance issues.
This document summarizes a presentation about optimizing server-side performance. It discusses measuring performance metrics like time to first byte, optimizing databases through techniques like adding indexes and reducing joins, using caching with Memcached and APC, choosing fast web servers like Nginx and Lighttpd, and using load testing tools like JMeter to test performance before deployment. The presentation was given by a senior engineer at Wayfair to discuss their experiences optimizing their platform.
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel - Daniel Coupal
MongoDB presentation from Silicon Valley Code Camp 2015.
Walkthrough developing, deploying and operating a MongoDB application, avoiding the most common pitfalls.
The document discusses how the entropy of Ruby codebases increases over time if changes are not limited, making future changes more difficult. It advocates for writing specs to establish confidence in code and observing trends in metrics like code coverage, complexity, and churn to catch signs of rising entropy early. Sticking to conventions but knowing when to deviate, and focusing on principles over mechanics can help limit a codebase's entropy.
CS101- Introduction to Computing- Lecture 45 - Bilal Ahmed
This lecture provides a review and wrap-up of the CS101 Introduction to Computing course. It summarizes key topics covered over the past 44 lectures such as programming methodology, readable code, algorithm design, testing and debugging. It reviews objectives of the course which were to build an appreciation of fundamental computing concepts, familiarize students with popular productivity software, and achieve beginner proficiency in web development. The lecture concludes by asking for student feedback on how well the course objectives were achieved.
Video: https://www.youtube.com/watch?v=FJW8nGV4jxY and https://www.youtube.com/watch?v=zrr2nUln9Kk . Tutorial slides for O'Reilly Velocity SC 2015, by Brendan Gregg.
There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This tutorial explains methodologies for using these tools, and provides a tour of four tool types: observability, benchmarking, tuning, and static tuning. Many tools will be discussed, including top, iostat, tcpdump, sar, perf_events, ftrace, SystemTap, sysdig, and others, as well as observability frameworks in the Linux kernel: PMCs, tracepoints, kprobes, and uprobes.
This tutorial updates and extends an earlier talk that summarizes the Linux performance tool landscape. The value of this tutorial is not just learning that these tools exist and what they do, but hearing when and how they are used by a performance engineer to solve real world problems: important context that is typically not included in the standard documentation.
Gearman is a software framework that allows distributing work across multiple machines. It consists of a daemon, clients, and workers. The daemon handles communication between clients and workers. Clients submit work to the daemon, which passes it to workers to complete. Workers register functions they can perform and handle tasks asynchronously. Gearman provides load balancing and allows processing work in parallel across languages. It can improve performance for tasks like image processing, email sending, and log analysis.
This document discusses utilizing multicore processors with OpenMP. It provides an overview of OpenMP, including that it is an industry standard for parallel programming in C/C++ that supports parallelizing loops and tasks. Examples are given of using OpenMP to parallelize particle system position calculation and collision detection across multiple threads. Performance tests on dual-core and triple-core systems show speedups of 2-5x from using OpenMP. Some limitations of OpenMP are also outlined.
This document discusses hardware provisioning best practices for MongoDB. It covers key concepts like bottlenecks, working sets, and replication vs sharding. It also presents two case studies where these concepts were applied: 1) For a Spanish bank storing logs, the working set was 4TB so they provisioned servers with at least that much RAM. 2) For an online retailer storing products, testing found the working set was 270GB, so they recommended a replica set with 384GB RAM per server to avoid complexity of sharding. The key lessons are to understand requirements, test with a proof of concept, measure resource usage, and expect that applications may become bottlenecks over time.
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ... - smallerror
Twitter's operations team manages software performance, availability, capacity planning, and configuration management for Twitter. They use metrics, logs, and analysis to find weak points and take corrective action. Some techniques include caching everything possible, moving operations to asynchronous daemons, and optimizing databases to reduce replication delay and locks. The team also created several open source projects like CacheMoney for caching and Kestrel for asynchronous messaging.
Similar to Optimizing thread performance for a genomics variant caller (20)
4. Tool 2: Allinea Forge - Debugging and Profiling
• Debug and profile from one interface, configuration
• Secure native remote and local access
• Rapidly switch between the tasks
• Edit, build, commit, debug, profile, optimize..
5. Small data files
<5% slowdown
No instrumentation
No recompilation
Our profiler finds the performance bottlenecks
6. Our debugger helps with bugs and performance
• Observe why workload is imbalanced
• Observe why particular code paths are followed
• .. And fix any bugs that optimization creates!
7. Above all…
• The tools are aimed at any performance problem that matters
– Focus on time: the ultimate judge of performance
• Do not prejudge the problem
– Don’t assume it’s MPI messages, threads or I/O before profiling!
• If there’s a problem..
– Allinea Performance Reports shows it, and advises you on solutions
– Allinea Forge’s profiler shows it, next to your code
8. 6 steps to improve performance
Get a realistic test case
• Performance on real data matters
• Keep the test case for reference and re-use
Profile your code
• Add “-g” flag to your compilation
• Run with a profiler
Look for the significant
• Which part/phase of the code dominates time?
• Is there any unexpected significant time use?
What is the nature of the problem?
• Compute? I/O? MPI? Thread synchronization?
• Display the metrics that show the problem best
Apply brain to solve
• MPI – can you balance the work better?
• Compute – is memory time dominant – can you improve layout?
Think of the future
• Try larger process or thread counts to watch for scalability problems
• Keep the profile (.map file) for future comparison
9. Example: Improving Thread Usage in Genomics
• DISCOVAR
– Variant caller and small genome assembler
– Sub-mammalian sized genomes
– Newer DISCOVAR de novo for larger genomes
• C++ and OpenMP
• Developed by the Broad Institute of MIT and Harvard
10. A first look – on real hardware
• It’s not I/O intensive
• Good quantity of OpenMP time
• No vectorization
11. OpenMP in detail
• Physical cores are 200% loaded: hyperthreading is on
• 17% of parallel region time is synchronization
• .. That’s quite high
12. Investigating the OpenMP synchronization
• Horizontal time axis: colour coded
– Dark green – single core
– Light green – OpenMP work
– Light blue – pthread synchronization
– Gray – idle
• Vertical axis
– #cores doing something
• Something’s very wrong towards the end – with all the gray
13. Zoom in on the region
• Stacks, code, regions, time are all focused on zoom area
• Key observation:
– OpenMP region with “omp critical” is where the time is being wasted
14. Fixing
• #pragma omp critical
– Execute exactly one thread at a time to ensure safety
• Is costing too much
– Passing “token” from thread to thread to do small pieces of work.
• Run whole section on one thread instead
– Has same semantics
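The slides do not show the DISCOVAR source, so the following is only a sketch of the pattern described, under the assumption that the work inside the critical section is tiny (here a hypothetical running sum). The first function serializes every iteration behind `#pragma omp critical`; the second runs the same work on one thread, which yields the same result without handing the lock from thread to thread.

```cpp
#include <cassert>
#include <vector>

// Pattern described on the slide: every iteration is serialized by the
// critical section, so the threads mostly take turns.
long sum_with_critical(const std::vector<int>& values) {
    long total = 0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp critical
        total += values[i];  // a small piece of work done under a lock
    }
    return total;
}

// The fix: run the whole section on one thread. Same result, no hand-off.
long sum_on_one_thread(const std::vector<int>& values) {
    long total = 0;
    for (int v : values) total += v;
    return total;
}

// Illustrative helper: both versions must agree for any input.
void check_same_semantics(const std::vector<int>& values) {
    assert(sum_with_critical(values) == sum_on_one_thread(values));
}
```

When the protected work really is a simple accumulation like this one, an OpenMP `reduction` clause would be another option; the slide's point is that the serial version is already enough when the critical section dominates the loop body.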
16. As a performance report
• Improvements in
– Runtime
– Synchronization overhead
17. Let’s try something bigger – into Amazon cloud!
• C4.8xlarge
– 36 hyperthreaded cores
– 60GB RAM
– Xeon E5-2666 v3 Haswell
– 25MB Cache
– 2.6GHz
vs
• Our physical server
– 24 hyperthreaded cores
– 24 GB RAM
– Xeon E5-2407 v2
– 10MB Cache
– 2.4GHz
$ ./runme.sh
discovar version: Discovar r52488
loadaverage: 0.05 0.98 1.36 1/790 16317
2015-07-27 07:57 PERF: REAL 835.857 USER 36.188 SYSTEM 5.441 PERC 4.71
835 seconds to run on EC2
… vs …
~448 seconds on our physical server
Why?
18. Profile with Allinea Forge to find where the problem is
• Focus on initial 300 seconds: something must be wrong here
• Serious lack of good “green” compute
19. In detail…
• 36 threads, waiting… but who is using madvise?!
20. Why is glibc so bad?
• madvise system call in _int_free()
– At least two context switches each call ..
– This glibc version has issues…?
• What other options are there?
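One way to check whether the allocator is the bottleneck is a small stress loop that can be timed under different allocators. The sketch below is hypothetical and is not taken from the slides: the function name, buffer size and iteration count are assumptions, and it only reproduces the general pattern of many threads allocating and freeing at once.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stress loop: each iteration allocates, touches and frees a
// buffer, so with many threads the allocator is hit concurrently.
// Returns elapsed seconds; `checksum` receives one count per iteration so
// the caller can confirm that all the work ran.
double time_allocation_churn(long iterations, std::size_t bytes, long& checksum) {
    assert(bytes > 0);
    long sum = 0;
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < iterations; ++i) {
        std::vector<unsigned char> buffer(bytes, 1);  // allocate and fill
        sum += buffer[bytes - 1];                     // touch the memory
    }                                                 // buffer freed here
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double>(stop - start).count();
}
```

Running the same binary at several thread counts, and again with a different allocator, gives a rough comparison. It does not replace profiling the real application, which is what exposed the madvise calls here.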
21. Maybe Google TCMalloc?
• Optimized for multi-threaded applications
• No-win
– Same run time
– Issue is use of sys_futex, not madvise
• .. Not optimized for this multithreaded application!
24. Can Intel libraries help?
• We try the Intel TBB multithreaded allocator
• 14 minutes down to 10 minutes!
• .. But still this code has scope for more…
25. Real optimization of OpenMP regions
• NB – still profiling for first 300 seconds only
• Significant inactivity in final 60 seconds
• OpenMP region
– #pragma omp parallel for
• Is it working?
– No – the threads are idle
• Let’s remove it
• Now able to run to completion
– 358 seconds
• Still inactivity at end of run
27. Zoomed to the inactivity…
• Another OpenMP region
• Quick edit: comment out the OpenMP, again!
29. Finally… something to sort out
• Recursive, in-place multithreaded sorter
• Is not scaling well with thread count
• Options?
– Re-engineer
– Replace
– Tune
30. Let’s tune
• Try limiting the thread pool to 8 workers
– Better than 36 clashing threads?
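The slides do not show how the pool was limited. As a sketch of the idea, the following caps the number of workers with OpenMP's `num_threads` clause. It uses a simple chunk-then-merge sort as a stand-in, not DISCOVAR's recursive in-place sorter, and the function name is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sort `v` using at most `max_workers` threads: sort equal-sized chunks in
// parallel, then merge them on one thread. A stand-in for illustration,
// not DISCOVAR's sorter.
void sort_with_worker_cap(std::vector<int>& v, int max_workers) {
    assert(max_workers > 0);
    const long n = static_cast<long>(v.size());
    const long chunks = max_workers;
    const long step = (n + chunks - 1) / chunks;  // ceil(n / chunks)
    if (step == 0) return;                        // empty input

    // num_threads caps the team size for this region only.
    #pragma omp parallel for num_threads(max_workers)
    for (long c = 0; c < chunks; ++c) {
        const long lo = std::min(n, c * step);
        const long hi = std::min(n, lo + step);
        std::sort(v.begin() + lo, v.begin() + hi);
    }

    // Merge the sorted chunks left to right on the calling thread.
    for (long mid = step; mid < n; mid += step) {
        const long hi = std::min(n, mid + step);
        std::inplace_merge(v.begin(), v.begin() + mid, v.begin() + hi);
    }
}
```

Calling `sort_with_worker_cap(data, 8)` mirrors the experiment on the slide: the same work, but with 8 workers in place of one per hardware thread.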
31. Result…
• Runtime 4.7 minutes
• 3x improvement on original
• #1 position on the Broad Benchmark list for a sub-$2/hour system!
32. Lessons learned
• Real codes exhibit many different performance patterns
– Profiling real data sets at real scales is vital to target the effort
– Small test cases do not expose all the problems
– Small thread counts can be too small to find real problems
• Changing code can be simple
– Use threads wisely – it will not always be faster
– Changing libraries – someone else might have fixed your problem
• Re-engineering is sometimes necessary
– Take advantage of vector units
– Take advantage of threads
33. Increase the performance of your software
Analyze and tune with Allinea Performance Reports
Develop, profile and debug applications with Allinea Forge
With professional support when you need it most
Read more!