The document discusses analyzing Android kernels for security vulnerabilities. It describes binary analysis techniques like disassembling kernels and fuzzing common targets like mmap and ioctl calls. A case study shows discovering a vulnerability in an audio driver through fuzzing its mmap function. The document also recommends approaches for static source code analysis like preprocessing Android kernel source with LLVM tools and using the Clang static analyzer to find issues. Suggestions are provided for smartphone manufacturers and SOC vendors to improve security practices.
This document discusses systemd and configuration management. It provides background on systemd, describing how it has been adopted by major Linux distributions like Debian and Red Hat Enterprise Linux. It also discusses challenges for configuration management tools in dealing with systemd, as systemd changes the way Linux systems are initialized and managed. Automation and reproducibility are important principles for both systemd and configuration management.
This document discusses various overflow issues that can occur with the splice and vmsplice Linux kernel functions. It describes stack and buffer overflows that can happen due to race conditions when accessing pipe buffers. It also proposes a pool overflow technique using SLUB memory and controlled data read from a TTY device to spray the kernel memory and potentially overflow adjacent objects. Finally, it notes that further research is needed to determine a suitable target and exploit methodology, and hints that pipe buffer sizes may allow overflowing kernel memory allocations.
How do you run a system administrator recruitment process? By building a platform from open source parts in just 2 nights! I gave this talk at the Kraków (Poland) OWASP chapter meeting on 17th October 2013 at our local Google for Entrepreneurs site. It focuses on security and also shows how to run a recruitment process as a CTF-style challenge.
This story mostly covers the security details of the platform. There's a good chance that I will give another talk about this system, this time focusing on the technical details. Stay tuned ;)
This document provides an overview of modern evasion techniques for bypassing network defenses. It discusses using PowerShell, macros, and C# to generate payloads that can evade detection from antivirus vendors like Palo Alto, Fortinet, Cisco, and Proofpoint. Specific evasion tactics covered include obfuscating payloads, customizing Meterpreter, using Empire instead of Metasploit, modifying templates, and delivering payloads via links instead of attachments. The document demonstrates how to generate custom C# payloads, use PowerShell to bypass defenses, and encrypt payloads with Ebowla. It recommends tools like MSF, Empire, Pupy, Unicorn, and Ebowla for evasion and
BSides Edinburgh 2017 - TR-06FAIL and other CPE Configuration Disasters (infodox)
This document discusses vulnerabilities in TR-064 and TR-069 protocols for managing broadband network devices. It describes how TR-064 had issues with no password protection and readable credentials, allowing full device access. It also discusses prior vulnerabilities like Misfortune Cookie that allowed bypassing authentication in TR-069. The document then demonstrates how exploiting a persistent cross-site scripting vulnerability in the FreeACS server software through TR-069 requests could allow adding an administrative user and completely compromising the server. This could potentially allow attacking and reconfiguring millions of networked devices.
How We Hacked Distributed Configuration Management Systems (Positive Hack Days)
This talk covers how a team of researchers discovered and exploited vulnerabilities in various configuration management systems during penetration tests. The authors will present several distributed configuration management tools, such as Apache ZooKeeper, HashiCorp Consul and Serf, and CoreOS Etcd; describe ways to fingerprint these systems; and show how to take advantage of typical configuration mistakes to increase the attack surface.
LinuxCon Europe, 2014. Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY . There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems.
Kernel vulnerabilities were commonly used to obtain admin privileges, and the main rule was to stay in the kernel for as short a time as possible! But nowadays, even when you get admin / root, current operating systems are sometimes too restrictive. And that has made kernel exploitation a nice vector for installing into kernel mode!
In this talk we will examine the steps from CPL3 to CPL0, including some nice tricks, and we will end up developing kernel mode drivers.
The document discusses the uncertainties that come with cloud security due to unknown devices and applications running in cloud environments. It advocates for automating security monitoring and response to help reduce dwell times for attackers. Specific techniques recommended include using Linux auditing tools to monitor processes, logins and network activity across cloud instances and storing the data in a backend for analysis to detect anomalies. Monitoring APIs and authentications is also suggested to detect compromised credentials or suspicious activity. The document stresses the importance of automating security to keep pace with threats in cloud environments.
Us 16-subverting apple-graphics_practical_approaches_to_remotely_gaining_root... (Liang Chen)
This document discusses techniques for remotely gaining root privileges on Apple devices by exploiting vulnerabilities in the graphics components. It provides an overview of Apple's graphics architecture and the allowed graphics interfaces for sandboxed processes. It then analyzes attack surfaces in the userland WindowServer and QuartzCore interfaces, describing vulnerabilities previously found that allowed escalating privileges or bypassing sandbox restrictions. Finally, it walks through the exploitation of a double free vulnerability (CVE-2016-1804) in the multi-touch handling that could be leveraged to achieve remote code execution with root privileges.
44CON London 2015 - Hunting Asynchronous Vulnerabilities (44CON)
This document discusses asynchronous vulnerabilities and callback-oriented hacking techniques. It describes how asynchronous issues are often invisible and outlines solutions using callbacks, such as through DNS requests. It provides examples of payload techniques for issues like SQL injection, command injection, and XSS that call out to an external domain to confirm exploitation. Finally, it notes hazards like friendly fire and ways adversaries may detect the callbacks.
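The callback idea in this abstract can be sketched as follows. This is a minimal illustration, not the speaker's tooling: each injection point gets its own unique hostname, so a DNS lookup that arrives later can be traced back to the input that triggered it. The domain `callback.example.net`, the `CallbackTracker` class, and the `nslookup` probe are all assumptions made for the example.

```python
import secrets

CALLBACK_DOMAIN = "callback.example.net"  # hypothetical domain the tester controls


class CallbackTracker:
    """Maps unique hostnames to the injection point each one was sent to."""

    def __init__(self, domain=CALLBACK_DOMAIN):
        self.domain = domain
        self.pending = {}  # token -> description of the injection point

    def new_hostname(self, injection_point):
        """Create a fresh hostname and remember where it was injected."""
        token = secrets.token_hex(8)
        self.pending[token] = injection_point
        return f"{token}.{self.domain}"

    def match(self, queried_name):
        """Return the injection point for an observed DNS query, or None."""
        name = queried_name.rstrip(".").lower()
        suffix = "." + self.domain
        if not name.endswith(suffix):
            return None
        # The token is the label directly in front of the callback domain.
        token = name[: -len(suffix)].split(".")[-1]
        return self.pending.get(token)


tracker = CallbackTracker()
host = tracker.new_hostname("POST /signup, field 'email'")
probe = f"nslookup {host}"  # illustrative command-injection style probe
print(probe)
print(tracker.match(host))
```

Because the token is random per injection point, a lookup that shows up hours later (for example from a nightly batch job) still identifies the vulnerable input. Recording the time each token was issued would be a natural extension.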
To make kernel exploitation as hard as possible, a variety of features were introduced, including KASLR, SMEP and sometimes also SMAP.
Even though these are powerful techniques, their effectiveness relies on their cooperation, their environment and their implementation.
We will present new and some not so new exploitation techniques, show the ideas behind breaking through the aforementioned security features and why it is possible, and we will take a look at pool spraying on x64 as well.
Exploring a Billion Program States at a Pro Level. How to Develop a Fast... (Positive Hack Days)
This document discusses dynamic binary instrumentation (DBI) and provides two examples of DBI tools. DBI allows analyzing a program's behavior at runtime by injecting instrumentation code. Two open-source DBI tools are described: WinHeap Explorer detects heap-based bugs with low overhead, while DrLtrace transparently traces malware library calls. DBI provides a powerful method for software security analysis, malware analysis, and reverse engineering. Traditional data structures in DBI can introduce significant overhead, so lightweight approaches are discussed.
Practical Operation Automation with StackStorm (Shu Sugimoto)
Automation is becoming more and more important these days, but it is not always easy to achieve, because it takes tremendous effort to make existing procedures machine-friendly. That often means you need to change almost everything!
StackStorm (aka st2, https://stackstorm.com/) is an open source IFTTT-ish middleware that ships with a powerful workflow engine and a unique feature called "inquiries".
I'll focus on the workflow engine functionality of st2 and show how it can ease the "automation" of day-to-day tasks. The example I'll show in this presentation is the actual workflow that we use at JPNAP, a real-world IXP operation.
This document discusses vulnerability design patterns for kernel exploitation. It outlines several common vulnerability classes for the kernel including out of boundary errors, buffer overflows, and null pointer writes. It provides examples of how these vulnerabilities could be used to achieve kernel code execution or privilege escalation. It also notes how kernel exploitation techniques have evolved over time to bypass defenses like KASLR and discusses developing exploitation tools instead of just shellcode.
This document describes how an implant could be developed for a Dropcam camera device. It begins by providing background on Dropcam and its capabilities. It then details steps taken to gain root access to the device, including exploiting vulnerabilities in Busybox and OpenSSL. Methods are proposed for persisting access, communicating with a C&C server, determining the device's location, and infecting hosts that view video from the Dropcam. The document concludes by conceptualizing how audio/video capture and injection of hooks could be implemented on the device and connected systems.
This document contains a sample WARC file with records of different types including requests, responses, metadata, and revisits related to archiving the website http://www.archive.org. It includes headers with information like record IDs, dates, URIs, digests, and record types. The records document the initial capture of an image on the site, later metadata and conversion records, and a sample revisit record showing the resource was not modified.
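The record layout described here (a version line, named header fields, a blank line, then a content block) can be read with a few lines of Python. The following is a rough sketch under simplifying assumptions: the input is uncompressed, header lines are not folded, and every record carries a `Content-Length`. The sample record, including its record ID and date, is made up for illustration and is not taken from the document.

```python
def parse_warc_records(data: bytes):
    """Yield (headers, payload) for each record in an uncompressed WARC byte string."""
    pos = 0
    while pos < len(data):
        head_end = data.index(b"\r\n\r\n", pos)  # blank line ends the header block
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        headers = {"version": lines[0]}  # e.g. "WARC/1.0"
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        start = head_end + 4
        length = int(headers["Content-Length"])
        yield headers, data[start:start + length]
        pos = start + length + 4  # skip the two CRLFs that close every record


# Illustrative record only; the ID and date are invented for this example.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://www.archive.org/\r\n"
    b"WARC-Date: 2000-01-01T00:00:00Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

for headers, payload in parse_warc_records(sample):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

For real archives, which are usually gzip-compressed per record, a dedicated WARC library is the safer choice.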
This document summarizes the Linux audit system and proposes improvements. It begins with an overview of auditd and how audit messages are generated and processed in the kernel. Issues with auditd's performance, output format, and filtering are discussed. An alternative approach is proposed that uses libmnl for netlink handling, groups related audit messages into JSON objects, applies Lua-based filtering, and supports multiple output types like ZeroMQ and syslog. Benchmark results show this rewrite reduces CPU usage compared to auditd. The document advocates for continued abstraction and integration of additional data sources while avoiding feature creep.
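The grouping step mentioned above can be illustrated in a few lines. The sketch below is written in Python rather than the libmnl/Lua stack the abstract describes, and it is not the presenters' code: it only shows how audit lines that share the same `msg=audit(timestamp:serial)` ID can be merged into one JSON object. The function name `group_audit_lines` and the sample log lines are assumptions for the example.

```python
import json
import re
import shlex

# Matches the common prefix of a raw audit line: type=X msg=audit(TIME:SERIAL): ...
HEADER = re.compile(r"^type=(\S+) msg=audit\((\d+\.\d+):(\d+)\):\s*(.*)$")


def group_audit_lines(lines):
    """Group raw audit lines that share one event ID into one dict per event."""
    events = {}
    for line in lines:
        m = HEADER.match(line.strip())
        if not m:
            continue  # skip anything that is not an audit record
        rtype, timestamp, serial, rest = m.groups()
        event = events.setdefault(
            (timestamp, serial),
            {"timestamp": float(timestamp), "serial": int(serial), "records": []},
        )
        # shlex strips the quotes around values such as comm="cat".
        fields = dict(tok.split("=", 1) for tok in shlex.split(rest) if "=" in tok)
        event["records"].append({"type": rtype, **fields})
    return list(events.values())


# Illustrative input in the usual audit.log shape.
sample = [
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 '
    'success=no exit=-13 comm="cat" exe="/bin/cat"',
    'type=CWD msg=audit(1364481363.243:24287): cwd="/home/user"',
    'type=PATH msg=audit(1364481363.243:24287): item=0 name="/etc/ssh/sshd_config"',
]
print(json.dumps(group_audit_lines(sample), indent=2))
```

A production consumer would read from the netlink socket and flush an event once its end-of-event marker arrives. This sketch assumes all lines are already available and that quoting in each line is balanced.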
The document provides an overview of kernel crash dump analysis including:
- The tools and data required such as the crash utility, kernel symbol files, vmcore files
- How to install and use these components
- Basic crash commands to analyze system, memory, storage, and network subsystems
- How to dynamically load crash extension modules to add custom commands
Memory corruption techniques can be used to escalate privileges from Ring 3 to Ring 0 and even System Management Mode (SMM). Ring 3 applications are initially sandboxed, but exploits can target vulnerabilities like pool overflows to corrupt memory and bypass mitigations. This allows arbitrary code execution with Ring 0 privileges. Further techniques like object type confusion and hijacking driver callbacks can be leveraged to execute code in the most privileged SMM.
44CON London 2015 - Jtagsploitation: 5 wires, 5 ways to root (44CON)
The document discusses 5 ways to exploit JTAG (Joint Test Action Group) interfaces to gain unauthorized access or privileges on a system. The 5 techniques are: 1) Accessing non-volatile storage like flash memory via boundary scan, 2) Scraping memory for offline forensic analysis, 3) Patching boot arguments to change how the system boots, 4) Directly patching the kernel by modifying code or function pointers in memory, and 5) Patching a specific process by searching memory for its code and modifying it. While some techniques like memory scraping are slow, others like boot argument patching or kernel patching can be done quickly and provide privileged access. JTAG interfaces provide I/O, execution control, and memory access that enable
Android is a topic of great interest in the computing world, but working on the platform is not easy.
This talk takes a practical approach and explains how to obtain the tools to compile a kernel module on Android, how to develop a simple module, and how to load it onto the device. Finally, it shows how to create a more complex module using APIs specific to the Android kernel.
The workshop sources are available here:
https://github.com/arighi/mysuspend
This document provides an overview of upcoming technologies beyond the Java Virtual Machine (JVM). It begins with introductions and then discusses several topics:
- There are many open-source JVMs beyond Oracle's HotSpot such as JamVM, Maxine, and JikesRVM.
- Reasons for using the JVM include its large standard library and ease of portability compared to alternative virtual machines. However, startup time can be slow.
- Techniques for improving JVM startup time are discussed, such as saving JIT-compiled code and using the Drip tool to pre-initialize JVMs.
- Native interoperability is explored through the Java Native Interface (JNI
To prevent exploitation of mistakes introduced during the development process, various security mitigations and hardening measures are continuously being implemented at the application level and at the operating system level as well.
Even though these mitigations greatly increase the difficulty of exploiting common bugs in software / core, you should not rely solely on them. It also helps to know the background and limits of these techniques, which protect your software directly or indirectly.
In this talk we will take a look at some of the helpful mitigations & features introduced in past years (x64 address space, SMAP & SMEP, CFG, ...), focusing on the kernel point of view: their benefits, and at the same time their weak points.
Your data is much safer at home than it is letting some corporation "take care of it" for you, right? Security reviews for some of the top vendors' devices reveal many interesting findings. Like everything else, there are bugs. But knowing what kinds of bugs and how the vendors have responded will allow you to better understand the impact of plugging these devices into your network. Jeremy will show you just how low access control and least privilege are on their list of priorities. He'll also explore the amount of test collateral and debug interfaces sloppily left shipping to consumers. From remote roots to stealing social network tokens to just plain weird stuff, he'll expand on how it's not just about what they do, but also what they don't do. And, he'll give you some useful guidelines on how to close the gaps yourself.
Kdump and the kernel crash dump analysis (Buland Singh)
Kdump is a kernel crash dumping mechanism that uses kexec to load a separate crash kernel to capture a kernel memory dump (vmcore file) when the primary kernel crashes. It can be configured to dump the vmcore file to local storage or over the network. Testing involves triggering a kernel panic using SysRq keys which causes the crash kernel to load and dump diagnostic information to the configured target path for analysis.
Sensu and Sensibility - Puppetconf 2014 (Tomas Doran)
As the Yelp infrastructure and engineering team grew, so did the pain of managing Nagios. Problems like splitting alerting across multiple teams, providing high availability and managing Nagios systems in multiple environments had become pressing. As we grew towards a service oriented architecture and pushed some services out into the cloud, we rapidly needed more automated monitoring configuration.
An evolutionary solution wasn’t going to solve all of our problems; we needed to revolutionize our monitoring. Sensu is built from the ground up to solve many of our issues and be easy to extend.
This talk covers our puppet ‘monitoring_check’ API (that sets up monitoring for our services within puppet), how and why we deploy Sensu and our custom handlers and escalations, along with how we provide automatic ‘self service’ monitoring for dynamic services and how we deal with the challenges posed by the more ephemeral nature of cloud architectures.
Building source code level profiler for C++.pdf (ssuser28de9e)
1. The document describes building a source code level profiler for C++ applications. It outlines 4 milestones: logging execution time, reducing macros, tracking function hit counts, and call path profiling using a radix tree.
2. Key aspects discussed include using timers to log function durations, storing profiling data in a timed entry class, and maintaining a call tree using a radix tree with nodes representing functions and profiling data.
3. The goal is to develop a customizable profiler to identify performance bottlenecks by profiling execution times and call paths at the source code level.
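The milestones above (timing, hit counts, call-path tracking) can be sketched compactly. The version below is in Python rather than C++, and it swaps the radix tree for a plain tree of dictionaries keyed by function name, so it illustrates the idea rather than the document's implementation. The names `Profiler`, `CallNode`, and `scope` are hypothetical.

```python
import time
from contextlib import contextmanager


class CallNode:
    """One function as reached through one particular call path."""

    def __init__(self, name):
        self.name = name
        self.hits = 0        # how many times this path was entered
        self.total = 0.0     # accumulated wall-clock seconds
        self.children = {}   # callee name -> CallNode


class Profiler:
    def __init__(self):
        self.root = CallNode("<root>")
        self._current = self.root

    @contextmanager
    def scope(self, name):
        """Time a block and attribute it to the current call path."""
        parent = self._current
        node = parent.children.setdefault(name, CallNode(name))
        self._current = node
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.total += time.perf_counter() - start
            node.hits += 1
            self._current = parent  # restore the caller as the active node

    def report(self, node=None, depth=0):
        """Return indented 'name hits total' lines for the whole call tree."""
        node = node or self.root
        lines = []
        for child in node.children.values():
            lines.append(f"{'  ' * depth}{child.name} hits={child.hits} total={child.total:.6f}s")
            lines.extend(self.report(child, depth + 1))
        return lines


prof = Profiler()


def parse():
    with prof.scope("parse"):
        sum(range(1000))


def main():
    with prof.scope("main"):
        parse()
        parse()


main()
print("\n".join(prof.report()))
```

In C++ the `scope` context manager would typically become an RAII object created by a macro at function entry, which matches the "timed entry" idea in the summary.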
This talk covers our puppet ‘monitoring_check’ API (that sets up monitoring for our services within puppet), how and why we deploy Sensu and our custom handlers and escalations, along with how we provide automatic ‘self service’ monitoring for dynamic services and how we deal with the challenges posed by the more ephemeral nature of cloud architectures.
Building source code level profiler for C++.pdfssuser28de9e
1. The document describes building a source code level profiler for C++ applications. It outlines 4 milestones: logging execution time, reducing macros, tracking function hit counts, and call path profiling using a radix tree.
2. Key aspects discussed include using timers to log function durations, storing profiling data in a timed entry class, and maintaining a call tree using a radix tree with nodes representing functions and profiling data.
3. The goal is to develop a customizable profiler to identify performance bottlenecks by profiling execution times and call paths at the source code level.
Alexei Vladishev - Zabbix - Monitoring Solution for EveryoneZabbix
Zabbix is an open source monitoring solution that can monitor all levels of infrastructure across various platforms. It uses triggers to define problems and collects data through active and passive agents to analyze metrics and detect issues. When problems occur, Zabbix can automatically react through escalation procedures that include notifications, tickets, and restarts. It is highly scalable and offers features like anomaly detection, forecasting, and event correlation for complex environments.
From Black Box to Black Magic, Pycon Ireland 2014Gloria Lovera
Machine learning algorithms in automotive field.
If you are interested in, I suggest also this presentation:
http://www.slideshare.net/bix883/machine-learning-virtual-sensors-automotive-intelligent-tire
The document discusses machine learning techniques for processing sensor data from vehicles. It describes how machine learning can be used to create virtual sensors from raw data by analyzing features, selecting relevant data, preprocessing to remove noise, and building models. Examples are provided of using support vector machines and neural networks to classify yaw rate from sensor signals. The document also introduces a tool called Distortion that manages machine learning jobs by uploading data, running algorithms, and analyzing results.
1) The document discusses the challenges of building a non-serverless lottery application and how the team transitioned to a serverless architecture using AWS Lambda and other serverless technologies.
2) It describes the process of mapping out the value stream for the original non-serverless application versus the serverless approach.
3) The document then outlines how the team implemented the serverless lottery application including using Lambda, API Gateway, MongoDB Atlas, SQS, and other services and how they addressed challenges like cold starts and resiliency.
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
Once you start working with Big Data systems, you discover a whole bunch of problems you won’t find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk, we’ll mention all of the aspects that you should take into consideration when monitoring a distributed system using tools like Web Services, Spark, Cassandra, MongoDB, AWS. Not only the tools, what should you monitor about the actual data that flows in the system? We’ll cover the simplest solution with your day to day open source tools, the surprising thing, that it comes not from an Ops Guy.
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden to monitor all of the components becomes a big data problem itself.
In the talk we’ll mention all of the aspects that you should take in consideration when monitoring a distributed system once you’re using tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Not only the tools, what should you monitor about the actual data that flows in the system?
And we’ll cover the simplest solution with your day to day open source tools, the surprising thing, that it comes not from an Ops Guy.
The Heatmap - Why is Security Visualization so Hard?Raffael Marty
The extent and impact of recent security breaches is showing that current approaches are just not working. But what can we do to protect our business? We have been advocating monitoring for a long time as a way to detect subtle, advanced attacks. However, products have failed to deliver on this promise. Current solutions don't scale in both data volume and analytical insights. In this presentation we will explore why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. We are going to explore the question of how to visualize a billion events. We are going to look at a number of security visualization examples to illustrate the problem and some possible solutions. These examples will also help illustrate how data mining and user experience design help us get a handle of the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseVictoriaMetrics
Monitoring is the key to successful operation of any software service, but commercial solutions are complex, expensive, and slow. Let us show you how to build monitoring that is simple, cost-effective, and fast using open source stacks easily accessible to any developer.
We’ll start with the elements of monitoring systems: data ingest, query engine, visualization, and alerting. We’ll then explain and contrast two implementation approaches. The first uses VictoriaMetrics, a fast growing, high performance time series database that uses PromQL for queries. The second is based on ClickHouse, a popular real-time analytics database that speaks SQL. Fast, affordable monitoring is within reach. This webinar provides designs and working code to get you there.
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHouse Webinar Slides
Monitoring is the key to the successful operation of any software service, but commercial solutions are complex, expensive, and slow. Let us show you how to build monitoring that is simple, cost-effective, and fast using open-source stacks easily accessible to any developer.
We’ll start with the elements of monitoring systems: data ingest, query engine, visualization, and alerting. We’ll then explain and contrast two implementation approaches. The first uses VictoriaMetrics, a fast-growing, high-performance time series database that uses PromQL for queries. The second is based on ClickHouse, a popular real-time analytics database that speaks SQL. Fast, affordable monitoring is within reach. This webinar provides designs and working code to get you there.
Presented by:
Roman Khavronenko, Co-Founder at VictoriaMetrics
Robert Hodges, CEO at Altinity
In this Meetup Yaar Reuveni – Team Leader & Nir Hedvat – Software Engineer from Liveperson Data Platform R&D team, will talk about the journey we made from early days of the data platform in production with high friction and low awareness to issues into a mature, measurable data platform that is visible and trustworthy.
Growing in the Wild. The story by CUBRID Database Developers.CUBRID
The presentation the CUBRID team presented at Russian Internet Technologies Conference in 2012. The presentation covers such questions as *WHY* CUBRID was developed, *WHY* the developers did not fork existing solutions, *WHY* it was necessary to develop a new RDBMS from scratch, and *HOW* CUBRID Database was evolved over the years.
This document describes the features and design of a Bandrich NV Manager software. It lists key features such as CRUD operations on tree-view, grid-view and hex dump views, utility functions, and preference settings. It also discusses important design goals like robustness and user experience. Top technical challenges are outlined, such as handling tree node click events and bidirectional data mapping between views. Questions about the mapping and highlighting implementations are addressed.
Rodrigo Albani de Campos gave a presentation on capacity planning at the São Paulo Perl Workshop. He discussed typical performance metrics like load average and CPU usage, but emphasized that time series data alone is not sufficient for capacity planning. He covered concepts like arrival rate, service time, and queues. De Campos demonstrated using the PDQ queuing model tool to model an Apache web server and explore "what if" scenarios. He provided several references for further reading on performance analysis and capacity planning techniques.
Growing in the wild. The story by cubrid database developers (Esen Sagynov, E...Ontico
This document provides a summary of a presentation by CUBRID Database Developers. It discusses the reasons behind the development of CUBRID, including the disadvantages of existing database solutions like high licensing costs and lack of customization control. It outlines CUBRID's key features such as performance optimizations, scalability features like replication and sharding, and its goal of ease of use. The document summarizes CUBRID's development phases and improvements made to features like indexing, query processing and high availability.
This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
1) The process that led to our decision to use Cassandra
2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
4) Our lessons learned and next steps
How do software engineers understand code changes?Yida Tao
This document summarizes a study on how software engineers understand code changes. The study found that understanding code changes is frequently practiced in major development tasks like reviewing changes, fixing bugs, and developing new features. Determining a change's risk and assessing its completeness were found to be important but difficult information needs. The study identified challenges like determining risk, understanding composite changes, and suggested practical improvements like better code navigation tools and change decomposition support.
21. Software Life Cycle: Contrived Lifecycle
[Diagram: Readiness (y-axis) over Time (x-axis), starting in R&D.
Milestones: 1) Idea!, 2) Production Ready, 2.9) "It’ll be time to wind this service down when ___ happens and ___ comes online.", 3) End of Life]
22. Software Life Cycle: Dose of Reality
[Diagram over Time; labels: R&D, Production, "Production Supported".
Milestones: 1) Idea!, 2) Production Ready, 3) "Oops", 4) End of Life]
23. Software Life Cycle: Do NOT Pass Go, No $200
[Diagram over Time; labels: R&D, Production, "Production Supported".
Milestones: 1) Idea!, N) End of Life
Annotation: Forced to fix code or docs.]
24. Software Life Cycle: Why the fails?
[Diagram over Time; labels: R&D, Production, "Production Supported".
Milestones: 1) Idea!, 2) Production Ready, [3,M) "Oops", N-1) "That’s it, we’ve had enough…", N) End of Life
Annotation: "Dragged feet to produce docs."]
25. Software Life Cycle
[Diagram over Time; labels: R&D, Production, "Production Supported".
Milestones: 1) Idea!, 2) Production Ready, [3,M) "Oops", N-2) "That’s it, we’ve had enough…", N-1) "Just support it until the next version is out", N) End of Life]
26. Software Life Cycle: Detecting Problems Early
[Diagram over Time; labels: R&D, Production, "Production Supported".
Milestones: 1) Idea!, 2) Production Ready, 3) "Oops", 4) End of Life
Annotation: WTB Alerting Here]
43. HASHICORP
CPU Scheduler
[Diagram: work (input) passes through the CPU scheduler onto resources.
Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1
Scheduler: CPU Scheduler
Resources: CPU - Core 1, CPU - Core 2]
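The work-versus-resources picture on this slide can be sketched as a toy simulation. This is a minimal sketch, assuming a simple round-robin policy and threads that are always runnable; `simulate`, the thread names, and the tick model are hypothetical and are not taken from the talk. It shows how utilization (how busy the cores are) and saturation (how much work is left waiting) fall out of the same model.

```python
from collections import deque


def simulate(threads, cores, ticks):
    """Toy round-robin scheduler: each tick, up to `cores` threads get a core."""
    run_queue = deque(threads)      # work (input) waiting for a resource
    busy_core_ticks = 0
    waiting_thread_ticks = 0
    for _ in range(ticks):
        # Hand a core to as many queued threads as there are cores.
        running = [run_queue.popleft() for _ in range(min(cores, len(run_queue)))]
        busy_core_ticks += len(running)
        waiting_thread_ticks += len(run_queue)  # threads left without a core
        run_queue.extend(running)               # assumption: threads stay runnable
    utilization = busy_core_ticks / (cores * ticks)
    saturation = waiting_thread_ticks / ticks   # average run-queue length
    return utilization, saturation


threads = ["web-1", "web-2", "redis-1", "kernel-1"]
utilization, saturation = simulate(threads, cores=2, ticks=100)
print(f"utilization={utilization:.2f} saturation={saturation:.1f}")
```

With four always-runnable threads and two cores, both cores stay busy and two threads wait on every tick, which is the situation where queueing metrics matter more than raw CPU percentage.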
64. Metrics: Gauge
• Counter - Monotonic Number
  • Bytes transmitted
  • Number of 2XX requests
• Gauge - Non-monotonic number
  • Load average
  • Number of services in a critical state
66. Metrics: Histogram
• Counter - Monotonic Number
  • Bytes transmitted
  • Number of 2XX requests
• Gauge - Non-monotonic number
  • Load average
  • Number of services in a critical state
• Histograms - Distribution of Streams of Values
  • Latency of an individual request
  • Disk IO latency
  • Bytes per response
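The three metric types above can be sketched in a few lines. This is an illustrative sketch only: the class names, the bucket bounds, and the sample values are assumptions, not the API of any particular metrics library.

```python
import bisect


class Counter:
    """Monotonic number: it only ever goes up (e.g. number of 2XX requests)."""

    def __init__(self):
        self.value = 0

    def add(self, n=1):
        if n < 0:
            raise ValueError("counters are monotonic; use a gauge instead")
        self.value += n


class Gauge:
    """Non-monotonic number: a point-in-time reading (e.g. load average)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Distribution of a stream of values, kept as per-bucket counts."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One bucket per upper bound, plus a final overflow bucket.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1


requests_2xx = Counter()
requests_2xx.add()
requests_2xx.add()

load_average = Gauge()
load_average.set(1.5)
load_average.set(0.7)   # gauges may go down as well as up

latency_ms = Histogram([10, 50, 100])
for sample in [3, 12, 48, 75, 250]:
    latency_ms.observe(sample)

print(requests_2xx.value, load_average.value, latency_ms.counts)
```

The histogram keeps the shape of the latency stream, which a single averaged gauge would hide: the one 250 ms request lands in the overflow bucket instead of being smoothed away.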
73. Data Sizes to Problem Specificity
[Chart: amount of data necessary to answer the question, plotted against the scope or specificity of the question: "Is there a problem?", "Where is the problem?", "What is the problem?"]
84. $ nomad status atlas-4119-b246fd8fa2
ID = atlas-4119-b246fd8fa2
Name = atlas-4119
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Parameterized = false
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
console     0       0         1        0       0         0
frontend    0       0         2        0       0         0
worker      0       0         1        0       0         0

Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
24e12544  9fedfef9  b7d7483e  console     run      running  01/25/17 23:14:28 UTC
87f46c82  9fedfef9  d6b60eb1  worker      run      running  01/25/17 23:14:28 UTC
d5ea84f2  9fedfef9  70ba3d96  frontend    run      running  01/25/17 23:14:28 UTC
eff8882a  9fedfef9  bbb7b28f  frontend    run      running  01/25/17 23:14:28 UTC
WTF?
86. $ nomad alloc-status 87f46c82
ID = 87f46c82
Eval ID = 9fedfef9
Name = atlas-4119.worker[0]
Node ID = d6b60eb1
Job ID = atlas-4119-b246fd8fa2
Client Status = running
Client Description = <none>
Desired Status = run
Desired Description = <none>
Created At = 01/25/17 23:14:28 UTC
Task "worker" is "running"
Task Resources
CPU         Memory           Disk  IOPS  Addresses
47/256 MHz  218 MiB/2.0 GiB  0 B   0

Recent Events:
Time                   Type                   Description
01/25/17 23:19:36 UTC  Started                Task started by client
01/25/17 23:14:28 UTC  Downloading Artifacts  Client is downloading artifacts
01/25/17 23:14:28 UTC  Received               Task received by client
87. $ nomad alloc-status d5ea84f2
ID = d5ea84f2
Eval ID = 9fedfef9
Name = atlas-4119.frontend[1]
Node ID = 70ba3d96
Job ID = atlas-4119-b246fd8fa2
Client Status = running
Client Description = <none>
Desired Status = run
Desired Description = <none>
Created At = 01/25/17 23:14:28 UTC
Task "frontend" is "running"
Task Resources
CPU           Memory           Disk  IOPS  Addresses
370/1024 MHz  673 MiB/2.0 GiB  0 B   0     atlasfrontend: 10.151.2.227:80

Recent Events:
Time                   Type                   Description
01/25/17 23:19:18 UTC  Started                Task started by client
01/25/17 23:14:28 UTC  Downloading Artifacts  Client is downloading artifacts
01/25/17 23:14:28 UTC  Received               Task received by client
NOT STATIC
92. % cat ../modules/nomad-job/interface.tf
# *-description's taken from https://www.nomadproject.io/docs/agent/telemetry.html
variable "cpu-kernel-description" {
  type    = "string"
  default = "Total CPU resources consumed by the task in the system space"
}

variable "cpu-throttled-periods-description" {
  type    = "string"
  default = "Number of periods when the container hit its throttling limit (`nr_throttled`)"
}

variable "cpu-throttled-time-description" {
  type    = "string"
  default = "Total time that the task was throttled (`throttled_time`)"
}

variable "cpu-total-percentage-description" {
  type    = "string"
  default = "Total CPU resources consumed by the task across all cores"
}
93.
variable "cpu-total-ticks-description" {
  type    = "string"
  default = "CPU ticks consumed by the process in the last collection interval"
}

variable "cpu-user-description" {
  type    = "string"
  default = "An aggregation of all userland CPU usage for this Nomad job."
}

variable "environment" {
  type = "string"
}

variable "human_name" {
  description = "The human-friendly name for this job"
  type        = "string"
}

variable "job_name" {
  type        = "string"
  description = "The Nomad Job Name (or its prefix)"
}
94.
variable "job_tags" {
  type        = "list"
  description = "Tags that should be added to this job's resources"
}

variable "memory-cache-description" {
  type    = "string"
  default = "Amount of memory cached by the task"
}

variable "memory-kernel-usage-description" {
  type    = "string"
  default = "Amount of memory used by the kernel for this task"
}

variable "memory-max-usage-description" {
  type    = "string"
  default = "Maximum amount of memory ever used by the tasks in this job."
}

variable "memory-kernel-max-usage-description" {
  type    = "string"
  default = "Maximum amount of memory ever used by the kernel for this task"
}
95.
variable "memory-rss-description" {
  type    = "string"
  default = "An aggregation of all resident memory for this Nomad job."
}

variable "memory-swap-description" {
  type    = "string"
  default = "Amount of memory swapped by the task"
}

variable "nomad-tags" {
  type    = "list"
  default = [ "source:nomad" ]
}

variable "task_group" {
  type        = "string"
  description = "The name of the task group"
}
96. % cat ../modules/nomad-job/stream-groups.tf
resource "circonus_stream_group" "cpu-kern" {
  name        = "${var.human_name} CPU Kernel"
  description = "${var.cpu-kernel-description}"

  group {
    query = "*`${var.job_name}-${var.task_group}`cpu`system"
    type  = "average"
  }

  tags = [ "${var.nomad-tags}", "${var.job_tags}", "resource:cpu", "use:utilization" ]
  # unit = "%"
}

resource "circonus_stream_group" "memory-rss" {
  name        = "${var.human_name} Memory RSS"
  description = "${var.memory-rss-description}"

  group {
    query = "*`${var.job_name}-${var.task_group}`memory`rss"
    type  = "average"
  }

  tags = [ "${var.nomad-tags}", "${var.job_tags}", "resource:memory", "use:utilization" ]
}
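What a stream group with a wildcard `query` and `type = "average"` does can be approximated in a few lines. This is only a sketch under assumptions: it treats the query as a shell-style glob over backtick-separated metric names, which may not match the real Circonus query semantics, and the metric names and values below are invented for illustration.

```python
from fnmatch import fnmatchcase
from statistics import mean


def stream_group_average(samples, query):
    """Average the latest value of every stream whose name matches `query`."""
    matched = [value for name, value in samples.items() if fnmatchcase(name, query)]
    return mean(matched) if matched else None


# Hypothetical latest readings, keyed by stream name.
samples = {
    "host-a`atlas-4119-frontend`memory`rss": 600.0,
    "host-b`atlas-4119-frontend`memory`rss": 700.0,
    "host-a`atlas-4119-worker`memory`rss": 218.0,
    "host-a`atlas-4119-frontend`cpu`system": 12.0,
}

job_name, task_group = "atlas-4119", "frontend"
query = f"*`{job_name}-{task_group}`memory`rss"
print(stream_group_average(samples, query))
```

Only the two frontend RSS streams match, so the group reports one averaged value per job and task group regardless of which hosts the allocations land on.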
97.
resource "circonus_trigger" "rss-alarm" {
  check       = "${circonus_check.usage.checks[0]}"
  stream_name = "${var.used_metric_name}"

  if {
    value {
      absent = "3600s"
    }

    then {
      notify = [
        "${circonus_contact_group.circonus-owners-slack.id}",
        "${circonus_contact_group.circonus-owners-slack-escalation.id}",
      ]
      severity = 1
    }
  }

  if {
    value {
      # SEV1 if we're over 4GB
      more = "${4 * 1024 * 1024 * 1024}"
    }
    ...
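The two rules visible in this trigger (alert when the metric has been absent for 3600 seconds, and alert when RSS exceeds 4 * 1024 * 1024 * 1024 bytes) can be sketched as a plain function. This is a hedged illustration of the rule logic only: `evaluate_rss` and its parameters are hypothetical names, and the elided part of the resource is not reconstructed here.

```python
FOUR_GIB = 4 * 1024 * 1024 * 1024  # same arithmetic as the `more` threshold


def evaluate_rss(last_value, last_seen, now, absent_after=3600, limit=FOUR_GIB):
    """Return (severity, reason) for the first rule that fires, else None.

    last_value: most recent RSS reading in bytes
    last_seen:  timestamp (seconds) of that reading, or None if never seen
    now:        current timestamp (seconds)
    """
    # Rule 1: no data for `absent_after` seconds is itself a problem.
    if last_seen is None or now - last_seen >= absent_after:
        return (1, "absent")
    # Rule 2: SEV1 if the value is over the limit.
    if last_value > limit:
        return (1, "over limit")
    return None


print(evaluate_rss(last_value=673 * 1024 * 1024, last_seen=1000, now=1060))
print(evaluate_rss(last_value=5 * 1024 * 1024 * 1024, last_seen=1000, now=1060))
print(evaluate_rss(last_value=673 * 1024 * 1024, last_seen=1000, now=1000 + 3600))
```

Checking for absence first matters: a task that has stopped reporting would otherwise never trip a threshold on its value.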
106. Parting Thoughts
•Be an engineer. Put rigid constraints around your app.
•Don't confuse static with rigid.
•Work top to bottom.
•Develop an error budget and prioritize.
•Be consistent in your observability regimen.
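As a small worked example of the error-budget bullet, the arithmetic can be written down directly. The 99.9% target, the request counts, and the function name are assumptions chosen for illustration; the talk does not specify them.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_failures) for one SLO window."""
    allowed = (1.0 - slo_target) * total_requests
    return allowed, allowed - failed_requests


# Hypothetical window: 1,000,000 requests against a 99.9% success target.
allowed, remaining = error_budget(
    slo_target=0.999, total_requests=1_000_000, failed_requests=400
)
print(f"allowed={allowed:.0f} remaining={remaining:.0f}")
```

A positive remainder leaves room for risky changes; a negative one is a signal to prioritize reliability work.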
107. Parting Thoughts
•Expose HTTP Endpoints for stats (both monotonic counters and gauges)
•Trap metrics to a broker frequently (e.g. every 100ms) to create a histogram
•Expose or export JSON Histograms
•Valuable metrics tend to record the behavior of edges, not vertices
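The sampling and export bullets can be sketched together: read a value at a fixed interval, bin each reading into a bucket, and export the result as JSON. This is a minimal sketch under assumptions; the function names, the bucket width, and the JSON layout are made up for illustration and are not a real broker's histogram format. The demo uses `interval_s=0` so it finishes immediately; a real sampler would use something like `0.1` for 100 ms.

```python
import json
import time
from collections import Counter


def sample_into_histogram(read_value, samples, interval_s, bucket_width):
    """Take `samples` readings, `interval_s` apart, into fixed-width buckets."""
    buckets = Counter()
    for _ in range(samples):
        value = read_value()
        lower_bound = int(value // bucket_width) * bucket_width
        buckets[lower_bound] += 1
        time.sleep(interval_s)
    return buckets


def to_json(buckets):
    """Export bucket lower bounds and counts as a JSON object."""
    return json.dumps({str(bound): buckets[bound] for bound in sorted(buckets)})


# Hypothetical latency readings in milliseconds, standing in for a live gauge.
readings = iter([3, 12, 48, 75, 250, 14])
histogram = sample_into_histogram(
    lambda: next(readings), samples=6, interval_s=0, bucket_width=50
)
print(to_json(histogram))
```

Sampling often and keeping the distribution preserves outliers such as the single 250 ms reading, which a once-a-minute average would flatten.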