Slides from SignalFx CTO Phillip Liu's presentation at the AWS Loft in SF after DockerCon: Behind the Scenes with SignalFx.
Phil discussed how SignalFx deploys, runs, and operates a completely Dockerized microservices architecture for a production SaaS application dealing with large volumes of high resolution customer data.
SignalFx Elasticsearch Metrics Monitoring and Alerting (SignalFx)
From our Feb 25, 2016 webcast on operating Elasticsearch at scale, the metrics to monitor, and how to create low-noise meaningful alerts on Elasticsearch performance.
Maxime Petazzoni, Software Engineer at SignalFx, presents how we use Docker and how we monitor containers in production.
SignalFx has been using Docker since November 2013. We have been running Docker in prod ever since we’ve had a “prod”, back when Docker’s README still said “DO NOT RUN IN PRODUCTION”.
How to build continuously processing for 24/7 real-time data streaming platform? (GetInData)
You can read our blog post about it here: https://getindata.com/blog/how-to-build-continuously-processing-for-24-7-real-time-data-streaming-platform/
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen (InfluxData)
In this InfluxDays NYC 2019 talk by Gunnar Aasen (Manager of Partner Engineering at InfluxData), you will get an overview of the AWS Container Monitoring Stack as well as how you can use InfluxDB on AWS for container monitoring. This session will include a demo of the solution.
OSMC 2021 | Handling 250K flows per second with OpenNMS: a case study (NETWAYS)
What does it take to go from no flow support, to handling huge volumes of heterogeneous flow data in a 100% open-source monitoring stack, in a real-world environment? Expect a brief refresher on flows, an overview of the customer environment, and discussion of the engineering challenges faced. A medium dive follows into the movement of flow data from ingest to query and display, the solution architecture as it exists today, and lessons learned and their application to the project roadmap.
OSMC 2015: Monitor Open stack environments from the bottom up and front to ba... (NETWAYS)
Elastic virtualization using the popular OpenStack platform is for real. While sysadmins and DevOps professionals fully embrace these new developments, managing them is still a challenge. Adding layers of abstraction for compute, network and storage resources further increases complexity. Resource sharing and the fully dynamic creation of networks, virtual machines and, more recently, Linux containers inside the framework also make these already complex systems harder to manage.
This presentation will provide insights on how to optimize the monitoring and management of OpenStack "from the bottom up", and from front to back to efficiently manage and troubleshoot OpenStack environments using API monitoring techniques and best of breed OpenSource tools such as Icinga 2.4, OpenStack API, Fuel, BoxSpy, OpenTSDB and others.
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons (aspyker)
Disenchantment is a Netflix show following the medieval misadventures of a hard-drinking princess, her feisty elf, and her personal demon. In this talk, we will follow the story of Netflix’s container management platform, Titus, which powers critical aspects of the Netflix business (video encoding & streaming, big data, recommendations & machine learning, and other workloads). We’ll cover the challenges growing Titus from 10’s to 1000’s of workloads. We’ll talk about our feisty team’s work across container runtimes, scheduling & control plane, and cloud infrastructure integration. We’ll talk about the demons we’ve found on this journey covering operability, security, reliability and performance.
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow... (Nagios)
Marcelo Perazolo, Lead Software Architect, IBM Corporation - In this session, Marcelo will describe how Nagios can be integrated and extended for the monitoring of a typical power-based converged infrastructure, and how it interfaces with existing element managers to provide a single point of integration for passive and active monitoring purposes.
Open-source vs. public cloud in the Big Data landscape. Friends or Foes? (GetInData)
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
A presentation about the strong competition between open-source vendors and public cloud providers in the Big Data landscape.
Flink Forward SF 2017: Scott Kidder - Building a Real-Time Anomaly-Detection ... (Flink Forward)
Mux uses Apache Flink to identify anomalies in the distribution & playback of digital video for major video streaming websites. Scott Kidder will describe the Apache Flink deployment at Mux leveraging Docker, AWS Kinesis, Zookeeper, HDFS, and InfluxDB. Deploying a Flink application in a zero-downtime production environment can be tricky, so unit- & behavioral-testing, application packaging, upgrade, and monitoring strategies will be covered as well.
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform (NETWAYS)
There are many tools and frameworks for monitoring. Usually, when you think of an open source solution, you don’t think of implementing it in a COTS product. Nevertheless, this session will show how you can implement tools such as Prometheus, Grafana and ELK in such an enterprise application platform. Monitoring performance, throughput and error rate is important for staying in control of your transactions. If you use a Service Bus or SOA/BPM suite product, there are a lot of out-of-the-box diagnostics waiting for you. The puzzle is how to get them out in a useful way. Besides the many commercial solutions, open source tools can also help. You can export runtime diagnostics out of the Diagnostics framework, monitor your SOA Composites and track down Service Bus statistics using Prometheus and Grafana. The session will elaborate on how to set up proper monitoring using these tools, including in a proactive way, where automated monitoring is a must for every application environment.
Spark Compute as a Service at Paypal with Prabhu Kasinathan (Databricks)
Apache Spark is a gift to the big data community, adding tons of new features with every release. However, it’s difficult to manage petabyte-scale Hadoop clusters with hundreds of edge nodes and multiple Spark releases while demonstrating operational efficiencies and standardization. To address these challenges, PayPal has developed and deployed a REST-based Spark platform, Spark Compute as a Service (SCaaS), which provides improved application development, execution, logging, security, workload management and tuning.
This session will walk through the top challenges faced by PayPal administrators, developers and operations and describe how Paypal’s SCaaS platform overcomes them by leveraging open source tools and technologies, like Livy, Jupyter, SparkMagic, Zeppelin, SQL Tools, Kafka and Elastic. You’ll also hear about the improvements PayPal has added, which enable it to run greater than 10,000 Spark applications in production effectively.
OSMC 2021 | Monitoring Open Source Hardware (NETWAYS)
As part of a new initiative to enable open source hardware, multiple manufacturers, including IBM, HPE, and others, offer machines with open source hardware, firmware, and software. This provides more opportunities for monitoring, with both agentless and agent-based data collection. This presentation will show some of the open source hardware and how you can control and monitor it using Icinga2.
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac... (Flink Forward)
Within fintech, catching fraudsters is one of the primary opportunities for us to use streaming applications to apply ML models in real time. This talk will review our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink and AWS Lambda. We will share our learnings and experiences with common problems such as custom windowing, breaking down a monolith app into small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Kafka, and Flink (Timothy Spann)
Timothy Spann
Twitter - @PaasDev // Blog: www.datainmotion.dev
Frequent speaker at major conferences and events.
Principal DataFlow Field Engineer for streaming around Apache NiFi, NiFi Registry, MiNiFi, Kafka, Kafka Connect, Kafka Streams, Flink, Flink SQL, SMM, SRM, SR and EFM.
Previously at E&Y, HPE, Pivotal & Hortonworks
Question #1
What is the most difficult part of an Edge Flow?
Gateway Agent
Edge Data Collection
Processing Data
https://github.com/tspannhw/DemoJam2021
https://github.com/tspannhw/CloudDemo2021
OSMC 2021 | Monitoring Open Infrastructure Logs – With Real Life Examples (NETWAYS)
This session is a mix of discussion & live demo topics:
– Intro to OpenInfra/OpenStack (Why you need your own Cloud)
– What Service Logs to gather and how to format and filter them
– Optimizing data as time series indices
– Visualizing large quantities of logs – what’s important?
– Demo Scenario: Response Times – maintaining your SLAs
– Demo Scenario: Tracking Storage growth over time – predicting when to expand
– Demo Scenario: Identifying priority service problems
– Demo of building custom visualizations
Sumit Goel - Monitoring Cloud Applications Using Zabbix | ZabConf2016 (Zabbix)
With the global shift towards the flexibility of the cloud, there are different demands on monitoring the availability and performance of applications provided in the cloud. There are obvious limitations in accessing components of an app that is hosted by a third party and run outside of the internal environment. At the same time, there are opportunities in using the vendor's API and status page. At Salesforce, named one of the most innovative companies in the world by Forbes and one of the biggest cloud service providers, we understand the customer's need to see the availability and performance of a cloud application in real time. In this presentation we're going to list and describe multiple ways of monitoring cloud apps. Some of the methods are: built-in web monitoring using cURL, web browser automation tools like Selenium, external scripts (reading the vendor status dashboard) and API calls to the app.
Virtual Flink Forward 2020: Data driven matchmaking streaming at Hyperconnect... (Flink Forward)
HyperConnect's 1-to-1 video matchmaking system consists of various machine learning techniques to maximize user satisfaction. Our matchmaking system manages a large user context containing actions from a few seconds ago, and reacts in milliseconds to produce meaningful new results in each user session. This is difficult to do in a traditional way, so distributed streaming is essential in these cases. Topics include:
- Why our team chose Apache Flink over the alternatives
- Matchmaking streaming architecture with detailed abstraction levels based on Flink operators
- Pairwise scoring microservice management with Flink
- Stateful matchmaking computation with low latency, fault tolerance, and scalability
- How to manage large-scale events: classifying feature types, collecting with a multi-window stream
- Applications: personalization, multi-armed bandit on stream
stackconf 2020 | Ignite talk: Opensource in Advanced Research Computing, How ... (NETWAYS)
Opensource software is becoming a pillar of our everyday life, leveraged by our cell phones, our transportation systems and the websites we visit. In this quick talk, we will look at how Canada’s Advanced Research Computing (“ARC”) organizations use opensource software to deploy and operate some of the largest supercomputers and cloud deployments on Earth. We will briefly introduce the systems and dig deeper into the opensource technologies that together make the magic happen!
Fully Automated Kubernetes Deployment and Management (Peng Jiang, Rancher Labs) - Kubernetes is rapidly gaining popularity as a powerful container orchestration and scheduling platform. But deploying and managing Kubernetes clusters is still a challenge for many organizations. How do you ensure that Kubernetes clusters in different clouds and data centers can communicate with each other? How do you automate the deployment of multiple Kubernetes clusters? How do you incorporate the new Kubernetes Federation into multi-cloud and multi-datacenter deployments? How do you manage the health of the Kubernetes cluster itself?
In this talk, Peng will share his experience on how to automate and simplify Kubernetes deployments, and discuss how some of the latest community projects (such as kubeadm and self-hosting Kubernetes) will help address the problems in the future.
Mike Weber's presentation on Nagios rapid deployment options. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference.
Sysdig is infinitely extensible through Chisels, and now you’re going to learn how to build one. Using a real-world example, we’re going to show you how to leverage sysdig’s luascript engine to build powerful new functionality customized to your needs.
SignalFx engineer Rajiv Kurian's presentation on why we wrote our own Kafka consumer, the performance goals, and the performance gains achieved.
Download the slides to see animations showing hardware details. These slides were converted from Keynote to PowerPoint, so there may be some oddness with slide transitions!
SignalFx: Making Cassandra Perform as a Time Series Database (DataStax Academy)
SignalFx ingests, processes, runs analytics against, and ultimately stores massive numbers of time series streaming in parallel into our service, which provides an analytics-based monitoring platform for modern applications.
We chose to build our time series database (TSDB) on Cassandra for its read and write performance at high load. This presentation will go over the evolution of our optimizations to squeeze the most performance out of the TSDB to date, and some steps we'll be taking in the future.
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15 (SignalFx)
SignalFx engineer Paul Ingram presented these slides at Cassandra Summit 2015.
SignalFx ingests, processes, runs analytics against, and ultimately stores massive numbers of time series streaming in parallel into our service, which provides an analytics-based monitoring platform for modern applications.
We chose to build our time series database (TSDB) on Cassandra for its read and write performance at high load. This presentation will go over the evolution of our optimizations to squeeze the most performance out of the TSDB to date, and some steps we'll be taking in the future.
Read more: http://blog.signalfx.com/making-cassandra-perform-as-a-tsdb
Scaling ingest pipelines with high performance computing principles - Rajiv K... (SignalFx)
By Rajiv Kurian, software engineer at SignalFx.
At SignalFx, we deal with high-volume high-resolution data from our users. This requires a high performance ingest pipeline. Over time we’ve found that we needed to adapt architectural principles from specialized fields such as HPC to get beyond performance plateaus encountered with more generic approaches. Some key examples include:
* Write very simple single threaded code, instead of complex algorithms
* Parallelize by running multiple copies of simple single threaded code, instead of using concurrent algorithms
* Separate the data plane from the control plane, instead of slowing data for control
* Write compact, array-based data structures with minimal indirection, instead of pointer-based data structures and uncontrolled allocation
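Two of the principles above (compact array-based structures, and parallelism by running multiple copies of simple single-threaded code) can be illustrated with a minimal Python sketch. This is not SignalFx's implementation: the names `DatapointBuffer`, `shard_for` and `route` are hypothetical, and the choice of CRC32 for sharding is an assumption made only for the example. Each shard would be owned by exactly one single-threaded worker, so no locks or concurrent data structures are needed.

```python
import zlib
from array import array


class DatapointBuffer:
    """Hypothetical columnar buffer: two flat typed arrays instead of
    one heap-allocated object per datapoint."""

    def __init__(self):
        self.timestamps = array("q")  # signed 64-bit integers (epoch millis)
        self.values = array("d")      # 64-bit floats

    def append(self, ts_ms, value):
        self.timestamps.append(ts_ms)
        self.values.append(value)

    def __len__(self):
        return len(self.timestamps)

    def mean(self):
        return sum(self.values) / len(self.values) if len(self.values) else 0.0


def shard_for(series_id, num_shards):
    """Map a series to a shard. CRC32 is used (an assumption for this sketch)
    because, unlike the built-in hash(), it is stable across processes."""
    return zlib.crc32(series_id.encode("utf-8")) % num_shards


def route(datapoints, num_shards):
    """Split (series_id, ts_ms, value) tuples into per-shard dictionaries.
    Each shard is meant to be processed by its own single-threaded worker."""
    shards = [{} for _ in range(num_shards)]
    for series_id, ts_ms, value in datapoints:
        shard = shards[shard_for(series_id, num_shards)]
        buf = shard.get(series_id)
        if buf is None:
            buf = shard[series_id] = DatapointBuffer()
        buf.append(ts_ms, value)
    return shards
```

Because every datapoint for a given series always lands in the same shard, each worker can keep its state private and the code inside a shard stays simple and sequential.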
Storing time series data with Apache Cassandra (Patrick McFadin)
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice; now you can learn how to do it. We'll look at possible data models and the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work, and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Slides of Maxime Petazzoni's talk at the Palo Alto Docker Meetup on September 1st, 2015. Discusses how we use Docker to power our software development lifecycle and run our production environments, as well as how to monitor Dockerized deployments and applications, in particular with SignalFx.
Microservices and Devs in Charge: Why Monitoring is an Analytics Problem (SignalFx)
Presented at GlueCon 2015.
This presentation discusses SignalFx CTO and co-founder Phillip Liu's experience operating infrastructure and apps at massive scale and what drove the realization that monitoring is now fundamentally an analytics problem. Following on the heels of Adrian Cockcroft's keynote that morning, Monitoring Microservices and Containers, this presentation went over real-world examples of how modern monitoring for microservices works.
Operationalizing Docker at Scale: Lessons from Running Microservices in Produ... (SignalFx)
Zenefits principal engineer Venkat Thiruvengadam and SignalFx engineer Maxime Petazzoni discuss operationalizing Docker at scale. Learn about the transition to a well-conceived microservices approach, the tools chosen to support these services, and the lessons learned from monitoring containers in production in a high-performance environment.
Apache Kafka lies at the heart of the largest data pipelines, handling trillions of messages and petabytes of data every day. Learn the right approach for getting the most out of Kafka from the experts at LinkedIn and Confluent. Todd Palino and Gwen Shapira demonstrate how to monitor, optimize, and troubleshoot performance of your data pipelines—from producer to consumer, development to production—as they explore some of the common problems that Kafka developers and administrators encounter when they take Apache Kafka from a proof of concept to production usage. Too often, systems are overprovisioned and underutilized and still have trouble meeting reasonable performance agreements.
Topics include:
- What latencies and throughputs you should expect from Kafka
- How to select hardware and size components
- What you should be monitoring
- Design patterns and antipatterns for client applications
- How to go about diagnosing performance bottlenecks
- Which configurations to examine and which ones to avoid
This presentation was given at the ApacheCon 2015 Kafka Meetup.
These slides go into some detail on how to tune and scale Kafka clusters and the components involved. The slides themselves are bullet points, and all the detail is in the slide notes, so please download the original presentation and review those.
Go debugging and troubleshooting tips - from real life lessons at SignalFxSignalFx
Exploring tips and advice on writing production Go systems that are easy to debug and troubleshoot. Jack Lindamood from SignalFx presents patterns that facilitate this process.
Jack addresses tools built into Go you can take advantage of, build process techniques they've learned over time, and open source tools and libraries you can use that help troubleshoot your production code when things go wrong.
Read more here: http://blog.signalfx.com/a-pattern-for-optimizing-go
Kafka at Scale: Multi-Tier ArchitecturesTodd Palino
This is a talk given at ApacheCon 2015
If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn. It is used for moving every type of data around between systems, and it touches virtually every server, every day. This can only be accomplished with multiple Kafka clusters, installed at several sites, and they must all work together to assure no message loss, and almost no message duplication. In this presentation, we will discuss the architectural choices behind how the clusters are deployed, and the tools and processes that have been developed to manage them. Todd Palino will also discuss some of the challenges of running Kafka at this scale, and how they are being addressed both operationally and in the Kafka development community.
Note - each slide has a significant amount of slide notes that go into detail. Please make sure to check out the downloaded file to get the full content!
Producer Performance Tuning for Apache KafkaJiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experience achieving the optimal combination of latency, throughput and durability for different scenarios.
Cassandra Compression and Performance EvaluationSchubert Zhang
Even though we have abandoned Cassandra in all our products, we would like to share our work here.
Why did we abandon Cassandra in our products? Because:
(1) Cassandra's implementation is badly flawed, especially in its local storage engine layer, i.e. SSTable and indexing.
(2) It is a big mistake to combine Bigtable and Dynamo. Dynamo's hash ring architecture is an obsolete technology for scale, and its consistency and replication policy is also unusable in big data storage.
How to go from waterfall app dev to secure agile development in 2 weeks Ulf Mattsson
Waterfall is based on the concept of sequential software development—from conception to ongoing maintenance—where each of the many steps flowed logically into the next.
Join this webinar presentation to learn:
- Why DevOps cannot effectively work in waterfall
- How to use DevOps tools to optimize processes in either development or operations through automation
We will also discuss what is needed to support full DevOps
Intro to GitOps with Weave GitOps, Flagger and LinkerdWeaveworks
You may not think of "GitOps" and "service mesh" together – but maybe you should! These two wildly different technologies are each enormously capable independently, and combined they deliver far more than the sum of their parts: a single Git commit can control workflows customized for your exact situation by taking advantage of the service mesh's ability to measure and manipulate traffic anywhere in your application's call graph, and you can rest easy knowing that Git is preserving the complete configuration for your entire application every step of the way.
See how these technologies can work together to tackle complex problems in cloud-native applications.
What you’ll get out of this:
* Understand what GitOps and service meshes can - and can't - do for you.
* Understand basic operations with GitOps and Linkerd.
* Understand the basics of continuous deployment with Weave GitOps and Linkerd.
apidays LIVE Paris - Serverless security: how to protect what you don't see? ...apidays
apidays LIVE Paris - Responding to the New Normal with APIs for Business, People and Society
December 8, 9 & 10, 2020
Serverless security: how to protect what you don't see?
Jean Baptiste Aviat, Co-founder and CTO at Sqreen.io
A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
All organizations want to go faster and decrease friction in delivering software. The problem is that InfoSec has historically slowed this down or worse. But, with the rise of CD pipelines and new devsecops tooling, there is an opportunity to reverse this trend and move Security from being a blocker to being an enabler.
This talk will discuss hallmarks of doing security in a software delivery pipeline with an emphasis on being pragmatic. At each phase of the delivery pipeline, you will be armed with philosophy, questions, and tools that will get security up-to-speed with your software delivery cadence.
From DeliveryConf 2020
The objective of this project is to set up servers for web, FTP, and VoIP video call services and manage them centrally from a host, over either a private or a remote connection. We will also monitor the services we install from this central PC. If a problem such as loss of connectivity is found, the monitoring agent will notify the network administrator with the error message.
The differing ways to monitor and instrumentJonah Kowall
FullStack London July 15th, 2016
Monitoring is complicated, and in most organizations it involves far too many tools owned by many teams. Each of these tools looks at a single component myopically, collecting metrics and logs from the devices and software that emit them. Increasingly, modern companies are creating their own instrumentation, but there is a large base of generic software instrumentation. Fixing monitoring issues requires people, process, and technology. In this talk we will cover many common issues seen in the real world, for example decisions on what should be monitored or collected from a technology and a business perspective. This requires process and coordination.
We will investigate what instrumentation is most scalable and effective across languages. This includes the commonly used APIs and possibilities for capturing data from common languages like Java, .NET and PHP, but we’ll also go into methods which work with Python, Node.js, and golang. We will cover browser and mobile instrumentation techniques: how these are done, which APIs are being used, and what open source tools and frameworks can be leveraged. Most importantly, we will cover how to coordinate and communicate requirements across your organization.
Attendees of this session will walk away with a clear understanding of:
What is instrumentation, and what do I instrument, collect, and store?
How much overhead instrumentation adds, and how it can be accomplished on common software stacks.
How to work with application owners to collect business data.
How correlation works in custom open source or packaged monitoring tools.
Serverless security - how to protect what you don't see?Sqreen
Protecting serverless is a new topic. This presentation aims at showing what new security challenges it brings, and how CISO and security teams should approach it.
The serverless space evolves fast and there is no convergence on best practices yet. The switch to a serverless architecture involves several changes, for instance developers doing much more ops with serverless, deploying 20 times more services than previously...
Learn how GitHub analytics can help you gauge the health of your DevOps release cycle, gain visibility into team productivity, and secure your intellectual property.
Making the Shift from DevOps to Practical DevSecOps | Sumo Logic WebinarSumo Logic
In this webinar, Sumo Logic VP of Security and Compliance George Gerchow dives into how to make the shift to DevSecOps, discussing how to:
- Incorporate fundamental and high impact security best practices into your current DevOps operations
- Gain visibility into your compliance posture
- Identify potential risks and threats in your environments
How to Manage the Risk of your Polyglot EnvironmentsDevOps.com
In this webinar, we’ll explore how to navigate the tension between speed and security when it comes to open source languages.
Enterprises are challenged by conflicting interests:
Engineering teams want more time to focus on code quality, but product managers want to ship faster.
Developers want the best tool for the job, but companies resist adding more technology stacks to their growing tech debt.
Retrofitting for security and vulnerabilities after the fact becomes a big blocker for Development and Engineering teams. Enterprises are challenged with resolving new threats and vulnerabilities at the pace at which they crop up. And yet, speed wins over security because faster time-to-market takes a greater priority over fixing vulnerabilities.
Our expert panel will cover how to resolve the tension between speed and security by practices which:
Minimize DevOps overhead from retrofitting programming languages with new versions, dependencies, security patches, etc.
Enable Continuous Builds to keep up with your continuous deployments
Use Build Validation to vet your continuous builds against smoke tests
To be successful, the security process should be well integrated with the SDLC. While many companies have already moved from Waterfall to Agile methodologies, security remains behind more often than not. In our presentation we demonstrate how security can move to agile by utilizing open source tools, customizing them to meet our needs, and implementing continuous security testing using dynamic scanners as well as manual testing.
It’s also very important to ensure that false positives are not fed to the developers' bug tracking systems and that a severity is assigned correctly to each finding. To make this happen, we import all our findings into a security dashboard and review them before exporting them to a bug tracking system.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to make it work on our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
3. Agenda
• Background
• Overview of Key SignalFx Services
• SignalFx infrastructure and operations
• Analytics approach to monitoring
• Code push side effects, an example
• Summary
10. Microservice Complexity
More than 15 internal services.
Services span hundreds of instances across multiple AZs.
Have dependencies on tens of external services.
14. Shared Responsibility
• Engineering is organized around services they provide
• No dedicated operations team
• Each service team is responsible for building and operating their services
• Infrastructure team provides IaaS - DNS, LB, Mail, Server, and Network configuration and provisioning
• Ingest team provides Ingest API, Quantization, and TSDB services
15. Continuous Build and Deployment
• Services are built and tested on each commit
• Each service deploys at its own cadence
• Nearly all deployments are non-disruptive
• Push to lab, test; push prod canary, test; rest of prod
• Services are engineered to be resilient to partial cluster availability
• Each service is engineered to support +1/-1 upgrades
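The lab, canary, rest-of-prod sequence above can be sketched as a small gate function. This is only an illustration under stated assumptions, not SignalFx's actual tooling: the stage names, the `deploy` and `healthy` callables, and `staged_rollout` itself are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stage order taken from the slide: lab, then a prod canary, then the rest of prod.
STAGES: Sequence[str] = ("lab", "prod-canary", "prod-rest")


def staged_rollout(
    version: str,
    deploy: Callable[[str, str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[str] = STAGES,
) -> List[str]:
    """Deploy `version` stage by stage, stopping at the first unhealthy stage.

    Returns the stages that were deployed and passed their health check.
    """
    passed: List[str] = []
    for stage in stages:
        deploy(stage, version)
        if not healthy(stage):
            # Stop here so later stages keep running the previous version.
            break
        passed.append(stage)
    return passed
```

In this sketch a failed canary check leaves the rest of production untouched, which is the point of testing between stages.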
16. On-call Rotation
• All dev on weekly on-call rotation (couple of times a year)
• On-call works on operational tools
• On-call rotates from lab -> production
• On-call is the incident manager
• Owns driving both blackout and brownout incidents to resolution
17. Operations Tools
sfhost - CLI for VM configuration and provisioning
sfc - console to access management data for all services
signalscope - deep transactions tracing
maestro - Docker orchestrator
jenkins - continuous build and deployment
18. Monitoring
• We use SignalFx to monitor SignalFx
• Engineers instrument their code as part of dev process
• Each service provides at least one dashboard
• CollectD for OS and Docker metrics on all VMs
• Yammer metrics for all Java app servers
• Custom logger to count exception types
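The last bullet, a logger that counts exception types, could look roughly like the following. The slides do not show the real implementation (which runs in Java app servers), so this Python sketch is an assumption throughout: `ExceptionCountingHandler` and `demo` are hypothetical names, and the counts stay in memory rather than being reported to a metrics backend.

```python
import logging
from collections import Counter


class ExceptionCountingHandler(logging.Handler):
    """Logging handler that counts logged exceptions by exception class name."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        # exc_info is populated when the caller used logger.exception(...) or exc_info=True.
        if record.exc_info and record.exc_info[0] is not None:
            self.counts[record.exc_info[0].__name__] += 1


def demo() -> Counter:
    """Log three parse failures and return the per-type exception counts."""
    logger = logging.getLogger("exception-count-demo")
    logger.propagate = False  # keep the demo quiet; only our handler sees the records
    handler = ExceptionCountingHandler()
    logger.addHandler(handler)
    try:
        for bad in ("x", "y", None):
            try:
                int(bad)
            except (ValueError, TypeError):
                logger.exception("failed to parse %r", bad)
    finally:
        logger.removeHandler(handler)
    return handler.counts
```

Each counter can then be emitted as its own time series, which is what makes the per-type exception charts in the later slides possible.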
21. Monitoring Challenges
• High iteration rate leads to shortened test cycles
• Integration test combinations are intractable
• Catch problems during rolling deployments
• Identify upstream/downstream side effects
• e.g. backpressure
• Identify brownouts before the customer
• etc.
27. Code Push Side Effects
Push canary instance and Metadata API dashboard shows healthy tier.
28. Code Push Side Effects
However, upstream UI dashboard showed unusual # of timeouts.
29. Code Push Side Effects
In search of root cause.
Always safe to start by looking at exception counts.
Can’t derive much from all the noise.
30. Code Push Side Effects
Sum the # of exceptions to create a single signal.
31. Code Push Side Effects
Compare sum with time-shifted sum from a day ago.
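The two analytics steps on slides 30 and 31, summing the per-type exception counts into one signal and comparing it with the same signal shifted by a day, can be sketched on plain lists. This is not SignalFx's analytics language; the function names are hypothetical, and the sketch assumes evenly spaced points with all series aligned on the same timestamps.

```python
from typing import Dict, List, Optional, Sequence


def sum_signals(series: Dict[str, Sequence[float]]) -> List[float]:
    """Collapse many per-type series (same length, same timestamps) into one summed signal."""
    return [sum(points) for points in zip(*series.values())]


def timeshift(signal: Sequence[float], offset: int) -> List[Optional[float]]:
    """Shift a signal later by `offset` points; positions with no history are None."""
    shifted: List[Optional[float]] = [None] * offset + list(signal)
    return shifted[: len(signal)]


def ratio_to_shifted(signal: Sequence[float], offset: int) -> List[Optional[float]]:
    """Ratio of each point to the point `offset` steps earlier (None when undefined)."""
    past = timeshift(signal, offset)
    return [
        None if p is None or p == 0 else cur / p
        for cur, p in zip(signal, past)
    ]
```

With one point per minute, an offset of 1440 would correspond to "a day ago". A ratio that jumps well above 1 right after a push is the kind of pattern the slide points to.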
32. Code Push Side Effects
Look at an outlier host - an Analytics service host.
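One way to pick out such an outlier host from per-host exception counts is a robust score based on the median. The slides do not say which method was used, so this is a swapped-in technique: a modified z-score built on the median absolute deviation (MAD), with the commonly used 0.6745 scaling constant and 3.5 cutoff. `outlier_hosts` is a hypothetical name.

```python
from statistics import median
from typing import Dict, List


def outlier_hosts(counts: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return hosts whose value is far from the fleet median.

    Uses the modified z-score based on the median absolute deviation (MAD).
    """
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the hosts sit exactly on the median: anything that differs stands out.
        return sorted(h for h, v in counts.items() if v != med)
    return sorted(
        h for h, v in counts.items() if 0.6745 * abs(v - med) / mad > threshold
    )
```

A median-based score is used here instead of mean and standard deviation because a single very noisy host would otherwise inflate the spread and hide itself.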
33. Code Push Side Effects
java.io.InvalidObjectException: enum constant MURMUR128_MITZ_64 does not exist in class com.google.common.hash.BloomFilterStrategies
at java.io.ObjectInputStream.readEnum(ObjectInputStream.java:1743) ~[na:1.7.0_79]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347) ~[na:1.7.0_79]
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) ~[na:1.7.0_79]
…
Looking at the Analytics service's logs revealed the source of the problem.
34. Code Push Side Effects
• Analytics across multiple microservices reduced the time to identify the problem. From push to resolution took ~15 min
• Service instrumentation helped narrow down the root cause
• The discovery allowed us to create a detector using analytics to notify us of similar problems in the future
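A detector of the kind mentioned in the last bullet can be sketched as a rule over the summed exception signal and its day-ago baseline. This is not the SignalFx detector API: `detect`, the factor of 2, and the three-point duration are all hypothetical choices made for the illustration.

```python
from typing import List, Optional, Sequence


def detect(
    current: Sequence[float],
    baseline: Sequence[Optional[float]],
    factor: float = 2.0,
    min_points: int = 3,
) -> List[int]:
    """Return the indices at which the detector fires.

    Fires once when `current` has exceeded `factor * baseline` for `min_points`
    consecutive points; a missing baseline point resets the streak.
    """
    fired: List[int] = []
    streak = 0
    for i, (cur, base) in enumerate(zip(current, baseline)):
        if base is not None and cur > factor * base:
            streak += 1
        else:
            streak = 0
        if streak == min_points:
            fired.append(i)
    return fired
```

Requiring several consecutive points trades a little detection delay for fewer alerts on one-off spikes.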
35. Other Examples
• A customer started dropping data because they reverted to an unsupported API
• Compare TSDB write throughput of two different write strategies
• Create per-service capacity reports
• Identify memory usage patterns across our Analytics service
• Create a detector for every previously uncaught error condition - postmortem output
37. Summary
• Microservice architecture is inherently complex
• Measure all the things
• Use data analytics techniques to
• Identify problems
• Chase down root cause
• Use intelligent detectors to catch recurrence