How to achieve advanced scheduling of resources to heterogeneous tasks using the now open-source NetflixOSS Fenzo scheduling library for Apache Mesos frameworks, including autoscaling of the execution cluster.
Realtime Statistics based on Apache Storm and RocketMQ - Xin Wang
This document discusses using Apache Storm and RocketMQ for real-time statistics. It begins with an overview of the streaming ecosystem and components. It then describes challenges with stateful statistics and introduces Alien, an open-source middleware for handling stateful event counting. The document concludes with best practices for Storm performance and data hot points.
Pushing Python: Building a High Throughput, Low Latency System - Kevin Ballard
This document discusses Taba, a distributed event aggregation service. It notes that Taba can process over 10 million events per second across over 50,000 metrics and 1,000 clients using 100 processors. It then discusses four key lessons learned in building Taba: 1) getting the data model right; 2) the difficulty of centralized state; 3) how asynchronous iterators and greenlets can improve performance; and 4) how memory fragmentation is a problem in CPython and some techniques to address it like hybrid memory management and avoiding ratcheting.
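The aggregation model behind a Taba-style service can be sketched in a few lines of Python (an illustrative toy, not Taba's actual API or data model): clients record raw events, and the service keeps pre-aggregated per-metric state so readers never touch the raw event stream.

```python
from collections import defaultdict

class ToyAggregator:
    """Toy sketch of a Taba-style event aggregator. Names and structure
    are assumptions for illustration, not Taba's real interface."""

    def __init__(self):
        # metric name -> running count and sum; clients post raw events,
        # readers get already-aggregated state
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def record(self, metric, value=1.0):
        self.counts[metric] += 1
        self.sums[metric] += value

    def report(self, metric):
        n = self.counts[metric]
        return {"count": n, "total": self.sums[metric],
                "mean": self.sums[metric] / n if n else 0.0}

agg = ToyAggregator()
for latency in (12.0, 8.0, 10.0):
    agg.record("request_latency_ms", latency)
```

Keeping only aggregates per metric is what makes tens of thousands of metrics from a thousand clients tractable on a modest number of processors.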
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud - Sharma Podila
How can you reliably schedule tasks in an unreliable, autoscaling Cloud environment? In this presentation, we'll talk about the design of our scheduler built on top of Apache Mesos that serves as the core of our stream-processing platform, Mantis, designed for real-time insights. We'll focus on the following aspects of the scheduler:
- Coarse-grained vs. fine-grained resource scheduling
- Fault tolerance via a combination of task reconciliation and life cycle event processing
- Scheduling optimizations for bin packing, for stream locality to reduce network bandwidth usage, for task placement to achieve auto scaling of the cluster size, etc.
This talk will also include detailed information about approaches to scheduling in a distributed, auto-scaling environment.
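As a rough illustration of the bin-packing optimization mentioned above (the names and scoring here are assumptions, not the actual Mantis scheduler), a fitness function can prefer the host a task would fill most completely, so idle hosts stay empty and can be scaled down:

```python
def bin_packing_fitness(used_cpus, total_cpus, task_cpus):
    """Score a host for a task: 1.0 means the task fills the host
    exactly, 0.0 means it does not fit. Higher scores pack tighter,
    leaving idle hosts free for cluster scale-down."""
    if task_cpus > total_cpus - used_cpus:
        return 0.0
    return (used_cpus + task_cpus) / total_cpus

def pick_host(hosts, task_cpus):
    # hosts: {name: (used_cpus, total_cpus)}; choose the best-scoring host
    scored = {name: bin_packing_fitness(u, t, task_cpus)
              for name, (u, t) in hosts.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0.0 else None

hosts = {"a": (6.0, 8.0), "b": (1.0, 8.0)}
```

With a 2-CPU task, host "a" scores 1.0 (it becomes full) and wins over the mostly empty host "b"; a task that fits nowhere returns None, which is the signal to scale the cluster up.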
Resource Scheduling using Apache Mesos in Cloud Native Environments - Sharma Podila
This document discusses using Apache Mesos for scheduling heterogeneous resources in a cloud environment. It describes Mantis, a Mesos framework for reactive stream processing. Mantis provides lightweight jobs, dynamic scaling, and custom SLAs. Fenzo is introduced as Mantis' task scheduler, which uses plugins for constraints, fitness functions, and autoscaling. Mantis allows for stream locality, backpressure handling, and job autoscaling. The document argues that Mesos provides benefits over instance-level scheduling through finer-grained resource allocation and faster task startup times.
This document provides an overview of resource aware scheduling in Apache Storm. It discusses the challenges of scheduling Storm topologies at Yahoo scale, including increasing heterogeneous clusters, low cluster utilization, and unbalanced resource usage. It then introduces the Resource Aware Scheduler (RAS) built for Storm, which allows fine-grained resource control and isolation for topologies through APIs and cgroups. Key features of RAS include pluggable scheduling strategies, per user resource guarantees, and topology priorities. Experimental results from Yahoo Storm clusters show significant improvements to throughput and resource utilization with RAS. Future work may include improved scheduling strategies and real-time resource monitoring.
Monitoring NGINX (plus): key metrics and how-to - Datadog
NGINX just works and that's why we use it. That does not mean that it should be left unmonitored. As a web server, it plays a central role in a modern infrastructure. As a gatekeeper, it sees every interaction with the application. If you monitor it properly it can explain a lot about what is happening in the rest of your infrastructure.
In this talk you will learn more about NGINX (plus) metrics, what they mean and how to use them. You will also learn different methods (status, statsd, logs) to monitor NGINX with their pros and cons, illustrated with real data coming from real servers.
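The status method can be made concrete: the open-source stub_status module exposes a small fixed-format text report that a collector parses into counters and gauges. A minimal Python parser (a sketch; a real collector would also handle fetch and parse errors):

```python
import re

def parse_stub_status(text):
    """Parse the output of NGINX's stub_status module into a dict of
    integer metrics. Sketch only; no error handling."""
    m = re.search(
        r"Active connections:\s*(\d+).*?"
        r"(\d+)\s+(\d+)\s+(\d+).*?"
        r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)",
        text, re.S)
    keys = ("active", "accepts", "handled", "requests",
            "reading", "writing", "waiting")
    return dict(zip(keys, map(int, m.groups())))

# sample output in the documented stub_status format
sample = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""
metrics = parse_stub_status(sample)
```

accepts, handled, and requests are cumulative counters (so you graph their rate of change), while active, reading, writing, and waiting are point-in-time gauges.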
This document provides an introduction to Storm, an open source distributed real-time processing system. It discusses the types of data processing in Storm as either batch or real-time. The key components of a Storm cluster are the Nimbus master node, supervisor worker nodes, and ZooKeeper coordination service. A Storm topology defines the computation as a directed acyclic graph of spouts emitting streams and bolts processing the streams.
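The topology model described above can be sketched as a tiny Python data structure (hypothetical names; Storm's real API is Java's TopologyBuilder): spouts emit tuples, and bolts transform the streams they subscribe to, forming a directed acyclic graph.

```python
class Topology:
    """Minimal model of a Storm-style topology: spouts emit tuples,
    bolts subscribe to an upstream component. Illustrative only."""

    def __init__(self):
        self.spouts, self.bolts, self.edges = set(), {}, []

    def set_spout(self, name):
        self.spouts.add(name)

    def set_bolt(self, name, fn, source):
        # bolts must be added in topological (upstream-first) order
        self.bolts[name] = fn
        self.edges.append((source, name))

    def run(self, inputs):
        # push each spout tuple through the downstream bolt chain
        streams = {s: list(inputs.get(s, [])) for s in self.spouts}
        for src, dst in self.edges:
            streams[dst] = [self.bolts[dst](t) for t in streams[src]]
        return streams

topo = Topology()
topo.set_spout("sentences")
topo.set_bolt("upper", str.upper, source="sentences")
out = topo.run({"sentences": ["storm is fast"]})
```

In real Storm the graph runs continuously and in parallel across supervisor nodes, with Nimbus assigning work and ZooKeeper coordinating; this sketch only shows the dataflow shape.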
This document discusses using Hyperscan to match regular expressions in Rspamd. It describes how Hyperscan can match many regular expressions against the same data faster than joining expressions with pipes in PCRE. However, Hyperscan has slower compile times and does not support some PCRE features. The document proposes a "lazy hyperscan compile" approach where Rspamd initially uses PCRE alone and later switches to the pre-compiled Hyperscan database to get the speed benefits without the initial compile overhead. It shows how this approach improves Rspamd performance for both small and large messages.
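The lazy-compile idea can be sketched with Python's re module standing in for both engines (an analogy, not Rspamd's code): match pattern-by-pattern at first, then swap in a single precompiled alternation once the expensive "compilation" finishes.

```python
import re

class LazyMatcher:
    """Sketch of the lazy-compile pattern: serve immediately on the
    slow path (one regex at a time, the PCRE role), then switch to a
    combined precompiled matcher (the Hyperscan role) when ready."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]
        self.combined = None  # the "fast database", built later

    def finish_compile(self):
        # Rspamd does this asynchronously; here we trigger it directly
        alternation = "|".join("(?:%s)" % p.pattern for p in self.patterns)
        self.combined = re.compile(alternation)

    def matches(self, text):
        if self.combined is not None:
            return self.combined.search(text) is not None
        return any(p.search(text) for p in self.patterns)

m = LazyMatcher([r"viagra", r"free\s+money"])
slow = m.matches("claim your free  money now")  # slow path
m.finish_compile()
fast = m.matches("claim your free  money now")  # fast path
```

Both paths return the same answers; only the cost differs, which is exactly why the switch can happen transparently mid-flight.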
Rspamd is an open-source spam filtering system written in C that uses an event-driven architecture. It processes emails using multiple rules to evaluate scores, supports plugins in Lua, and has a self-contained web interface. Performance is a key design goal through techniques like optimized regular expressions, event-driven network processing, and optimized memory usage. It uses techniques like Bayesian filtering with n-grams, fuzzy hashes, and DNS-based filtering while prioritizing security through input validation, encryption for network traffic, and limiting DNS queries.
A key feature when monitoring and debugging any Cloud infrastructure is to provide the ability to trace, track, and collate all the individual, discrete steps that compose an event. A typical resource action in OpenStack is often a combination of smaller tasks -- which given the distributed nature of OpenStack -- can fail at unpredictable points in the workflow. By collecting the appropriate events, operators can view all events within Ceilometer, filter on a failed action and trace back the history of related events to spot anomalies or errors. In this talk, we provide an overview of the recent enhancements made in Ceilometer to support the collection of event notifications from OpenStack services. We will describe: how events are processed, transformed and stored in Ceilometer; how you can derive metrics from events; and how it’s possible to track the events of a resource and analyse where errors occur.
Storm is an open source distributed real-time computation system. It provides guarantees of processing data reliably in real-time. Storm allows for building real-time streaming data pipelines that process unbounded streams of data reliably. Key features include being distributed, fault-tolerant, guaranteeing message processing, and providing a high level abstraction over message passing.
A monitoring system is arguably the most crucial system to have in place when administering and tweaking the performance of any database system. DBAs also find themselves with a variety of monitoring systems and plugins to use; ranging from small scripts in cron to complex data collection systems. In this talk, I’ll discuss how Box made a shift from the Cacti monitoring system and other various shell scripts to OpenTSDB and the changes made to our servers and daily interaction with monitoring to increase our agility in identifying and addressing changes in database behavior.
Storm is a distributed and fault-tolerant realtime computation system. It was created at BackType/Twitter to analyze tweets, links, and users on Twitter in realtime. Storm provides scalability, reliability, and ease of programming. It uses components like Zookeeper, ØMQ, and Thrift. A Storm topology defines the flow of data between spouts that read data and bolts that process data. Storm guarantees processing of all data through its reliability APIs and guarantees no data loss even during failures.
PHP Backends for Real-Time User Interaction using Apache Storm - DECK36
Engaging users in real-time is the topic of our times. Whether it’s a game, a shop, or a content-network, the aim remains the same: providing a personalized experience. In this workshop we will look under the hood of Apache Storm and lay a firm foundation on how to use it with PHP. By that, you can leverage your existing codebase and PHP expertise for an entirely new world: real-time analytics and business logic working on message streams. During the course of the workshop, we will introduce Apache Storm and take a look at all of its components. We will then skyrocket the applicability of Storm by showing you how to implement their components with PHP. All exercises will be conducted using an example project, the infamous and most exhilarating lolcat kitten game ever conceived: Plan 9 From Outer Kitten. In order to follow the hands-on exercises, you will need a development VM prepared by us with all relevant system components and our project repositories. To make the workshop experience as smooth as possible for all participants, please bring a prepared computer to the workshop, as there will be no time to deal with installation and setup issues. Please download all prerequisites and install them as described: VM, Plan 9 webapp, Plan 9 storm backend, (Tutorial: https://github.com/DECK36/plan9_workshop_tutorial ).
TellApart uses the gevent library to build highly concurrent and asynchronous Python applications. Gevent allows synchronous Python code to run asynchronously by using greenlets. It monkey-patches blocking libraries so that when blocking calls occur, other greenlets can run instead of the entire process blocking. This allows TellApart to build applications like their front-end server and database proxy that can handle millions of requests per day with low latency using a single process per core.
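The greenlet model can be approximated with standard-library generators (an analogy only; gevent's real implementation monkey-patches blocking libraries and switches greenlets inside an event loop): each yield marks a point where a blocking call would let another task run.

```python
from collections import deque

def run_cooperative(tasks):
    """Round-robin scheduler for generator-based tasks. Each `yield`
    stands in for the moment gevent would switch greenlets while a
    blocking call (socket read, DB query) is in flight."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run until the next "block"
            ready.append(task)        # reschedule behind the others
        except StopIteration:
            pass                      # task finished, drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield "%s:%d" % (name, i)  # pretend to block; others run

trace = run_cooperative([worker("a", 2), worker("b", 2)])
```

The interleaved trace shows why one process can serve many concurrent requests: no task monopolizes the thread while "blocked", yet the code inside each worker still reads top-to-bottom like synchronous code.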
Performance optimization 101 - Erlang Factory SF 2014 - lpgauth
In order to scale AdGear's real-time bidding (RTB) platform, we've been optimizing system and application performance. In this talk, I'll share all my findings, from basic tips to advanced topics. I'll cover coding patterns, metric collection, tracing and much more!
Talk objectives:
- Share basic tips, common pitfalls, efficient coding patterns
- Introduce tooling for metric collection (statsderl, vmstats, system-stats, riak_sysmon)
- Highlight Erlang tracing (fprof, flame graphs)
- Expose advanced topics (VM tuning, lock-counter, systemtap, cpuset)
Target audience:
- Anyone looking to improve the performance of their Erlang application.
This document provides an overview of using Prometheus for monitoring and alerting. It discusses using Node Exporters and other exporters to collect metrics, storing metrics in Prometheus, querying metrics using PromQL, and configuring alert rules and the Alertmanager for notifications. Key aspects covered include scraping configs, common exporters, data types and selectors in PromQL, operations and functions, and setting up alerts and the Alertmanager for routing alerts.
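The counter semantics behind PromQL's rate() can be sketched in Python (simplified: the real function also extrapolates to the window boundaries, which this version omits):

```python
def simple_rate(samples):
    """Approximate PromQL's rate() over a counter: per-second increase
    across the sampled window. samples is a list of (unix_ts, value)
    pairs, oldest first. Simplified sketch only."""
    (t0, _), (t1, _) = samples[0], samples[-1]
    increase = 0.0
    prev = samples[0][1]
    for _, v in samples[1:]:
        # a drop means the counter reset (process restart);
        # count only the value accumulated since the reset
        increase += v - prev if v >= prev else v
        prev = v
    return increase / (t1 - t0)

# 60s window, counter goes 100 -> 130 with no resets: 0.5/s
samples = [(0, 100.0), (30, 115.0), (60, 130.0)]
```

The reset handling is the important part: raw `last - first` would report a huge negative rate whenever a monitored process restarts, which is why counters are always queried through rate() or increase() rather than graphed directly.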
The document discusses the evolution of Ceilometer, an OpenStack project that collects measurements from deployed clouds and persists the data for later retrieval and analysis. It describes how Ceilometer has scaled out its data collection capabilities over time by adding agents, partitioning workloads, and integrating with Gnocchi to provide more efficient time-series storage. The document also provides best practices for Ceilometer deployment and configuration to optimize data collection, storage and querying.
Rspamd is a spam filtering system that is:
- Written in C for performance and uses an event-driven model to process messages asynchronously for scalability.
- Capable of detecting spam through a variety of filtering methods like policies, DNS lists, headers, text patterns, and machine learning.
- Integrates with mail transfer agents using plugins to modify or reject messages based on spam detection.
Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. And it’s fast — you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language. Storm was open-sourced by Twitter in September of 2011 and has since been adopted by many companies around the world.
Storm has a wide range of use cases, from stream processing to continuous computation to distributed RPC. In this talk I'll introduce Storm and show how easy it is to use for realtime computation.
Chronix as Long-Term Storage for Prometheus - QAware GmbH
Cloud Native Con 2016, Seattle: Talk by Moritz Kammerer (@phxql, Senior Software Engineer at QAware).
Abstract: Prometheus is great when it comes to monitoring and alerting. But its long-term storage options are weak compared to related time series databases (missing data distribution, sharding, etc.). At this point Chronix enters the stage. Chronix is an open source time series database. It focuses on efficient long-term storage, both in terms of storage volume and access times. Chronix achieves a compression rate of 98% compared to data in CSV files, while an average query took 21 milliseconds in a benchmark of 96 queries over different time ranges and time series. Chronix offers a multi-dimensional generic data model for storing all kinds of time series, functions for anomaly detection used in the frameworks EGADS and SAX, and an integration with Apache Spark for distributed time series processing. In this code-intensive session we show the integration of Prometheus and Chronix. We also dig into the details of Chronix and explain why Chronix <3 Prometheus and vice versa. Furthermore we demonstrate a toolchain: collect data with Prometheus, pipe it to Chronix, visualize both data sources in Grafana, and easily analyze tons of data with Spark and Apache Zeppelin.
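Compression rates like these rely partly on timestamp compaction: with a regular scrape interval, delta-encoded timestamps collapse to a constant that later stages compress almost for free. A generic sketch of the technique (not Chronix's actual on-disk format):

```python
def delta_encode(timestamps):
    """Store the first timestamp plus successive differences. For a
    regular 15s scrape interval the deltas are all 15, which is far
    more compressible than full epoch values. Generic sketch only."""
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Inverse of delta_encode: running sum restores the timestamps."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

ts = [1480000000, 1480000015, 1480000030, 1480000045]
encoded = delta_encode(ts)
```

The round trip is lossless, so the space savings cost nothing at query time beyond the cheap running sum.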
Bobby from Yahoo presents on running Apache Storm as a service on and off Hadoop. Storm provides low-latency data processing through streaming data flows defined by topologies of spouts and bolts. Yahoo runs Storm as a service and also maintains Spark. Bobby discusses securing standalone Storm, running Storm on YARN for security, reduced overhead and elasticity, and future work including Nimbus high availability and running Storm topologies as unmanaged applications in YARN.
The document discusses different technologies for real-time data collection and analysis, including Kafka for collecting and distributing streaming data, Storm for distributed real-time computation, and using PHP and FastCGI to parse real-time logs from Kafka in a Storm topology. It provides an overview of these technologies and their features, and proposes an architecture to collect logs with Kafka, process them with Storm and PHP, and output results through FastCGI.
The document discusses how fact-based monitoring improves on host-centric monitoring by using metadata facts instead of individual hostnames to define monitoring checks and queries. It draws parallels between how Puppet, SQL, and MCollective improved systems management, configuration, and orchestration respectively by moving from imperative programming to fact-based declarative queries. Examples are given of how Sensu and Datadog can implement fact-based monitoring checks and metrics queries that reuse existing Puppet facts to define monitoring logic instead of hardcoding hostnames. The key advantages highlighted are avoiding host groups and reducing complexity when infrastructure changes.
This document summarizes the key aspects of a public cloud archive storage solution. It offers affordable and unlimited storage using standard transfer protocols. Data is stored using erasure coding for redundancy and fault tolerance. Accessing archived data takes 10 minutes to 12 hours depending on previous access patterns, with faster access for inactive archives. The solution uses middleware to handle sealing and unsealing archives along with tracking access patterns to regulate retrieval times.
Practical container scheduling: Juggling optimizations, guarantees, and trade-offs at Netflix. A presentation at MesosCon North America, Sept 15th 2017.
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
This document provides guidance on sizing MongoDB deployments on AWS for optimal performance. It discusses key considerations for capacity planning like testing workloads, measuring performance, and adjusting over time. Different AWS services like compute-optimized instances and storage options like EBS are reviewed. Best practices for WiredTiger like sizing cache, effects of compression and encryption, and monitoring tools are covered. The document emphasizes starting simply and scaling based on business needs and workload profiling.
Podila mesos con europe keynote aug sep 2016Sharma Podila
Sharma Podila discusses elastic resource scheduling with Apache Mesos. Mesos provides a framework for sharing computing resources between various applications. Netflix uses Mesos along with the Fenzo scheduling library to optimize resource allocation for different workloads, including batch jobs and streaming services. The scheduling aims to guarantee capacity for critical services while maximizing utilization through techniques like advance reservation and backfilling of flexible jobs. The container executor is augmented to provide per-container networking and security isolation within Mesos agents running on EC2 instances.
MongoDB replica sets allow for horizontal scaling of MongoDB deployments. The document discusses best practices for implementing and managing MongoDB replica sets, including:
- Maintaining an odd number of voting members to prevent election ties
- Using read preferences like nearest, secondary preferred for improved performance
- Configuring a minimum oplog retention period of 24 hours for recovery from outages
- Enabling authentication and authorization to secure replica sets
- Several features introduced in MongoDB versions 4.4 and 5.0 like resumable initial sync and simultaneous indexing improve replication performance.
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB
AOL experienced explosive growth and needed a new database that was both flexible and easy to deploy with little effort. They chose MongoDB. Due to the complexity of internal systems and the data, most of the migration process was spent building a new identity platform and adapters for legacy apps to talk to MongoDB. Systems were migrated in 4 phases to ensure that users were not impacted during the switch. Turning on dual reads/writes to both legacy databases and MongoDB also helped get production traffic into MongoDB during the process. Ultimately, the project was successful with the help of MongoDB support. Today, the team has 15 shards, with 60-70 GB per shard.
Evolution of MongoDB Replicaset and Its Best PracticesMydbops
There are several exciting and long-awaited features released in MongoDB 4.0. The talk focuses on the prime features, the kinds of problems they solve, and best practices for deploying replica sets.
The document describes Google File System (GFS), which was designed by Google to address the need for large scale data storage. GFS uses a master-chunk server architecture, where a single master manages metadata and chunk servers store file data in 64MB chunks that are replicated for fault tolerance. Key features include scalability, reliability, record appending for concurrent writes, and snapshots. The architecture is fault tolerant through replication and components like shadow masters can take over if the primary master fails.
The document summarizes testing of storage and metadata backends for a cloud synchronization framework called ClawIO. ClawIO aims to avoid synchronization between the metadata database and filesystem data, unlike most sync platforms. The authors benchmarked different metadata storage configurations including a local filesystem with MySQL, local filesystem with xattrs, and local filesystem with Redis and xattrs. Stat and upload operations using these configurations showed that storing metadata in memory databases improved performance and scalability over solely using the local filesystem. Future work includes implementing more backends like EOS, S3, and Swift and running additional benchmarks.
The document proposes using MapReduce as a general framework to support research in mining software repositories (MSR). It describes how MapReduce can provide efficiency, scalability, adaptability and flexibility for common MSR tasks like analyzing large code repositories. A case study of applying MapReduce to the J-REX MSR tool shows significant reductions in running time for large datasets. Minimal programming effort was required and MapReduce could run on various computing environments.
Netflix running Presto in the AWS CloudZhenxiao Luo
Netflix runs Presto in its AWS cloud environment to enable low-latency ad-hoc queries on petabyte-scale data stored in S3. Some key things Netflix did include optimizing Presto to read from and write directly to S3, fixing bugs, integrating Presto with its EMR and Ganglia monitoring, and deploying a 100+ node Presto cluster that handles over 1000 queries per day. Performance testing showed Presto was often 10x faster than Hive for various queries and joins. Netflix continues optimizing Presto for its needs like supporting Parquet, ODBC/JDBC drivers, and looking to address current limitations.
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...Flink Forward
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever increasing interest. As much as distributed applications see increased use enabled by Mesos, Mesos also sees increasing use due to a growing ecosystem of well integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time high scale data processing and allows users to deal with low-latency streaming analytical workloads on Mesos. In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
Apache Flink® Meets Apache Mesos® and DC/OSTill Rohrmann
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever increasing interest. As much as distributed applications see increased use enabled by Mesos, Mesos also sees increasing use due to a growing ecosystem of well integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time high scale data processing and allows users to deal with low-latency streaming analytical workloads on Mesos.
In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
The document discusses asynchronous programming using async and await in C#. It begins by explaining what asynchronous programming is and why it is useful for improving app responsiveness and simplifying asynchronous code. It then describes how async and await works by generating state machines and using continuation tasks. The document covers some gotchas with async code as well as best practices like naming conventions. It provides references for further reading on asynchronous patterns, tasks, and unit testing asynchronous code.
SCaLE 20X: Kubernetes Cloud Cost Monitoring with OpenCost & Optimization Stra...Matt Ray
Understanding the cost and efficiency of Kubernetes on public clouds is essential once you start expanding your infrastructure with real production workloads. The CNCF Sandbox OpenCost project and specification models current and historical Kubernetes cloud spend and resource allocation by service, deployment, namespace, labels, and much more. This data provides transparency for cloud bills and can be used as the basis for optimizing your Kubernetes deployments based on cost allocation. Optimizing Kubernetes for cost and performance is an ongoing iterative process that starts with applications and works through the entire stack.
1. The document describes load testing done on an Odoo implementation to determine system limits and size requirements. Tests were run using Locust and varied the number of requests per second, users, and sessions to identify bottlenecks.
2. Testing showed the initial limit was around 500 requests/second. Further tests found that sessions stored in NFS caused issues, so they were changed to store in PostgreSQL.
3. Load balanced and monolithic architectures were compared, finding the load balanced setup using 4 servers performed similarly to a single server with 40 cores for the monolithic setup.
DockerCon EU 2015: The Glue is the Hard Part: Making a Production-Ready PaaSDocker, Inc.
PaaSTA is Yelp's open-source Docker-based PaaS that glues together common PaaS components like scheduling with Mesos/Marathon, service delivery with Docker, discovery with SmartStack, and monitoring with Sensu. The document discusses how PaaSTA provides a production-ready platform by using stable components, reducing single points of failure, enabling graceful degradation, making upgrades painless, and incorporating self-healing and alerting capabilities. Lessons learned include the importance of interfaces, maintaining the app-infra boundary, using the right abstractions, and making iterative improvements.
The Glue is the Hard Part: Making a Production-Ready PaaSEvanKrall
Docker is an amazing technology. In particular, its build-once-run-anywhere model unlocks the world of cluster schedulers like Mesos and Kubernetes. These solve many of the problems of running high-scale websites, but introduce new challenges that need addressing.
In this talk, Evan will describe PaaSTA, a PaaS built on top of open source tools including Docker, Mesos, Marathon, and Chronos. PaaSTA provides tooling for developers to quickly turn their microservice into a monitored, highly available application spanning multiple datacenters and cloud regions. Evan will give an overview of the open-source technologies that power PaaSTA, discuss how Yelp has glued these together to give developers control without burdening them with the complexities of the infrastructure, and show the workflow used by developers to update and maintain their services on PaaSTA.
Netflix Open Source Meetup Season 4 Episode 2aspyker
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache as it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson Reuters, Diego Pacheco from Ilegra, and other speakers, internal and external to Netflix, on how these products fit into their stacks and roadmaps.
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Two major players dominate the mobile operating system market: Android and iOS. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
By engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed your design system's own Web Components, created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like standard UI5 controls, and can be bound with data binding. Learn how you can make use of the Web Components base class in OpenUI5/SAPUI5 to integrate your own Web Components, and get inspired by the solution to generate a custom UI5 library providing control wrappers for the native Web Components.
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Liberarsi dai framework con i Web Component.pptxMassimo Artizzu
(Talk in Italian)
A presentation on the features and use of Web Components in the development of web pages and applications. It recounts the historical reasons behind the advent of Web Components, highlights their advantages and challenges, and points out best practices, with particular emphasis on using Web Components to ease the migration of applications to new technology stacks.
Unveiling the Advantages of Agile Software Development.pdfbrainerhub1
Learn about the advantages of Agile software development and simplify your workflow to spur quicker innovation. Jump right in!
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Preparing Non - Technical Founders for Engaging a Tech AgencyISH Technologies
Preparing non-technical founders before engaging a tech agency is crucial for the success of their projects. It starts with clearly defining their vision and goals, conducting thorough market research, and gaining a basic understanding of relevant technologies. Setting realistic expectations and preparing a detailed project brief are essential steps. Founders should select a tech agency with a proven track record and establish clear communication channels. Additionally, addressing legal and contractual considerations and planning for post-launch support are vital to ensure a smooth and successful collaboration. This preparation empowers non-technical founders to effectively communicate their needs and work seamlessly with their chosen tech agency. Visit our site for more details, or contact us today: www.ishtechnologies.com.au
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
29. Scheduling strategy
For each task:
  On each host:
    Validate hard constraints
    Evaluate fitness and soft constraints
  Until fitness is good enough, and a minimum number of hosts has been evaluated
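The loop above can be sketched as follows. This is a minimal illustration of the strategy, not Fenzo's actual API; the host/task shapes, the fitness threshold, and the minimum-hosts value are all assumptions made for the example.

```python
# Sketch of the per-task scheduling loop: validate hard constraints on each
# host, score the remaining candidates with a fitness function (soft
# constraints folded in), and stop early once fitness is good enough and a
# minimum number of hosts has been evaluated. Names are illustrative only.

GOOD_ENOUGH = 0.9        # assumed fitness threshold
MIN_HOSTS_EVALUATED = 2  # assumed minimum number of hosts to consider

def schedule(task, hosts, hard_constraints, fitness):
    best_host, best_fit = None, -1.0
    evaluated = 0
    for host in hosts:
        # Hard constraints must all pass for the host to be a candidate.
        if not all(c(task, host) for c in hard_constraints):
            continue
        evaluated += 1
        fit = fitness(task, host)
        if fit > best_fit:
            best_host, best_fit = host, fit
        # Early termination: fitness good enough and enough hosts seen.
        if best_fit >= GOOD_ENOUGH and evaluated >= MIN_HOSTS_EVALUATED:
            break
    return best_host

# Example: prefer hosts whose free CPUs most tightly fit the task.
hosts = [{"name": "h1", "free_cpus": 8}, {"name": "h2", "free_cpus": 2},
         {"name": "h3", "free_cpus": 1}]
fits = lambda task, host: host["free_cpus"] >= task["cpus"]
tightness = lambda task, host: task["cpus"] / host["free_cpus"]
print(schedule({"cpus": 2}, hosts, [fits], tightness)["name"])  # → h2
```

The early-exit condition is what keeps scheduling fast on large clusters: the scheduler does not need to score every host, only enough of them to find a good-enough placement.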
42. Rule-based cluster autoscaling
● Set up rules per host attribute value
○ E.g., one autoscale rule per ASG/cluster; one cluster for network-intensive jobs, another for CPU/memory-intensive jobs
● Sample:
[Chart: number of idle hosts over time; a scale-up is triggered when idle hosts fall below the min threshold, a scale-down when they exceed the max]
Cluster Name | Min Idle Count | Max Idle Count | Cooldown Secs
NetworkClstr | 5              | 15             | 360
ComputeClstr | 10             | 20             | 300
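A rule of the shape shown in the sample table can be evaluated roughly as in the sketch below. This is illustrative only, not Fenzo's real autoscale-rule interface; the rule fields and function names are assumptions.

```python
import time

# One autoscale rule per cluster (host attribute value), mirroring the
# sample table: keep idle hosts between min and max, with a cooldown
# between scaling actions. Names and structure are illustrative.
RULES = {
    "NetworkClstr": {"min_idle": 5, "max_idle": 15, "cooldown_secs": 360},
    "ComputeClstr": {"min_idle": 10, "max_idle": 20, "cooldown_secs": 300},
}

last_action = {}  # cluster -> timestamp of the last scaling action

def autoscale_action(cluster, idle_hosts, now=None):
    """Return 'scale_up', 'scale_down', or None for a cluster."""
    now = time.time() if now is None else now
    rule = RULES[cluster]
    # Respect the cooldown: no new action until cooldown_secs have elapsed.
    if now - last_action.get(cluster, 0) < rule["cooldown_secs"]:
        return None
    if idle_hosts < rule["min_idle"]:
        last_action[cluster] = now
        return "scale_up"      # too few idle hosts: add capacity
    if idle_hosts > rule["max_idle"]:
        last_action[cluster] = now
        return "scale_down"    # too many idle hosts: shrink the cluster
    return None

print(autoscale_action("NetworkClstr", idle_hosts=2, now=1000))   # scale_up
print(autoscale_action("NetworkClstr", idle_hosts=2, now=1100))   # None (cooldown)
print(autoscale_action("ComputeClstr", idle_hosts=25, now=1000))  # scale_down
```

The cooldown prevents thrashing when the idle-host count hovers around a threshold; the shortfall analysis on the next slide addresses the case where waiting out the cooldown would be harmful.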
43. Shortfall analysis based autoscaling
● Rule-based scale up has a cool down period
○ What if there’s a surge of incoming requests?
● Pending requests trigger shortfall analysis
○ Scale up happens regardless of cool down period
○ Remembers which tasks have already been covered
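The idea can be sketched as follows. This illustrates the described behavior, not Fenzo's implementation; the data shapes and names are assumptions.

```python
# Shortfall analysis: pending task requests trigger an immediate scale-up,
# bypassing the rule-based cooldown, while remembering which tasks a
# previous scale-up already covered so they are not double counted.
# Illustrative sketch only.

covered_tasks = set()  # task ids already accounted for by a scale-up

def shortfall_hosts(pending_tasks, cpus_per_host=8):
    """Return how many extra hosts to request for not-yet-covered tasks."""
    new_tasks = [t for t in pending_tasks if t["id"] not in covered_tasks]
    needed_cpus = sum(t["cpus"] for t in new_tasks)
    for t in new_tasks:
        covered_tasks.add(t["id"])  # remember: don't scale for them again
    # Round up to whole hosts; this fires regardless of any cooldown period.
    return -(-needed_cpus // cpus_per_host)

pending = [{"id": "a", "cpus": 6}, {"id": "b", "cpus": 6}]
print(shortfall_hosts(pending))  # 2 hosts for 12 CPUs
print(shortfall_hosts(pending))  # 0: these tasks are already covered
```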
46. Scheduler run time
Scheduler run time in milliseconds (over a week):
Average: 2 ms
Maximum: 38 ms
Note: times can vary depending on the number of tasks, the number and types of constraints, and the number of hosts
48. A bin packing experiment
[Diagram: starting with an idle cluster, Fenzo iteratively assigns a batch of tasks (cpu=1, cpu=3, and cpu=6) across 3,000 Mesos slave hosts]
49–50. Bin packing sample results
Bin pack tasks using Fenzo’s built-in CPU bin packer [results charts not captured]
51. Task runtime bin packing sample
Bin pack tasks based on a custom fitness calculator to pack short- vs. long-runtime jobs separately
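Fitness calculators of the kind used in these experiments can be sketched as below. These are illustrative functions, not Fenzo's fitness-calculator API; the host/task shapes are assumptions.

```python
# Fitness calculators score a (task, host) pair in [0, 1]; higher is better.
# CPU bin packing prefers hosts that are already most utilized, so tasks
# consolidate onto fewer hosts. A custom calculator can instead co-locate
# tasks by expected runtime, packing short- and long-running jobs
# separately. Shapes here are illustrative only.

def cpu_bin_pack_fitness(task, host):
    used_after = host["used_cpus"] + task["cpus"]
    if used_after > host["total_cpus"]:
        return 0.0  # task doesn't fit on this host
    return used_after / host["total_cpus"]  # fuller hosts score higher

def runtime_fitness(task, host):
    """Prefer hosts running tasks of the same runtime class."""
    if not host["tasks"]:
        return 0.5  # empty host: neutral score
    same = sum(1 for t in host["tasks"] if t["duration"] == task["duration"])
    return same / len(host["tasks"])

empty = {"used_cpus": 0, "total_cpus": 8, "tasks": []}
busy = {"used_cpus": 6, "total_cpus": 8, "tasks": [{"duration": "short"}]}
print(cpu_bin_pack_fitness({"cpus": 2}, busy))   # 1.0: fills the host
print(cpu_bin_pack_fitness({"cpus": 2}, empty))  # 0.25
print(runtime_fitness({"duration": "short"}, busy))  # 1.0: same runtime class
```

Because the scheduler only compares fitness scores, swapping the packing policy is just a matter of plugging in a different calculator.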
52. Scheduler speed experiment
Hosts: 8 CPUs each
Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
Goal: starting from an empty cluster, assign tasks to fill all hosts
Scheduling strategy: CPU bin packing
# of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
53. Scheduler speed experiment
Hosts: 8 CPUs each
Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
Goal: starting from an empty cluster, assign tasks to fill all hosts
Scheduling strategy: CPU bin packing
# of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
1,000      | 1                              | 3 ms     | 3 ms              | 1 ms     | 188 ms   | 9 s
1,000      | 200                            | 40 ms    | 0.2 ms            | 17 ms    | 100 ms   | 0.5 s
54. Scheduler speed experiment
Hosts: 8 CPUs each
Task mix: 20% running 1-CPU jobs, 40% running 4-CPU jobs, and 40% running 6-CPU jobs
Goal: starting from an empty cluster, assign tasks to fill all hosts
Scheduling strategy: CPU bin packing
# of hosts | # of tasks to assign each time | Avg time | Avg time per task | Min time | Max time | Total time
1,000      | 1                              | 3 ms     | 3 ms              | 1 ms     | 188 ms   | 9 s
1,000      | 200                            | 40 ms    | 0.2 ms            | 17 ms    | 100 ms   | 0.5 s
10,000     | 1                              | 29 ms    | 29 ms             | 10 ms    | 240 ms   | 870 s
10,000     | 200                            | 132 ms   | 0.66 ms           | 22 ms    | 434 ms   | 19 s
58. Fenzo: scheduling library for frameworks
● Heterogeneous resources
● Heterogeneous task requests
● Autoscaling of the cluster
● Plugins for constraints and fitness
● Visibility of scheduler actions
● High speed
59. Fenzo is now available in the Netflix OSS suite at https://github.com/Netflix/Fenzo