This document summarizes a real user monitoring system at scale. The key points are:
1. The system monitors over 20 million real users, 7 million requests per minute, and 150 million daily page views across 3 data centers and 3,000 servers.
2. Data is collected from over 40 teams and 250 daily deployments using script injection and sent to 3 data centers. It is ingested into Kafka and replicated for redundancy.
3. The ingested data is processed by Storm and aggregated metrics are stored in Graphite for presentation and analysis to provide insights for over 300 specialists. Custom alerts are also used.
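The aggregated metrics mentioned above can be pushed to Graphite over Carbon's plaintext protocol, where each line is "path value timestamp". A minimal sketch in Python (the metric name, host, and value are illustrative, not the system's actual code):

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Render one metric in Graphite's plaintext protocol:
    '<path> <value> <timestamp>\\n'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"

def send_metric(host, port, path, value):
    """Push a single aggregated metric to a Carbon listener.
    Port 2003 is Carbon's default plaintext port."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))

# Example (hypothetical host and metric path):
# send_metric("graphite.example.com", 2003, "rum.pageviews.count", 1500)
```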
Atmosphere 2016 - Albert Lacki, Jaroslaw Bloch - Real user monitoring at scale (PROIDEA)
A pretty detailed story of how we built a real-time user monitoring platform, gathering data from millions of users. Using the joint forces of CDN, Cloud, and BigData, we created a tool for developers and product owners to guide them toward the right (and data-driven) product decisions.
The document provides an overview of the internal components of ATS (Apache Traffic Server) and outlines plans for analyzing and documenting the source code. It includes a directory of chapters planned to cover various subsystems like the event system, I/O core, session handling, state machines, caching, and more. It also summarizes some of the key classes and interfaces in the event system, I/O core networking subsystem, and how network connections are handled. The author reflects on improvements that could be made and plans to support documentation in multiple languages.
The document discusses server architecture and different types of servers. It describes common server roles like web servers, application servers, and proxy servers. It provides examples of simple web servers written in Node.js and Ruby. Popular web servers mentioned include Nginx and Apache. Different hosting options for servers are also covered, such as virtual dedicated servers (VDS), virtual private servers (VPS), and cloud servers hosted on platforms like Amazon AWS, Google Cloud, and Microsoft Azure.
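For comparison with the Node.js and Ruby examples the deck mentions, a minimal web server using only Python's standard library might look like this (host and port are placeholder values):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a fixed plain-text body."""
    def do_GET(self):
        body = b"Hello, world\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for this sketch

def make_server(host="0.0.0.0", port=8080):
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), HelloHandler)

# make_server().serve_forever()  # blocks until interrupted
```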
Data Collection & Prometheus Scraping with Sensu 2.0 (InfluxData)
Applications are complex systems. Their many moving parts, component and dependency services, may span any number of infrastructure technologies and platforms, from bare metal to serverless. As the number of services increases, teams responsible for them will naturally develop their own preferences, such as how they instrument their code or how and when they receive alerts.
ShortBus is a system for sending and receiving events over HTTP connections that allows for long-lived persistent connections, event replay, and pluggable data formats like JSON. It includes a Perlbal plugin that runs as a proxy service, JavaScript and Perl libraries, and supports uses like live chat, stock tickers, multiplayer games, and synchronizing content across multiple user devices. The web server authorizes events and connects clients, which can then acknowledge receipt of events.
Liquid Stream Processing Across Web Browsers and Web Servers (Masiar Babazadeh)
ICWE2015 presentation on the control infrastructure of the Web Liquid Streams framework, a data stream framework developed at the University of Lugano, Switzerland.
OSMC 2018 | Monitoring with Sensu 2.0 by Sean Porter (NETWAYS)
Sean Porter, CTO of Sensu Inc., discussed monitoring in modern, complex infrastructures using Sensu Go. He outlined three methods for collecting monitoring data: 1) service checks using scripts and exit codes, 2) an events API for entity management and external checks, and 3) StatsD for metrics aggregation. Porter emphasized that ephemeral infrastructure is now common, requiring monitoring solutions that can scale with containers, microservices, and cloud-based infrastructure. Sensu Go provides monitoring as infrastructure scales in complexity over time.
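The first of those collection methods, service checks, relies on the conventional exit-status contract (0 OK, 1 warning, 2 critical). A hypothetical check, sketched in Python rather than Sensu's own tooling, with illustrative thresholds:

```python
# Exit-status contract used by Sensu-style service checks.
OK, WARNING, CRITICAL = 0, 1, 2

def check_disk_usage(percent_used, warn=80, crit=90):
    """Classify a disk-usage reading against thresholds; the defaults
    here are illustrative, not anything Sensu prescribes."""
    if percent_used >= crit:
        return CRITICAL, f"CRITICAL: disk {percent_used}% used"
    if percent_used >= warn:
        return WARNING, f"WARNING: disk {percent_used}% used"
    return OK, f"OK: disk {percent_used}% used"

# In a real check script: print the message and sys.exit(status),
# so the monitoring agent reads the result from the exit code.
```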
This document outlines governor limits for synchronous and asynchronous Apex code execution. For synchronous code, limits include 100 SOQL queries issued and 10,000 records retrieved. For asynchronous code, limits are higher with 150 DML statements allowed and retrieving up to 50,000 records. The maximum heap size is 6 MB for synchronous and 12 MB for asynchronous code.
Building a Video Encoding Pipeline at The New York Times (Flávio Ribeiro)
These slides were presented at the Streaming Media West conference in 2016. This talk is also a reference for the blog post "Using Microservices to Encode and Publish Videos at The New York Times" at The New York Times Open blog.
- Streaming Media West 2016: http://streamingmedia.com/Conferences/West2016/
- Open Blog: http://open.blogs.nytimes.com/2016/11/01/using-microservices-to-encode-and-publish-videos-at-the-new-york-times/
Setting up the Red5 environment, building sample applications, and integrating with Flash. We will look at how Red5 works within the Flash IDE and build a sample chat application, video streaming, and a multi-user environment.
The document proposes concepts for a future city including security features, green buildings, and transportation and energy systems. It describes a centralized security checkpoint that scans biometrics and checks for dangerous items. It also details panic buttons throughout the city that alert police when pressed. The document advocates for green buildings painted with insulating paint and using solar power. It further discusses an evacuated tube transport system that could transport people at very high speeds, and uses of hydrogen fuel cells and tidal power as renewable energy sources. Underground waste and water management systems are proposed to more efficiently process and transport waste and water.
The document presents a freehand drawing made by Nicolás Steven Guerrero Arias for the Colegio Nacional Nicolás Esguerra. The drawing was made with artistic media such as a calligraphy pen and depicts the school's motto, "Edificamos futuro" ("We build the future").
Karthik has over 6 years of experience as a full stack software developer with expertise in Java, JavaScript, AngularJS, NodeJS, and related frameworks. He has worked on several projects for clients in banking, analytics, and telecom developing web applications, microservices, and integrating with DevOps on cloud platforms. His responsibilities have included requirement analysis, product design, application development, project management, and defect tracking. Currently he works as a specialist at Verizon developing their workforce management tool using AngularJS, NodeJS, and a microservices architecture deployed on cloud infrastructure.
Eyyaz Ahmed is a Pakistani national currently residing in the UAE on a visit visa expiring in March 2016. He has over 7 years of experience in accounting and finance roles in Pakistan. Most recently, he worked as the Accounts and Finance Officer for Asasah Company Limited from 2007 to 2014 where he was responsible for financial reporting, audit preparation, bank reconciliation, and investor reporting. Prior to that, he served as the Branch Accountant for Asasah branches from 2005 to 2007 handling cash management and client relations duties. He holds an MBA in Finance from the Virtual University of Pakistan and a BBA in Finance from the University of Lahore.
This document contains a resume for Saumil A. Shah. The summary provides information about his career objective, which is to enhance his potential and add value to organizations. It also lists his professional experience, including roles as a Process Coordinator, Senior Representative, Line Leader, and Sales Promoter. His core competencies include interpersonal management, customer relationship management, and quality control management. Educationally, he holds a B.Sc from the University of Madras; personal details such as his address and date of birth are also provided.
This document discusses several kitchen tools used in baking and food preparation:
- A spatula is a small, flat implement used to mix, spread, and lift foods, as well as other materials like paint or plaster. It derives its name from a Latin word meaning flat piece of wood.
- A pastry blender is used to cut solid fat like butter or shortening into flour to make pastries. It has narrow metal strips or wires attached to a handle.
- A rolling pin is used to shape and flatten dough, and comes in roller or rod styles made of various materials.
This document is a CompTIA certification for David Lafleur Vidrine with the identification number COMP001020804414 and an expiration date of December 23, 2020. It includes a verification code that can be used to validate the certification online at a provided URL.
1) The document discusses information risk and protection, describing how managing digital identities has become more complex with the rise of cloud and mobile technologies.
2) It promotes IBM's security solutions for managing information risk across identity, cloud, fraud, applications, data and mobile domains.
3) These solutions aim to govern users and enforce access controls, protect sensitive data, build and deploy secure applications, protect against fraud, secure mobile devices and applications, and enforce cloud security policies.
The document summarizes Sunstar Overseas Limited, a New Delhi-based company that produces and exports organic and conventional basmati rice. It has partnered with over 3,000 small-scale farmers since 2001 to produce rice through a fair trade group certification. This arrangement has improved farmers' livelihoods by providing higher prices, training, and community development projects. While still a small percentage of total production, Sunstar's focus on niche markets like fair trade and organic has benefited farmers and given the company international recognition, though it also leaves them vulnerable to price fluctuations.
This document discusses how the company DreamLab phased out their traditional motivational systems and moved to a DevOps culture with different approaches to motivation. It provides the following key points:
- DreamLab transitioned from waterfall to agile to DevOps approaches over 2011-2014, and found their previous motivational systems no longer fit the new work environment.
- Research shows incentive structures affect how teams collaborate and share information. DreamLab aimed to create an environment that motivates in a DevOps culture without financial incentives.
- Their new approach draws on self-determination theory and intrinsic motivation factors like autonomy, mastery, purpose, and relatedness, removing money as a motivator while supporting those intrinsic factors through the work environment itself.
Ranganathan Narayanan has over 7 years of experience as a Performance Engineer. He currently works at Swiss Re Shared Services as a Performance Engineer and Assistant Vice President. Previously, he worked at BA Continuum India and Infosys Technologies as a Performance Test Lead. He has expertise in performance testing, profiling, monitoring, and tuning applications to identify and resolve performance bottlenecks.
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and Scala (Helena Edelson)
Whatever meaning we are searching for in our vast amounts of data, whether we are in science, finance, technology, energy, or health care, we all share the same problems that must be solved: how do we achieve it, and what technologies best support the requirements? This talk is about leveraging fast access to historical data together with real-time streaming data for predictive modeling in a lambda architecture with Spark Streaming, Kafka, Cassandra, Akka and Scala. Topics include efficient stream computation, composable data pipelines, data locality, the Cassandra data model and low latency, and Kafka producers and HTTP endpoints as Akka actors.
Treasure Data and AWS - Developers.io 2015 (N Masahiro)
This document discusses Treasure Data's data architecture. It describes how Treasure Data collects and imports log data using Fluentd. The data is stored in columnar format in S3 and metadata is stored in PostgreSQL. Treasure Data uses Presto to enable fast analytics on the large datasets. The document provides details on the import process, storage, partitioning, and optimizations to improve query performance.
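Time-based partitioning is a large part of what makes such queries fast: the engine reads only the partitions that overlap the query's time range and skips the rest. A toy sketch of partition pruning (the key layout is invented for illustration, not Treasure Data's actual storage schema):

```python
from datetime import datetime, timedelta

def hourly_partitions(start, end, prefix="td/logs"):
    """Return the hourly partition prefixes a query over [start, end)
    must scan; everything outside the range can be pruned."""
    keys = []
    t = start.replace(minute=0, second=0, microsecond=0)
    while t < end:
        keys.append(f"{prefix}/{t:%Y/%m/%d/%H}/")
        t += timedelta(hours=1)
    return keys

# A three-hour query touches exactly three hourly partitions:
# hourly_partitions(datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 3))
```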
Supporting Enterprise System Rollouts with Splunk (Erin Sweeney)
At Cricket Communications, Splunk started as a way to correlate all of our data into one view to help our operations team keep processes humming. Then we gave secured access to our developers, and now they're addicted. In fact, Splunk is critical in helping us speed up deployment of new systems (like our recent multi-million-dollar billing system implementation). Learn how we use Splunk to display key metrics for the business, track overall system health, track transactions, optimize license usage, and support capacity planning.
Stream data processing is increasingly required to support business needs for faster actionable insight with growing volume of information from more sources. Apache Apex is a true stream processing framework for low-latency, high-throughput and reliable processing of complex analytics pipelines on clusters. Apex is designed for quick time-to-production, and is used in production by large companies for real-time and batch processing at scale.
This session will use an Apex production use case to walk through the incremental transition from a batch pipeline with hours of latency to an end-to-end streaming architecture with billions of events per day which are processed to deliver real-time analytical reports. The example is representative for many similar extract-transform-load (ETL) use cases with other data sets that can use a common library of building blocks. The transform (or analytics) piece of such pipelines varies in complexity and often involves business logic specific, custom components.
Topics include:
* Pipeline functionality from event source through queryable state for real-time insights.
* API for application development and development process.
* Library of building blocks including connectors for sources and sinks such as Kafka, JMS, Cassandra, HBase, JDBC and how they enable end-to-end exactly-once results.
* Stateful processing with event time windowing.
* Fault tolerance with exactly-once result semantics, checkpointing, incremental recovery.
* Scalability and low-latency, high-throughput processing with advanced engine features for auto-scaling, dynamic changes, compute locality.
* Who is using Apex in production, and roadmap.
Following the session attendees will have a high level understanding of Apex and how it can be applied to use cases at their own organizations.
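The exactly-once guarantee in that list can be illustrated with a toy model (plain Python, not Apex's Java operator API): state is checkpointed per window, and a replayed window restores its checkpoint instead of re-applying its events, so each event is reflected in the state exactly once.

```python
class CountOperator:
    """Toy sketch of checkpointed, windowed operator state. On failure
    the engine restores the last checkpoint and replays later windows;
    already-checkpointed windows are not re-applied."""
    def __init__(self):
        self.count = 0
        self.checkpoints = {}  # window_id -> count at end of that window

    def process_window(self, window_id, events):
        if window_id in self.checkpoints:
            # Replayed window: restore the checkpoint, don't double-count.
            self.count = self.checkpoints[window_id]
            return
        for _ in events:
            self.count += 1
        self.checkpoints[window_id] = self.count

op = CountOperator()
op.process_window(1, ["a", "b"])
op.process_window(2, ["c"])
op.process_window(2, ["c"])  # replay after a simulated failure
```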
Flink Forward San Francisco 2019: Real-time Processing with Flink for Machine Learning at Netflix (Flink Forward)
Machine learning plays a critical role in providing a great Netflix member experience. It is used to drive many parts of the site including video recommendations, search results ranking, and selection of artwork images. Providing high-fidelity, near real-time data is increasingly important for these machine learning pipelines, especially as multi-armed bandit and reinforcement learning techniques, in addition to more "traditional" supervised learning, become more prevalent. With access to this data, models are able to converge more quickly, features can be updated more frequently, and analysis can be done in a more timely manner.
In this talk, we will focus on the practical details of leveraging Flink to process trillions of events per day, work with the time dimension, and manage large and frequently-changing state. We will discuss different processing schemes and dataflows, scalability and resiliency challenges we tackled, operational considerations, and instrumentation we added for monitoring job health in production.
Fitur AppManager - Application Manager ManageEngine (Fanky Christian)
ManageEngine Applications Manager is a solution that monitors IT infrastructure including applications, servers, databases, operating systems, and network services across physical, virtual, and cloud environments. It provides unified monitoring of business infrastructure and diverse applications. Key functions include agentless monitoring of heterogeneous infrastructure, alerts and root cause analysis, reporting, and support for ITIL processes like SLA management. It is available in various product editions for small/medium/enterprise users.
Digdag can automate large-scale data processing and handle errors. It provides constructs like operators, parameters, and task groups to organize workflows. Operators package tasks to run queries or process data. Parameters allow passing variables between tasks. Task groups modularize and organize workflows. Digdag supports error handling, monitoring, parallelization, versioning, and reproducing workflows across environments.
5 Years of Progress in Active Data WarehousingTeradata
Teradata's Dan Graham, , presentation from the 2010 Teradata User Group meetings on Active Data Warehousing over the last five years.
For more information on Active Data Warehousing, please visit Teradata.com
This document discusses monitoring systems using syslog and EventMachine. It proposes building a lightweight, polyglot system that aggregates syslog events and displays metrics and visualizations using various protocols like WebSockets, Server-Sent Events, and Graphite. Event sources would send syslog messages which an EventMachine server would parse and pass to an EM:Channel. A JavaScript client could subscribe to the channel for real-time updates.
In the world of big data we need to build services that will be able to collect massive data, save it and pass it to processing and analysis. However, building manageable, reliable services that are scalable and cost effective is not an easy task. The choice of eco-system, frameworks and programming language, as well as using solid engineering principles is also crucial for achieving this goal.
I will share our journey and insights from rebuilding a cloud service in Linux eco-system using Scala, Akka Actors and Aerospike DB, at the end of which we gained 10 folds improvement of server usage with a much lighter, stable and reliable system that handles tens of millions of requests per hour.
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
This document discusses transitioning from batch to streaming data processing using Apache Apex. It provides an overview of Apex and how it can be used to build real-time streaming applications. Examples are given of how to build an application that processes Twitter data streams and visualizes results. The document also outlines Apex's capabilities for scalable stream processing, queryable state, and its growing library of connectors and transformations.
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017Thomas Weise
https://berlinbuzzwords.de/17/session/batch-streaming-etl-apache-apex
Stream data processing is increasingly required to support business needs for faster actionable insight with growing volume of information from more sources. Apache Apex is a true stream processing framework for low-latency, high-throughput and reliable processing of complex analytics pipelines on clusters. Apex is designed for quick time-to-production, and is used in production by large companies for real-time and batch processing at scale.
This session will use an Apex production use case to walk through the incremental transition from a batch pipeline with hours of latency to an end-to-end streaming architecture with billions of events per day which are processed to deliver real-time analytical reports. The example is representative for many similar extract-transform-load (ETL) use cases with other data sets that can use a common library of building blocks. The transform (or analytics) piece of such pipelines varies in complexity and often involves business logic specific, custom components.
Topics include:
Pipeline functionality from event source through queryable state for real-time insights.
API for application development and development process.
Library of building blocks including connectors for sources and sinks such as Kafka, JMS, Cassandra, HBase, JDBC and how they enable end-to-end exactly-once results.
Stateful processing with event time windowing.
Fault tolerance with exactly-once result semantics, checkpointing, incremental recovery
Scalability and low-latency, high-throughput processing with advanced engine features for auto-scaling, dynamic changes, compute locality.
Recent project development and roadmap.
Following the session attendees will have a high level understanding of Apex and how it can be applied to use cases at their own organizations.
GTS Episode 1: Reactive programming in the wildOmer Iqbal
You've probably heard of Reactive Programming. It’s a fashionable, new paradigm for modelling asynchronous events in a composable, declarative way. Our mobile developers decided to be trendy, and adopted the world of Observables (aka Signals, Streams) in production. It definitely helped them wade through complex async flows. As code became declarative, it became easier to reason about expected behaviour. Bugs began to flee. Our developers began finding spare time to gym and watch Game of Thrones. Life was good. Or was it?
As with any technology in production, they discovered a host of interesting issues that the tutorials and toy projects never mentioned...
This talk covers the fundamentals of Reactive Programming on iOS and Android. It also includes a much needed debate on issues with the paradigm in production.
Billions of Rows, Millions of Insights, Right NowRob Winters
Presentation from Tableau Customer Conference 2013 on building a real time reporting/analytics platform. Topics discussed include definitions of big data and real time, technology choices and rationale, use cases for real time big data, architecture, and pitfalls to avoid.
This presentation describes a intelligent IT monitoring solution that uses Nagios as source of information, Esper as the CEP engine and a PCA algorithm.
Easy Taxi está presente em mais de 30 países e tem milhões de usuários, entre passageiros e taxistas. Seu aplicativo roda em dezenas de plataformas móveis e suporta milhares de acessos simultâneos. A aplicação nasceu na nuvem da AWS e faz pleno uso de todos os seus recursos. Nesta apresentação avançada, exploramos a arquitetura da Easy Taxi e analisamos as estratégias de otimização disponíveis para os aplicativos implementados na nuvem AWS.
Streaming SQL to unify batch and stream processing: Theory and practice with ...Fabian Hueske
SQL is the lingua franca for querying and processing data. To this day, it provides non-programmers with a powerful tool for analyzing and manipulating data. But with the emergence of stream processing as a core technology for data infrastructures, can you still use SQL and bring real-time data analysis to a broader audience?
The answer is yes, you can. SQL fits into the streaming world very well and forms an intuitive and powerful abstraction for streaming analytics. More importantly, you can use SQL as an abstraction to unify batch and streaming data processing. Viewing streams as dynamic tables, you can obtain consistent results from SQL evaluated over static tables and streams alike and use SQL to build materialized views as a data integration tool.
Fabian Hueske and Shuyi Chen explore SQL’s role in the world of streaming data and its implementation in Apache Flink and cover fundamental concepts, such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges and how the unified stream and batch processing platform enables both technical or nontechnical users to process real-time and batch data reliably using the same SQL at Uber scale.
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
This document summarizes Plazma, Treasure Data's distributed analytical database that can import 40 billion records per day. It discusses how Plazma reliably imports and processes large volumes of data through its scalable architecture with real-time and archive storage. Data is imported using Fluentd and processed using its column-oriented, schema-on-read design to enable fast queries. The document also covers Plazma's transaction API and how it is optimized for metadata operations.
Similar to Real User Monitoring at Scale @ Atmosphere Conference 2016 (20)
We will talk about how we do Real User Monitoring for millions of users at DreamLab.
What we do day to day, and what challenges we face.
DreamLab - an IT company - part of the RAS group
A media company present in several European countries
RAS delivers content through various channels - print, internet - for us, of course, the latter is what matters
We build websites, mobile apps, and IT platforms
News: Onet, Fakt, Newsweek
E-commerce: Skąpiec, Opineo
Social: Nasza-Klasa, Sympatia
And many others, plus their counterparts in the other countries
What does that translate to?
First of all, a huge number of users
Who generate an enormous number of requests
All of it served from 3 data centers
Huge network traffic
A mass of hardware to handle it
Behind all of this are people
40 teams in 3 offices - Kraków, Wrocław, and Warszawa
Over 300 specialists
We are agile, we work in a DevOps culture, and we approach change iteratively
So we have over 250 deployments a day
How do we manage quality in such a diverse environment?
Quality has many factors - let's stick to one of them: speed
Do users even pay attention to such things?
Is it worth dealing with at all?
We were not the only ones asking that question
Even a small increase in load time has a real impact on the business
If a page loads slowly, people leave for the competition
The same goes for other services - e.g. search results
Even a small change can have big consequences
We need to know, which means we need to measure
We need continuous monitoring of how our services behave
So where do we start?
A huge number of factors affect how a service behaves
Starting with the load on the database servers
And ending with details like the temperature of a single disk in a storage array
All of us monitor these things
But is that enough?
A misbehaving database can cause problems for the user
A sum of well-behaved components is not necessarily a well-behaved service
So let's go one level up
Let's treat the service as a whole
Let's hook in at the point where the client connects to us
We monitor at the last piece of infrastructure before the user
It would seem that this is enough - after all, we hook in at the last point of our architecture!
What does the user actually judge us on?
A typical Joe of the internet, browsing profiles on Sympatia
What he cares about is how the site works in his browser
Not how fast the HTTP server responds
Confining ourselves to our own infrastructure
Monitoring only what we control is not enough - so what do we do about it?
Let's take monitoring outside our infrastructure
There are ready-made services that implement this approach
We use them too
They are usually robots pretending to be browsers
They provide probes in many locations
But does that really reflect the user experience?
Look at the photo behind us
These are our users
Each of them is different
There are 20 million of them
On top of that, they use many different devices
The sixth "year of mobile" in a row ;-)
Even fridges!
Many browsers - in many versions
Then there is the geographic dimension
RS - practically mobile only, which means high latencies
HU - a very good backbone network
We cannot predict and test every possibility
We cannot afford to buy that many probes - it would simply be too expensive
And we want to be sure that all users get a service of the highest quality
So let's use our users as the probes
Let each of them report back to us how the service behaves on their end
And that is exactly what real user monitoring is
What's more, this way we measure user experience, not some technical parameter
But how do we do it in practice?
We started looking for off-the-shelf solutions on the market - they exist
But either they cannot handle the traffic
Or the data they deliver is only available with some delay
What's more, we would be limited to the metrics the service provides
Nobody on the market offers the ability to process the live data according to your own needs
So we have to build our own solution - the question is where to start
How do we design it?
We split the process into its components
We need to collect the data
Deliver it to our infrastructure
Process it
Present it so that conclusions can be drawn
So let's start from the beginning
How do we collect data from the user?
Users connect to our services with browsers
Let's try to use the users' browsers
How?
We inject a data-collecting script into our pages
The question is: what can actually be extracted from a browser?
Navigation Timing API
The whole page load cycle - all the information about the connection, DNS, ...
Information about page rendering
Events - domInteractive, domComplete
How to measure RTT with the Navigation Timing API
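Given those timestamps, deriving per-phase metrics is simple subtraction. A minimal sketch (the attribute names follow the W3C Navigation Timing spec; the sample values are invented for illustration):

```python
# Deriving per-phase metrics from Navigation Timing attributes
# (all values are epoch milliseconds; the sample record is invented).
timing = {
    "navigationStart": 1000, "domainLookupStart": 1005,
    "domainLookupEnd": 1030, "connectStart": 1030, "connectEnd": 1070,
    "requestStart": 1071, "responseStart": 1150, "responseEnd": 1220,
    "domInteractive": 1500, "domComplete": 1900, "loadEventEnd": 1950,
}

def phases(t):
    return {
        "dns": t["domainLookupEnd"] - t["domainLookupStart"],
        "tcp": t["connectEnd"] - t["connectStart"],      # ~1 RTT for a new connection
        "ttfb": t["responseStart"] - t["requestStart"],  # request sent -> first byte
        "download": t["responseEnd"] - t["responseStart"],
        "render": t["domComplete"] - t["responseEnd"],
        "total": t["loadEventEnd"] - t["navigationStart"],
    }

print(phases(timing))
```

The TCP connect phase is the closest proxy for round-trip time on a fresh connection, which is one way the RTT estimate mentioned above can be obtained.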
Of course, resources affect the page load process too - CSS, JS, images, ads
There is an API for that as well
It gives the same information as the Navigation Timing API, but for each resource separately
Just remember that browsers treat resources loaded from other domains differently
You have to make sure the right headers (e.g. Timing-Allow-Origin) are in place
We are in the comfortable position of serving everything from our own infrastructure, so we can guarantee that
But timings are not everything
It is also worth collecting every exception you can catch
At this stage the results are still at the user's end
We need to deliver them to our infrastructure for further processing
That is not so simple, because there is a lot of this data
3 million events per minute
Which translates into many GB of data to transfer
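A quick back-of-envelope check of that claim (the average beacon size below is our assumption, not a number from the talk):

```python
# Rough bandwidth estimate for the beacon stream.
events_per_min = 3_000_000   # from the talk
beacon_bytes = 1_000         # assumed average payload, illustration only

bytes_per_sec = events_per_min * beacon_bytes / 60
gb_per_hour = events_per_min * beacon_bytes * 60 / 1e9

print(f"{bytes_per_sec / 1e6:.0f} MB/s, {gb_per_hour:.0f} GB/h")
```

Even at a modest 1 KB per event, that is tens of megabytes per second arriving continuously, which is why off-the-shelf collectors struggled.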
We started looking for a service that could handle it
In our case it is our edge server - Accelerator
It is a service built in the CDN model
Located both in our own infrastructure and at external operators
Built to serve all our websites and video streaming
So we decided to use it for data collection as well
Only a small modification was needed so it could accept this data asynchronously
We wanted to guarantee that incoming data is handled immediately, to avoid hanging connections waiting for an event to be processed
This is not a bank - we don't need transactional guarantees
We respond OK immediately and buffer the event locally on the node
On each edge node there is a RabbitMQ buffer
Erlang
Very light and fast - 2000 msg / sec / core
It also buffers events in case they momentarily cannot be forwarded
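The accept-immediately-and-buffer pattern can be sketched as follows; a plain in-memory queue stands in for the per-node RabbitMQ buffer, and all names are ours:

```python
# Sketch of the edge node's fire-and-forget pattern: acknowledge the
# beacon immediately, buffer it locally, forward in the background.
# A deque stands in for the per-node RabbitMQ buffer.
from collections import deque

class EdgeBuffer:
    def __init__(self):
        self.queue = deque()

    def handle_beacon(self, event):
        self.queue.append(event)  # local buffering only, no downstream wait
        return 200                # respond OK right away

    def drain(self, forward):
        """Forward buffered events; on failure they stay queued."""
        sent = 0
        while self.queue:
            if not forward(self.queue[0]):  # downstream momentarily down
                break
            self.queue.popleft()
            sent += 1
        return sent

edge = EdgeBuffer()
edge.handle_beacon({"ttfb": 79})
edge.drain(lambda e: False)  # forwarding fails: nothing is lost
edge.drain(lambda e: True)   # forwarding recovers: buffer is flushed
```

The key property is that the user's request never waits on the pipeline: the HTTP response and the downstream delivery are fully decoupled.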
While we were at it, we also took advantage of the fact that Accelerator serves the websites
So it can be used to automatically inject the measurement script into the pages
Thanks to that, our developers don't have to remember to do it every time
This approach significantly reduces maintenance costs
On top of that, we gain huge flexibility by parameterizing the injected script, e.g. with a DC identifier
We can correlate how the service behaved for a user with the piece of infrastructure that served them
But back to our data
At this stage we have merged millions of individual events into many streams available on the edge servers
To draw any conclusions from this data, we need more computing power
Usually in such a case you collect data for a while
Then dump it into Hadoop and run the processing
Except we need the data online!
Because - remember - we are talking about monitoring the whole time
For this we can use our cloud and run the processing there, which gives us scalability and elasticity
We need a service that will:
Scale well
We won't handle gigabits of traffic on a single machine
Even a 10 Gbps network may not be enough
Be fault tolerant
Because at this point the whole data stream is in one place
We cannot afford a SPOF
Kafka - it meets these criteria
A distributed log file - we write at the end, read from the beginning
Many producers can write to it concurrently
In our case the data sources are the Accelerator nodes
It gives us data replication between data centers
This data is very valuable
The MirrorMaker mechanism
We are resilient not just to losing a single machine, but to losing an entire DC, with no data loss
It delivers data for further processing
online, for monitoring
to HDFS, for later report generation and batch processing
We bridge two worlds - online and offline
Kafka suits this very well: unlike classic AMQP-compliant message brokers, it lets many consumers read the data independently, each at its own pace - after all, it is still just a log file, and a consumer is simply a process moving along it
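The log-with-independent-consumers semantics that make this possible can be illustrated with a toy in-memory model (this mimics Kafka's behaviour, not its API):

```python
# Toy model of a Kafka-style log: producers append, and each consumer
# keeps its own offset, so online and batch readers proceed independently.
class Log:
    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer name -> next index to read

    def append(self, record):
        self.records.append(record)

    def poll(self, consumer, max_records=10):
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

log = Log()
for event in ["e1", "e2", "e3"]:
    log.append(event)

fast = log.poll("storm-online")               # monitoring reads everything now
slow = log.poll("hdfs-batch", max_records=1)  # batch reader lags behind safely
print(fast, slow)
```

A slow batch consumer never blocks the real-time one, because each only advances its own offset; that is exactly the property the talk relies on to feed Storm and HDFS from the same stream.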
Having lots of data is no art - the art is being able to compute something from it
So we need a way to turn the data into concrete metrics
Maybe write it all into a relational database? And then query it for aggregates.
Do you know a relational database that can sustain 5 million writes per minute?
And still answer aggregate queries quickly
And not take up half a data center while doing it
Storm - the stream-processing framework open-sourced by Twitter
It lets the developer focus on writing the processing logic instead of worrying about how to run it on many machines in our cloud
We build topologies: a graph in which each node either computes partial metrics or merges the results of previous nodes
Each vertex in the graph is an independent component - in practice just a piece of code implementing the given logic
At the end we no longer have a stream of events, but a set of metrics carrying concrete information
For example:
the median page render time
the number of JavaScript errors broken down by browser
page download times by telecom operator or country
and we could go on... our topologies currently generate several hundred thousand metrics at once
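The aggregation performed by a single topology node boils down to grouping and summarizing one window of events. A sketch with invented field names:

```python
# The kind of per-window aggregation a topology node performs:
# median render time overall, and JS error counts per browser.
# Field names in the sample events are our own, for illustration.
from statistics import median
from collections import Counter

window = [
    {"browser": "Chrome",  "render_ms": 420, "js_errors": 0},
    {"browser": "Firefox", "render_ms": 610, "js_errors": 2},
    {"browser": "Chrome",  "render_ms": 380, "js_errors": 1},
]

render_median = median(e["render_ms"] for e in window)
errors_by_browser = Counter()
for e in window:
    errors_by_browser[e["browser"]] += e["js_errors"]

print(render_median, dict(errors_by_browser))
```

In the real topology such partial results are computed in parallel across many workers and merged by downstream nodes before being emitted as metrics.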
What next?
Finally, we still need a visualization tool
The results land in Graphite
A tool for storing metrics together with their historical values
and drawing them as charts
On top of that, Graphite can perform complex analytical operations on metrics and combine them with each other
Skyline (?)
Importantly, the Graphite cluster we work on also stores metrics coming from sources other than RUM
Thanks to that, we can put user-side measurements side by side with other metrics - e.g. the disk temperatures in a storage array mentioned earlier
As a result, a few hundred thousand metrics turn into a few million
A standard Graphite installation can no longer cope with that
As storage we use Cassandra, which gives us the expected performance and replication between DCs
Instead of the standard frontend we use Grafana - it draws much nicer charts and offers more visualization options
We also extended Grafana with alerting
A developer sets their own thresholds on top of the metrics
When a threshold is crossed, an alarm lights up
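The check behind such an alert is conceptually just a comparison of the latest metric value against a per-metric limit. A minimal sketch with hypothetical metric names:

```python
# Minimal sketch of a developer-defined threshold alert: fire when the
# latest value of a metric exceeds its configured limit. Names are ours.
def check_alerts(latest_values, thresholds):
    """Return the metrics whose latest value exceeds their threshold."""
    return [name for name, limit in thresholds.items()
            if latest_values.get(name, 0) > limit]

latest = {"rum.onet.render_ms.p50": 950, "rum.onet.js_errors": 3}
limits = {"rum.onet.render_ms.p50": 800, "rum.onet.js_errors": 50}
print(check_alerts(latest, limits))  # render time is over budget
```

The value of putting this in Grafana is that the threshold lives next to the chart, so the developer who owns the metric also owns the alarm.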
In principle we could stop here, since we have a complete monitoring solution
But maybe we can squeeze something more out of this data?
We cannot anticipate every cross-section
New questions keep coming up
Modify the topology?
By the time we modify it, the question is no longer relevant
We cannot deploy a new topology for every new question
There are too many possibilities to draw a metric for each one
We need a service that answers non-standard questions quickly
We need OLAP
Standard OLAP on a relational database would hit exactly the same problem as the metrics did
Druid.io
Recently we started experimenting with Druid
It consumes a stream from Kafka
Current events are written to so-called real-time nodes
They keep data in RAM
They answer very quickly
Older data is continuously moved to historical nodes and archived on HDFS
When a client issues a query, Druid checks where it has the data
If needed, it pulls it from HDFS
Merges the results and responds
Importantly, Druid exposes an easy-to-use API
Thanks to which we can build further data-driven services and tools
For example, in our CDN we would like to use real-user data to detect servers that cause elevated response times and balance traffic accordingly.
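The kind of ad-hoc question this enables - e.g. "average time to first byte per backend server, right now" - is a group-by over raw events. Sketched here in plain Python with invented server names; in production it would be a Druid query, not application code:

```python
# The kind of ad-hoc slice Druid serves: "average TTFB per backend
# server, right now" - here as a plain group-by over raw events.
from collections import defaultdict

events = [
    {"server": "dc1-web-07", "ttfb_ms": 80},
    {"server": "dc1-web-07", "ttfb_ms": 120},
    {"server": "dc2-web-03", "ttfb_ms": 340},
]

totals, counts = defaultdict(int), defaultdict(int)
for e in events:
    totals[e["server"]] += e["ttfb_ms"]
    counts[e["server"]] += 1

avg_ttfb = {s: totals[s] / counts[s] for s in totals}
slowest = max(avg_ttfb, key=avg_ttfb.get)  # candidate for rebalancing
print(avg_ttfb, slowest)
```

Because Druid answers such group-bys over live events in seconds, the slow-server list can feed back into traffic balancing instead of sitting in a report.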
Let's sum up what we built
We collect data in users' browsers
We gather it at our edge layer
With Kafka we deliver it to our DCs and merge it for processing
Then we use Storm to aggregate it and prepare metrics
Which we present in Grafana
On top of that, we use Druid to answer non-standard questions
Finally, let's ask ourselves one question: how has RUM changed our daily work?
Today every change is deployed to a group / percentage of users first
Thanks to that, a developer can verify their change on a relatively small sample
Because the monitoring scripts are embedded on every page of our websites, they get
Immediate feedback on how their deployment affected the user side
How the timings changed
Whether any errors appeared
Even more importantly, making all these tools available changed the way of thinking in the organization
Our developers and the business actually started making decisions based on data collected at the user's end
Which brings very tangible results...
- As a result, our biggest news portal, onet.pl, is once again the fastest-loading portal on the Polish internet
- That drop at the end of the chart is really no coincidence
Welcome - my name is Jarosław Bloch - and mine is Albert Łącki - we are very glad to be here with you today. In this presentation we will talk about how we do Real User Monitoring for millions of users at DreamLab [FIXME]
Let's start with what we do day to day and what challenges we face.