Using Time Series for Full Observability of a SaaS Platform | DevOps.com
Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracing API to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data, especially in the case of understanding online user activity — a use case that is normally difficult without the math functions now available with Flux.
How to Enable Industrial Decarbonization with Node-RED and InfluxDB | InfluxData
Graphite Energy’s thermal energy storage (TES) platform encourages clients to offset their traditional energy consumption with low-cost renewable energy sources. Their customers include manufacturers, mines, steelmakers and aluminum plants. IIoT data is collected about energy usage, fuel consumption, temperatures, solar panels, wind farms, process steam and air dryers. Discover how Graphite Energy uses InfluxDB to monitor their zero-emission energy solution.
In this webinar, Byron Ross will dive into:
Graphite Energy’s approach to reducing their clients’ carbon footprint
Their methodology for collecting sensor data used to make their operations greener
Why they chose a time series database over a data historian
How EnerKey Using InfluxDB Saves Customers Millions by Detecting Energy Usage... | InfluxData
In this presentation, Martti Kontula discusses EnerKey’s strategy for reducing energy consumption, how using a time series database enhances EnerKey’s competitive advantage, and their approach to using machine learning to help their customers forecast and optimize operations.
How Sensor Data Can Help Manufacturers Gain Insight to Reduce Waste, Energy C... | InfluxData
In this webinar, learn how a long-time Industrial IT Consultant helps his customer make the leap into providing visibility of their processes to everyone in the plant. This journey led to the discovery of untapped opportunity to improve operations, reduce energy consumption, and minimize plant downtime. The collection of data from the individual sensors has led to powerful Grafana dashboards shared across the organization.
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen | InfluxData
In this InfluxDays NYC 2019 talk by Gunnar Aasen (Manager of Partner Engineering at InfluxData), you will get an overview of the AWS Container Monitoring Stack as well as how you can use InfluxDB on AWS for container monitoring. This session will include a demo of the solution.
How Sysbee Manages Infrastructures and Provides Advanced Monitoring by Using ... | InfluxData
Discover how Sysbee helps organizations bring DevOps culture to small and medium enterprises. Their team helps customers by improving stability, security, and scalability while providing cost-effective IT infrastructure. Learn how monitoring everything can improve your processes and simplify debugging!
Sysbee’s introspection on monitoring tools over the years
How TSDBs, and specifically InfluxDB, fit into improving observability
Their approach to using the TICK Stack to improve the web hosting industry
Monitoring and Troubleshooting a Real-Time Pipeline | Apache Apex
Alan Ngai, CTO/Co-Founder, OpsClarity
OpsClarity is a performance monitoring solution for stream processing applications. In addition to providing deep component monitoring, it leverages data science to proactively identify anomalies across the entire data pipeline and correlates issues across the data and app tiers to identify common concerns that impact the business. OpsClarity automatically discovers the entire app and data topology, and is years ahead of anything else in how it leverages the rich metadata and network dependency context captured through that topology to provide rich analysis and fast, correlated troubleshooting. This talk will additionally cover integration with Apache Apex.
Best Practices for Scaling an InfluxEnterprise Cluster | InfluxData
Dennis Brazil is the Sr. Manager, SRE Monitoring Ingest/Collectors & Alerting Platforms at PayPal. He has over 30 years of experience building high-performing professional teams with disciplines in Windows, Linux, Unix, MySQL, VMware, F5 BIG-IP and Citrix NetScaler load balancers. With these teams, he has built monitoring solutions that improve PayPal’s operational efficiency while mitigating incidents involving multiple teams.
Gain Deep Visibility into APIs and Integrations with Anypoint Monitoring | InfluxData
On average, a business supporting digital transactions now crosses 35 backend systems, and legacy tools haven’t been able to keep up. This session will cover how MuleSoft uses InfluxCloud to help power their monitoring and diagnostic solutions as well as provide end-to-end actionable visibility into APIs and integrations to help customers identify and resolve issues quickly.
Thomas Lamirault & Mohamed Amine Abdessemed - A brief history of time with Apac... | Flink Forward
Many use cases in the telecommunications industry require producing counters, quality metrics, and alarms in a streaming fashion with very low latency. Most of these metrics are only valuable when they’re made available as soon as the associated events happen. In our company we were looking for a system able to produce this kind of real-time indicator, one that must handle massive amounts of data (400,000 events per second) with frequent peak loads (like New Year’s Eve) and out-of-order events caused by massive network disorder. Low latency and flexible window management with specific watermark emission are also must-haves. Heterogeneous formats, multiple flow correlation, and the possibility of late data arrival are other challenges. Flink being already widely used at Bouygues Telecom for real-time data integration, its features made it the evident candidate for the future system. In this talk, we'll present a real use case of streaming analytics using Flink, Kafka & HBase along with other legacy systems.
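The flexible window management and watermark emission described above can be illustrated with a small, self-contained sketch. This is plain Python, not Flink code; the window size, lateness allowance, and sum aggregation are illustrative assumptions:

```python
from collections import defaultdict

def tumbling_windows(events, window_ms, allowed_lateness_ms):
    """Assign (event_time, value) pairs to tumbling event-time windows.

    A watermark trails the maximum event time seen by `allowed_lateness_ms`;
    a window fires once the watermark passes its end, and events older than
    the watermark are counted as late and dropped.
    """
    windows = defaultdict(list)   # window start -> buffered values
    emitted = {}                  # window start -> aggregated result
    watermark = float("-inf")
    late = 0
    for ts, value in events:
        watermark = max(watermark, ts - allowed_lateness_ms)
        if ts < watermark:
            late += 1             # too late even for the lateness allowance
            continue
        start = ts - (ts % window_ms)
        windows[start].append(value)
        # fire every window whose end is now behind the watermark
        for s in sorted(windows):
            if s + window_ms <= watermark:
                emitted[s] = sum(windows.pop(s))
    # flush remaining open windows at end of stream
    for s, vals in windows.items():
        emitted[s] = sum(vals)
    return emitted, late
```

The same mechanics (event-time window assignment, watermark-driven firing, a bounded allowance for out-of-order arrivals) are what Flink provides natively at scale.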
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo... | Hosted by Confluent
The Ohio Department of Transportation has adopted Confluent as the event-driven enabler of DriveOhio, a modern Intelligent Transportation System. DriveOhio digitally links sensors, cameras, speed monitoring equipment, and smart highway assets in real time to dynamically adjust the surface road network and maximize safety and efficiency for travelers. Over the past 24 months the team has increased the number and types of devices within the DriveOhio environment, while also working to see their vendors adopt Kafka to better participate in data sharing.
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati... | Flink Forward
In modern applications of streaming frameworks, stateful streaming is arguably one of the most important use cases. Flink, as a well-supported framework for stateful streaming, helps developers spend less effort on system deployment and focus more on business logic. Nevertheless, upgrading an existing production system to one built on stateful streaming can still be a challenging task for any development team. In this talk, we will share our experience migrating an existing system at Appier (an AI-based startup specializing in B2B solutions) to stateful streaming with Flink. We will first discuss how stateful streaming matches our business logic and its potential benefits. Then we review the obstacles we encountered during migration and present our solutions to them. We hope the experience and tips shared in this talk help future users prepare to apply Flink in their production systems more smoothly.
From a Time-Series Database to a Key Operational Technology for the Enterprise | InfluxData
It is important for a time-series database to provide efficient storage, aggregation, and query of time-series data. Elevating a time-series database to a core operational technology of the enterprise, however, involves much more. In this talk, I will explore important considerations that take a time-series database from being just another data store to being critical infrastructure for operational intelligence. I will explore challenges related to time, change management, incorporating asset models from the business domain, and integration with other operational technologies, like streaming-data platforms.
Learn more about InfluxData’s time series platform. InfluxDB Cloud is a fast, elastic, serverless real-time monitoring platform, dashboarding engine, analytics service and event and metrics processor. It is available on AWS, Azure and Google Cloud. Since its launch, we have been busy making updates to the product!
Join Balaji Palani, Director of Product Management, as he demonstrates the latest features of InfluxDB Cloud. This one-hour webinar will feature a product update and Q&A time.
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka® | Confluent
Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administrator level or higher. A lot of IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a RAID set in a data centre are not the threat surface of interest.
Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time.
Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.
In this talk you will learn:
-The current threat landscape
-The difference between Security and Threat Intelligence
-The value of Confluent Platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses
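The idea that streams of signals aggregate into signatures can be sketched in a few lines. This is a hypothetical illustration (the `auth_failure` signal name and the count threshold are invented for the example), not Confluent's implementation:

```python
from collections import defaultdict

def detect_signatures(events, threshold):
    """Aggregate individual security signals into signatures of interest.

    `events` is an iterable of (principal, signal) pairs. A principal that
    accumulates `threshold` or more "auth_failure" signals becomes a
    signature worth investigating: the series of needles, not one needle.
    """
    counts = defaultdict(int)
    signatures = set()
    for principal, signal in events:
        if signal == "auth_failure":
            counts[principal] += 1
            if counts[principal] >= threshold:
                signatures.add(principal)
    return signatures
```

In a Kafka deployment the same pattern would run continuously over a topic of security events, with the per-principal counts held as windowed stream-processing state rather than an in-memory dict.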
When T-Mobile launched a mobile app that delivers personalized experiences and 1:1 customer care messaging, they turned to the Elastic Stack to help monitor speed, effectiveness, and user behavior.
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ... | Hosted by Confluent
We built Apache Pinot - a real-time distributed OLAP datastore - for low-latency analytics at scale. It is heavily used at companies such as LinkedIn, Uber, and Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per second from Kafka, builds indexes in real time, and serves 100K+ queries per second while ensuring millisecond to sub-second latency SLAs.
In the first implementation, we used the Consumer Group feature to manage offsets and checkpoints across multiple Kafka consumers. However, to achieve fault tolerance and scalability, we had to run multiple consumer groups for the same topic. This was our initial strategy to maintain the SLA under high query workload. But this model posed other challenges: since Kafka maintains offsets per consumer group, achieving data consistency across multiple consumer groups was not possible. Also, a failure of a single node in a consumer group meant the entire consumer group was unavailable for query processing. Restarting the failed node required a lot of manual operations to ensure data was consumed exactly once. This resulted in management overhead and inefficient hardware utilization.
While taking inspiration from the Kafka consumer group implementation, we redesigned real-time consumption in Pinot to maintain consistent offsets across multiple consumer groups. This allowed us to guarantee consistent data across all replicas, and enabled us to copy data from another consumer group during node addition, node failure, or replication increases.
In this talk, we will deep-dive into the various challenges and considerations that went into this design, and learn what makes Pinot resilient to failures in both Kafka brokers and Pinot components. We will introduce the new concept of "lockstep" sequencing, where multiple consumer groups synchronize checkpoints periodically to maintain consistency. We'll describe how we achieve this while maintaining strict freshness SLAs and withstanding high-throughput ingestion.
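One plausible reading of synchronized checkpointing across replica consumers, sketched in plain Python as an illustration of the idea rather than Pinot's actual implementation: the group can only agree on data up to the slowest replica, so the shared checkpoint advances to the minimum consumed offset and never moves backwards.

```python
def lockstep_checkpoint(replica_offsets, last_checkpoint):
    """Pick the next synchronized checkpoint for a set of replica consumers.

    Each replica has consumed up to its own offset; the checkpoint advances
    to min(replica_offsets), the furthest point ALL replicas have reached,
    and is monotonic (never regresses past the previous checkpoint).
    """
    return max(last_checkpoint, min(replica_offsets))

def consistent_rows(log, checkpoint):
    """Rows every replica is guaranteed to have: offsets below the checkpoint."""
    return log[:checkpoint]
```

Serving queries only from rows below the agreed checkpoint is what makes answers identical regardless of which replica handles the query.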
Like many industries, banking is undergoing a fundamental change because of the software revolution. No longer are banks competing only on interest rates and having the best traders; these days, customer experience and having the best engineers are the focus. In this changing world, banks compete with new start-ups, the so-called fintechs, and with large platform organisations such as Google, Facebook and Apple.

At ING, we believe that staying ahead of the game means changing how we interact with our customers: no longer the traditional model of waiting for customers to come to the bank through our website or apps, but actively reaching out to customers with information that is relevant to them, in order to make their financial lives frictionless. Many of these changes are driven by reacting to all events that are relevant to the customer, and using streaming analytics to reach out to the customer within milliseconds of the event occurring. Apache Flink is key for ING to achieve this.

This presentation addresses how ING approaches the challenge, the role that Apache Flink plays, and the consequences regulations have on how we work with open source in general, and with Apache Flink (and data Artisans) in particular. This keynote takes place at Kino 3.
How to Deliver a Critical and Actionable Customer-Facing Metrics Product with... | InfluxData
In IoT, understanding the health of thousands of devices is critical for deployment at scale, especially when troubleshooting an issue. Particle’s customer base needed visibility into their devices, with actionable data to reference in real time. Join Cullen Murphy, Site Reliability Engineer at Particle, to learn how the team built a metrics system on Telegraf, Kubernetes, and Prometheus to deploy a customer-facing product that provides critical and relevant data to their IoT product creators.
http://flink-forward.org/kb_sessions/flink-and-beam-current-state-roadmap/
It is no secret that the Dataflow model, which evolved from Google’s MapReduce, Flume, and MillWheel, has been a major influence on Apache Flink’s streaming API. The essentials of this model are captured in Apache Beam. Beam provides the Dataflow API with the option to deploy to various backends (e.g. Flink, Spark). In this talk we will examine the current state of the Flink Runner. Beam’s Runners manage the translation of the Beam API into the backend API. The Beam project itself has made an effort to summarize the capabilities of each Runner to provide an overview of the supported API concepts. Of all open source backends, Flink is currently the Runner that supports the most features. We will look at the supported Beam features and their counterparts in Flink. Further, we will look at potential improvements and upcoming features of the Flink Runner.
Javier Lopez & Mihail Vieru - Flink in Zalando's World of Microservices - Flink... | Flink Forward
http://flink-forward.org/kb_sessions/flink-in-zalandos-world-of-microservices/
In this talk we present Zalando’s microservices architecture, introduce Saiki – our next generation data integration and distribution platform on AWS and show how we employ stream processing with Apache Flink for near-real time business intelligence.
Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.
We first have a look at how business intelligence processes have been working inside Zalando for the last years and present our current approach – Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available for analytical teams.
We no longer live in a world of static data sets, but are instead confronted with endless streams of events that constantly inform us about relevant happenings from all over the enterprise. The processing of these event streams enables us to do near-real time business intelligence. In this context we have evaluated Apache Flink vs. Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside with Kafka and Elasticsearch.
With these technologies we are currently working on two use cases: a near real-time business process monitoring solution and streaming ETL.
Monitoring our business processes enables us to check if technically the Zalando platform works. It also helps us analyze data streams on the fly, e.g. order velocities, delivery velocities and to control service level agreements.
On the other hand, streaming ETL is used to relinquish resources from our relational data warehouse, as it struggles with increasingly high loads. In addition to that, it also reduces the latency and facilitates the platform scalability.
Finally, we have an outlook on our future use cases, e.g. near-real time sales and price monitoring. Another aspect to be addressed is to lower the entry barrier of stream processing for our colleagues coming from a relational database background.
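The on-the-fly stream metrics mentioned above, such as order velocities, reduce to a bucketed count per time window. A minimal sketch (the bucket size and inputs are illustrative, not Zalando's pipeline):

```python
from collections import Counter

def order_velocity(order_timestamps, bucket_seconds=60):
    """Count orders per time bucket (orders per minute by default),
    the kind of rate metric used to watch order or delivery velocities."""
    return Counter(ts // bucket_seconds for ts in order_timestamps)
```

A sudden drop in one of these buckets relative to its neighbours is exactly the kind of signal a business process monitoring solution would alert on.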
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ... | Hakka Labs
By Alan Gardner (Platform Engineer, Rocana)
Rocana Ops is designed to handle terabytes a day of application logs and system metrics from across multiple data centres. We use Apache Kafka as a durable, high-throughput message bus at the centre of our application architecture. This talk will discuss the design and features of Kafka, its operational characteristics, and why we chose it as the backbone of our data pipeline.
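The core abstraction behind Kafka's durable, high-throughput message bus can be modeled in a few lines. This is a toy in-memory sketch of the idea (not Kafka's implementation; the class name is invented for illustration):

```python
class PartitionedLog:
    """Minimal model of a Kafka-style log: append-only partitions addressed
    by monotonically increasing offsets, with key-based partitioning so
    records for the same key land in the same partition, in order."""

    def __init__(self, partitions=2):
        self.partitions = [[] for _ in range(partitions)]

    def append(self, key, value):
        p = hash(key) % len(self.partitions)   # same key -> same partition
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        """Consumers track their own offset and replay from any position."""
        return self.partitions[partition][offset:]
```

The properties this sketch captures (ordered, replayable, offset-addressed partitions) are why a log like Kafka works as a durable buffer between terabyte-scale producers and downstream consumers.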
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai... | InfluxData
Red Hat is a provider of enterprise open source solutions. Its portfolio of products includes hybrid cloud infrastructure, middleware, cloud-native apps and automation solutions. Its internal network supports all lines of business across 60+ sites. Discover how Red Hat uses InfluxDB and Flux for better real-time monitoring of their networks, to improve performance and better understand utilization.
Time Series Analysis Using an Event Streaming Platform | Dr. Mirko Kämpf
Advanced time series analysis (TSA) requires specialized data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and safety protection.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
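Reconstructing a correlation network from raw series boils down to computing pairwise correlations and keeping edges above a threshold. A minimal sketch (the 0.8 threshold is an arbitrary example value):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(series, threshold=0.8):
    """Reconstruct a graph from time series: nodes are series names, and an
    edge links two series whose absolute correlation meets the threshold."""
    names = list(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                edges.append((a, b))
    return edges
```

On an event streaming platform the same computation would run over windowed aggregates of each topic's series, so the network itself evolves over time.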
COOL WAYS TO GET STARTED
Join us for a live InfluxDB training to learn how to ingest data at scale in a matter of seconds and build powerful time series based applications. These 45-minute demos feature experts who showcase key InfluxDB features and answer questions live from the audience.
After attending this training, attendees will be able to:
Use sample data sets to try out various visualization options
Utilize the available data ingestion methods to construct a data pipeline to InfluxDB
Leverage Notebooks to collaborate with team members
Learn best practices for InfluxDB, Telegraf and Flux
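One of the ingestion methods covered in the training is writing InfluxDB line protocol directly. A simplified sketch of rendering a single point (real line protocol additionally marks integer fields with a trailing `i` and escapes spaces and commas in names):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point in InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... timestamp"""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in sorted(fields.items())
    )
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"
```

Points formatted this way can be batched and POSTed to InfluxDB's write endpoint, which is essentially what Telegraf's output plugin does on your behalf.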
Gain Deep Visibility into APIs and Integrations with Anypoint MonitoringInfluxData
On average, a business supporting digital transactions now crosses 35 backend systems—and legacy tools haven’t been able to keep up. This session will cover how MuleSoft uses InfluxCloud to help power their monitoring and diagnostic solutions as well as provide end-to-end actionable visibility to APIs and integrations to help customers identify and resolve issues quickly.
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...Flink Forward
Many use cases in the telecommunication industry require producing counters, quality metrics, and alarms in a streaming fashion with very low latency. Most of this metrics are only valuable when they’re made available as soon as the associated events happened. In our company we are looking for a system able to produce this kind of real-time indicator, which must handle massive amounts of data (400,000 eps) with often peak loads (like New Year’s Eve) or out-of-order events like massive network disorder. Low latency and flexible window management with specific watermark emission are also a must-haves. Heterogeneous format, multiple flow correlation, and the possibility of late data arrival are other challenges. Flink being already widely used at Bouygues Telecom for real-time data integration, its features made it the evident candidate for the future System. In this talk, we'll present a real use case of streaming analytics using Flink, Kafka & HBase along with other legacy systems.
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...HostedbyConfluent
The Ohio Department of Transportation has adopted Confluent as the event driven enabler of DriveOhio, a modern Intelligent Transportation System. DriveOhio digitally links sensors, cameras, speed monitoring equipment, and smart highway assets in real time, to dynamically adjust the surface road network to maximize the safety and efficiency for travelers. Over the past 24 months the team has increased the number and types of devices within the DriveOhio environment, while also working to see their vendors adopt Kafka to better participate in data sharing.
Flink Forward Berlin 2018: Wei-Che (Tony) Wei - "Lessons learned from Migrati...Flink Forward
In modern applications of streaming frameworks, stateful streaming is arguably one of the most important usage cases. Flink, as a well-supported streaming framework for stateful streaming, readily helps developers spend less efforts on system deployment and focus more on the business logic. Nevertheless, upgrading from an existing production system to a new one with stateful streaming can still be a challenging task for any development team. In this talk, we will share our experience in migrating an existing system at Appier (an AI-based startup specialized with B2B solutions) to stateful streaming with Flink. We will first discuss how stateful streaming matches our business logic and its potential benefits. Then, we review the obstacles that we have encountered during migration, and present our solutions to conquer them. We hope that our experience and tips shared in this talk hints future users to prepare themselves towards applying Flink in their production systems more painlessly.
From a Time-Series Database to a Key Operational Technology for the EnterpriseInfluxData
It is important for a time-series database to provide efficient storage, aggregation, and query of time-series data. Elevating a time-series database to a core operational technology of the enterprise, however, involves so much more. In this talk, I will explore important considerations that take a time-series database from being just another data-store, to a critical infrastructure for operational intelligence. I will explore challenges related to time, change management, incorporating asset models from the business domain, and integration with other operational technologies, like streaming-data platforms.
Learn more about InfluxData’s time series platform. InfluxDB Cloud is a fast, elastic, serverless real-time monitoring platform, dashboarding engine, analytics service and event and metrics processor. It is available on AWS, Azure and Google Cloud. Since its launch, we have been busy making updates to the product!
Join Balaji Palani, Director of Product Management, as he demonstrates the the latest features of InfluxDB Cloud. This one-hour webinar will feature a product update and Q&A time.
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®confluent
Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administration level or higher. A lot of IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a raidset in a data centre are not the threat surface of interest.
Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time.
Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.
In this talk you will learn:
-The current threat landscape
-The difference between Security and Threat Intelligence
-The value of Confluent platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses
When T-mobile launched a mobile app that delivers personalized experiences and 1:1 customer care messaging, they turned to the Elastic Stack for help monitor speed, effectiveness, and user behavior.
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...HostedbyConfluent
We built Apache Pinot - a real-time distributed OLAP datastore - for low-latency analytics at scale. This is heavily used at companies such as LinkedIn, Uber, Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per sec from Kafka, builds indexes in real-time and serves 100K+ queries per second while ensuring latency SLA of millisecond to sub second.
In the first implementation, we used the Consumer Group feature to manage the offsets and checkpoints across multiple Kafka consumers. However, to achieve fault tolerance and scalability, we had to run multiple consumer groups for the same topic. This was our initial strategy to maintain the SLA at high query workload. But this model posed other challenges - since Kafka maintains offsets per consumer group, achieving data consistency across multiple consumer groups was not possible. Also, a failure of a single node in a consumer group meant the entire consumer group was unavailable for query processing. Restarting the failed node required a lot of manual operations to ensure data was consumed exactly once. This resulted in management overhead and inefficient hardware utilization.
While taking inspiration from the Kafka consumer group implementation, we redesigned the real-time consumption in Pinot to maintain consistent offsets across multiple consumer groups. This allowed us to guarantee consistent data across all replicas, and enabled us to copy data from another consumer group during node addition, node failure, or replication increases.
In this talk, we will deep dive into the various challenges faced and the considerations that went into this design, and learn what makes Pinot resilient to failures in both Kafka brokers and Pinot components. We will introduce the new concept of "lockstep" sequencing, where multiple consumer groups can synchronize checkpoints periodically and maintain consistency. We'll describe how we achieve this while maintaining strict freshness SLAs and withstanding high ingestion throughput.
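The lockstep idea can be sketched in a few lines. This is a simplified illustration of synchronized checkpointing, not Pinot's actual implementation; the class and function names are hypothetical:

```python
# "Lockstep" checkpointing sketch: replicas consuming the same partition
# agree on a common end offset before committing, so every replica's
# committed segment covers an identical offset range.

class Replica:
    def __init__(self, name):
        self.name = name
        self.consumed = 0      # offset this replica has reached
        self.committed = 0     # last checkpoint offset

    def consume_to(self, offset):
        self.consumed = max(self.consumed, offset)

def lockstep_checkpoint(replicas):
    """All replicas commit at the minimum consumed offset, guaranteeing
    identical data in every committed segment."""
    agreed = min(r.consumed for r in replicas)
    for r in replicas:
        r.committed = agreed
    return agreed

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicas[0].consume_to(120)
replicas[1].consume_to(95)
replicas[2].consume_to(110)

print(lockstep_checkpoint(replicas))  # 95: every replica checkpoints at offset 95
```

Because all replicas checkpoint at the same offset, a new or recovering node can simply copy a committed segment from any peer instead of re-consuming from Kafka.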
Like many industries, banking is undergoing a fundamental change because of the software revolution. No longer are banks competing only on interest rates and having the best traders; these days, customer experience and having the best engineers are the focus. In this changing world, banks compete with new start-ups, the so-called fintechs, and with large platform organisations such as Google, Facebook and Apple. At ING, we believe that staying ahead of the game means changing how we interact with our customers: no longer the traditional model of waiting for customers to come to the bank through our website or apps, but actively reaching out to customers with relevant information in order to make their financial lives frictionless. Many of these changes are driven by reacting to all events that are relevant to the customer, and using streaming analytics to reach out to the customer within milliseconds after the event occurs. Apache Flink is key for ING to achieve this. This presentation addresses how ING approaches the challenge, the role that Apache Flink plays, and the consequences regulations have on how we work with open source in general, and with Apache Flink (and data Artisans) in particular. This keynote takes place at Kino 3.
How to Deliver a Critical and Actionable Customer-Facing Metrics Product with...InfluxData
In IoT, understanding the health of thousands of devices is critical for deployment at scale especially when troubleshooting an issue. Particle’s customer base needed visibility into their devices with actionable data to reference in real time. Join Cullen Murphy, Site Reliability Engineer at Particle, to learn how the team built a metrics system on Telegraf, Kubernetes, and Prometheus to deploy a customer-facing product that provides critical and relevant data to their IoT product creators.
http://flink-forward.org/kb_sessions/flink-and-beam-current-state-roadmap/
It is no secret that the Dataflow model, which evolved from Google’s MapReduce, Flume, and MillWheel, has been a major influence on Apache Flink’s streaming API. The essentials of this model are captured in Apache Beam. Beam provides the Dataflow API with the option to deploy to various backends (e.g. Flink, Spark). In this talk we will examine the current state of the Flink Runner. Beam’s Runners manage the translation of the Beam API into the backend API. The Beam project itself has made an effort to summarize the capabilities of each Runner to provide an overview of the supported API concepts. Of all the open source backends, Flink is currently the Runner that supports the most features. We will look at the supported Beam features and their counterparts in Flink. Further, we will look at potential improvements and upcoming features of the Flink Runner.
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
http://flink-forward.org/kb_sessions/flink-in-zalandos-world-of-microservices/
In this talk we present Zalando’s microservices architecture, introduce Saiki – our next generation data integration and distribution platform on AWS and show how we employ stream processing with Apache Flink for near-real time business intelligence.
Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.
We first have a look at how business intelligence processes have been working inside Zalando for the last years and present our current approach – Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available for analytical teams.
We no longer live in a world of static data sets, but are instead confronted with endless streams of events that constantly inform us about relevant happenings from all over the enterprise. The processing of these event streams enables us to do near-real time business intelligence. In this context we have evaluated Apache Flink vs. Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside with Kafka and Elasticsearch.
With these technologies we are currently working on two use cases: a near real-time business process monitoring solution and streaming ETL.
Monitoring our business processes enables us to check if technically the Zalando platform works. It also helps us analyze data streams on the fly, e.g. order velocities, delivery velocities and to control service level agreements.
On the other hand, streaming ETL is used to relinquish resources from our relational data warehouse, as it struggles with increasingly high loads. In addition to that, it also reduces the latency and facilitates the platform scalability.
Finally, we give an outlook on our future use cases, e.g. near-real time sales and price monitoring. Another aspect to be addressed is lowering the entry barrier of stream processing for our colleagues coming from a relational database background.
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...Hakka Labs
By Alan Gardner (Platform Engineer, Rocana)
Rocana Ops is designed to handle terabytes a day of application logs and system metrics from across multiple data centres. We use Apache Kafka as a durable, high-throughput message bus at the centre of our application architecture. This talk will discuss the design and features of Kafka, its operational characteristics, and why we chose it as the backbone of our data pipeline.
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...InfluxData
Red Hat is the provider of enterprise open source solutions. Its portfolio of products includes hybrid cloud infrastructure, middleware, cloud-native apps and automation solutions. Its internal network supports all lines of business — including 60+ sites. Discover how Red Hat uses InfluxDB and Flux for better real-time monitoring of their networks to improve performance and to understand utilization better.
Time Series Analysis Using an Event Streaming PlatformDr. Mirko Kämpf
Advanced time series analysis (TSA) requires specialized data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and for protecting safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
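As an illustration of the graph-reconstruction pattern mentioned above, here is a minimal sketch that links two series whenever their Pearson correlation clears a threshold. It is a conceptual example in pure Python; the sensor names, data, and threshold are illustrative, not from the talk:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(series, threshold=0.9):
    """Add an edge between two series when |Pearson r| meets the threshold."""
    names = list(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                edges.append((a, b))
    return edges

series = {
    "sensor_a": [1, 2, 3, 4, 5],
    "sensor_b": [2, 4, 6, 8, 10],   # perfectly correlated with sensor_a
    "sensor_c": [5, 1, 4, 2, 3],    # weakly related
}
print(correlation_network(series))  # [('sensor_a', 'sensor_b')]
```

At scale, the pairwise correlations would be computed over windows of the event stream rather than static lists, but the resulting edge list is the reconstructed network.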
COOL WAYS TO GET STARTED
Join us for a live InfluxDB training to learn how to easily ingest data at scale in a matter of seconds and build powerful time series applications. Our 45-minute demos feature experts who showcase key InfluxDB features and answer questions live from the audience.
After attending this training, attendees will be able to:
Use sample data sets to try out various visualization options
Utilize the available data ingestion methods to construct a data pipeline to InfluxDB
Leverage Notebooks to collaborate with team members
Gain best practices for InfluxDB, Telegraf and Flux
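One common ingestion path into InfluxDB is writing points in line protocol. Below is a minimal formatter sketch; it is simplified for illustration (real line protocol also requires escaping special characters in keys and values, and suffixing integer fields with `i`), and the measurement and tag names are hypothetical:

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point in InfluxDB line protocol:
    measurement,tag=value field=value timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "cpu", {"host": "server01", "region": "us-west"},
    {"usage_idle": 87.2}, 1_700_000_000_000_000_000,
)
print(line)
# cpu,host=server01,region=us-west usage_idle=87.2 1700000000000000000
```

In practice Telegraf or a client library produces these lines for you; seeing the format helps when debugging a pipeline or hand-crafting test data.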
Using Time Series for Full Observability of a SaaS PlatformDevOps.com
Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracingAPI to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data.
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Aleksandr Tavgen
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Includes some capabilities of the Flux language by InfluxData.
Performance doesn’t have the same definition for system administrators, developers and business teams. What is performance? High CPU usage, a non-scalable website, a low business transaction rate per second, slow response time, … This presentation is about math, code performance, load testing, web performance, best practices, … Working on performance optimization is a very broad topic. It’s important to really understand the main concepts and to have a clean, strong methodology, because it can be a very time-consuming activity. Happy reading!
A Practical Guide to Selecting a Stream Processing Technology confluent
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...confluent
Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing, but will include brief refresher sections.
A three-hour lecture I gave at the Jyväskylä Summer School. The talk goes through important details about the use of data science in real businesses, including data deployment, data processing, practical issues with data solutions, and emerging trends in data science.
See also Part 1 of the lecture: Introduction Data Science. You can find it in my profile (click the face)
Brian Schouten, Director of Technical Presales for PROSTEP INC describes the requirements, risks, strategy, and technical considerations of "do-it-yourself" PLM Migrations for ENOVIA 3D EXPERIENCE.
Azure architecture design patterns - proven solutions to common challengesIvo Andreev
Building reliable, scalable, secure applications can happen either by following verified design patterns or the hard way, through trial and error. Azure architecture patterns are tested and accepted solutions to common challenges, reducing technical risk to the project by not having to employ a new and untested design. However, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
It's harder than ever to predict the load your application will need to handle in advance, so how do you design your architecture so you can afford to implement as you go and be ready for whatever comes your way? It's easy to focus on optimizing each part of your application, but your application architecture determines the options you have to make big leaps in scalability. In this talk we'll cover practical patterns you can build today to meet the needs of rapid development while still creating systems that can scale up and out. Specific code examples will focus on .NET, but the principles apply across many technologies. Real-world systems will be discussed based on our experience helping customers around the world optimize their enterprise applications.
Are we there Yet?? (The long journey of Migrating from close source to opens...Marco Tusa
Migrating from Oracle to MySQL or another open source RDBMS like Postgres is not as straightforward as many think, especially if not well guided. Learn what it entails from someone who has already done it.
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...Lucas Jellema
Oracle Management Cloud provides seven services that collect metrics and logging from all tiers in the stack, and from clouds and on-premises systems alike, and provide various levels of insight into what is going on or what went on. To find performance bottlenecks, browser incompatibilities, application health issues, and infrastructure problems at runtime, OMC provides dashboards, alerting, synthetic tests and log watchers. This presentation gives an overview of OMC, highlights some key features and describes how AMIS got started with APM, Log Analytics and Infrastructure Monitoring.
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionElasticsearch
KeyBank is using an iterative design approach to scale their end-to-end enterprise monitoring system with Kafka and Elasticsearch at its core. See how they did it and the lessons learned along the way.
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow EcosystemInfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...InfluxData
Bevi are the creators of smart water dispensers which empower people to choose their desired beverage — flat or sparkling, their desired flavor and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting of their entire ecosystem — including the hardware and software. They are using InfluxDB to collect sensor data in real-time remotely from their internet-connected machines about their status and activity — i.e., flavor and CO2 levels, water temp, filter status, etc. They are using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDBInfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decision
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO-Base, a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose-built time series database. They are collecting more than 100K different metrics at various frequencies — some every 5 seconds, others only every 1-2 minutes. They have reduced overall IT spend by 50% and collect 2x the amount of data at 20x frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced TCO, and is now able to create added-value services: forecasting, monitoring, and predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipelines IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING StackInfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (e.g. turbines and generators) and increase safety practices (e.g. buildings, construction sites). Join this webinar to learn best practices from industrial IoT SMEs.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected by everyone — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
Rust is a systems programming language designed for high performance, type safety, and concurrency. According to Stack Overflow’s annual survey in 2022, Rust is the most loved language, with 87% of developers saying they want to continue using it. The same survey also reported that nearly 20% of developers aren’t currently using Rust but want to start.
Ockam’s suite of programming libraries, command line tools, and managed cloud services enable developers to orchestrate end-to-end encryption. InfluxDB is the purpose-built time series database developed to handle time series data for IoT, monitoring, and real-time analytics. Ockam was originally developed using C, and InfluxDB was originally written using Go; both solutions have been completely rewritten in Rust. Discover why two founders decided to rewrite their developer tools using Rust, and gain insight into the strategy beforehand and the entire process.
Join this live panel as Mrinal Wadhwa and Paul Dix dive into:
Their approach to rewriting a project in Rust
How to build and train engineering teams
Tips and tricks learned along the way - pitfalls to look out for!
Join this webinar for a live discussion with Q&A.
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security options, and more custom configuration options to best suit customers’ workload requirements. Join this webinar to learn more about InfluxDB Cloud, and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
Many developers and DevOps engineers have become aware of using their observability data to gain greater insights into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
American Metal Processing Company ("AMP") is the largest commercial rotary heat treat facility in the US, with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures a consistent, uniform heat treatment process vs. traditional batch- or belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electric cars. They collect sensor data about their EV's control systems using a time series platform. During races, they are collecting IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races, which help the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team is able to analyze data in Marple to understand what to do better next time. By using Marple + InfluxDB, their team is able to collect, share, and analyze high-frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with the language of choice and pull IoT sensor data from 70+ different single board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB | InfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDKs, and they are able to transform the data in transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why InfluxDB was crucial for fast data collection and for providing a single source of truth for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data for longer than the Solutions' default retention. These customers are advised to use InfluxDB, and this presentation covers the key considerations for moving to InfluxDB versus the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx's solution and its straightforward implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022 | InfluxData
Two new features are coming to Flux that add flexibility and functionality to your data workflow: polymorphic labels and dynamic types. This session walks through these new features and shows how they work.
Key Trends Shaping the Future of Infrastructure | Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 | Tobias Schneck
As AI pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
UiPath Test Automation using UiPath Test Suite series, part 4 | DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish Caching | Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... | DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You Need? LLM & Knowledge Graph | Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
DevOps and Testing slides at DASA Connect | Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps means. We concluded with a lovely workshop in which participants explored different ways to think about quality and testing across the DevOps infinity loop.
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen, Technical Architect | Playtech
1. Using InfluxDB for Full Observability of a SaaS Platform
Aleksandr Tavgen, Playtech
2. About me
More than 19 years of professional experience
FinTech and Data Science background
From Developer to SRE Engineer
Solved and automated some problems in Operations at scale
3. Overall problem
• A zoo of monitoring solutions in large enterprises, often distributed around the world
• M&A transactions or distributed teams make centralized management impossible or ineffective
• For small enterprises or startups, the key question is finding the best solution
• A lot of companies have failed this way
• A lot of anti-patterns have developed
4. Managing a Zoo
• A lot of independent teams
• Everyone has some sort of solution
• It is hard to get an overall picture of operations
• It is hard to orchestrate and make changes
6. Common Anti-patterns
It is tempting to keep everything recorded just in case
The amount of metrics in monitoring grows exponentially
Nobody understands such a huge bunch of metrics
Engineering complexity grows as well
7. Uber case – 9 billion metrics / 1,000+ instances for monitoring
8. IF YOU NEED 9 BILLION METRICS, YOU ARE PROBABLY WRONG
9. Dashboards problem
• A proliferating amount of metrics leads to unusable dashboards
• How can one observe 9 billion metrics?
• Quite often it looks like spaghetti
• It is common to pursue this anti-pattern for approx. 1.5 years
• GitLab dashboards are a good example
14. Actually not
• Dashboards are very useful when you know where and when to watch
• Our brain can recognize and process visual patterns more effectively
• But only when you know what you are looking for and when
15. Queries vs. Dashboards
Querying your data requires more cognitive effort than a quick look at dashboards
Metrics are a low-resolution view of your system’s dynamics
Metrics should not replace logs
It is not necessary to have millions of them
16. What are Incidents?
• Something that has impact at the operational/business level
• Incidents are expensive
• Incidents come with credibility costs
17. COST OF AN HOUR OF DOWNTIME, 2017-2018
https://www.statista.com/statistics/753938/worldwide-enterprise-server-hourly-downtime-cost/
18. Causes of outage
• Change
• Network Failure
• Bug
• Human Factor
• Unspecified
• Hardware Failure
21. What is it all about?
• Any reduction of the outage/incident timeline results in a significant positive financial impact
• It is about credibility as well
• And your DevOps teams feel less pain and toil on their way
23. Metrics
• It is almost impossible to operate on billions of metrics
• Even under normal system behavior there will always be outliers in real production data
• Therefore, not all outliers should be flagged as anomalous incidents
• The Etsy Kale project is a case in point
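The point about outliers in healthy systems can be shown with a small sketch (synthetic data, not Playtech's pipeline): even a perfectly normal metric usually trips a naive 3-sigma rule a few times a day, so alerting on every outlier produces noise rather than signal.

```python
import random
import statistics

random.seed(42)

# Simulate one day of a "healthy" production metric sampled each minute:
# Gaussian noise around a stable baseline, no real incidents at all.
values = [100 + random.gauss(0, 5) for _ in range(1440)]

mean = statistics.fmean(values)
stdev = statistics.stdev(values)

# Naive rule: flag anything beyond 3 sigma as an "incident".
outliers = [v for v in values if abs(v - mean) > 3 * stdev]

# Even with perfectly normal behavior, some points exceed the threshold.
print(f"{len(outliers)} outliers out of {len(values)} samples")
```

This is why the deck argues against treating every statistical outlier as an anomalous incident.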
25. Paradigm Shift
• The main paradigm shift comes from the fields of infrastructure and architecture
• Cloud architectures, microservices, Kubernetes, and immutable infrastructure have changed the way companies build and operate systems
• Virtualization, containerization, and orchestration frameworks abstract the infra level
• Moving towards abstraction from the underlying hardware and networking means that we must focus on ensuring that our applications work as intended in the context of our business processes
26. KPI monitoring
• KPI metrics are related to the core business operations
• They could be logins, active sessions, or any domain-specific operations
• Heavily seasonal
• Static thresholds can’t help here
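To illustrate why static thresholds fail on seasonal KPIs, here is a minimal sketch (synthetic login counts; all names and numbers are invented): the same value can be normal at peak hours and anomalous at night, so the baseline has to be learned per time of day rather than set once.

```python
import math
import statistics
from collections import defaultdict

# Synthetic hourly login counts with a daily sinusoidal pattern
# (noise omitted to keep the example deterministic; `day` is unused here).
def logins(day: int, hour: int) -> float:
    return 1000 + 800 * math.sin(2 * math.pi * hour / 24)

# Learn a per-hour baseline (mean and spread) from 14 days of history,
# instead of one static threshold for the whole day.
history = defaultdict(list)
for day in range(14):
    for hour in range(24):
        history[hour].append(logins(day, hour))

def is_anomalous(hour: int, value: float, k: float = 3.0) -> bool:
    baseline = statistics.fmean(history[hour])
    spread = statistics.pstdev(history[hour]) or 50.0  # floor for flat history
    return abs(value - baseline) > k * spread

# 1800 logins is normal at the daily peak (hour 6 here)
# but clearly anomalous during the nightly trough (hour 18 here).
print(is_anomalous(6, 1800), is_anomalous(18, 1800))
```

A single static threshold would have to be either too loose for the trough or too tight for the peak; the per-hour baseline avoids both.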
27. Our Solution
• Narrowing the amount of metrics down to the defined KPI metrics
• We combined a push/pull model
• Local push
• Central pull
• And we created an ML-based system which learns your metrics’ behavior
29. Overwhelming results
• Red area – Customer Detection
• Blue area – Own Observation (toil)
• Orange line – Central Grafana introduced
• Green line – ML-based solution in prod
Customer Detection has dropped to low percentage points
30. General view
• Finding anomalies in metrics
• Finding regularities at a higher level
• Combining events from organization internals (changes/deployments)
• Stream processing architectures
31. Why do we need time-series storage?
• We have unpredictable delays in networking
• Operating worldwide is a problem
• CAP theorem
• You can receive signals from the past
• But you should look into the future too
• How long should this window into the future be?
32. Why not Kafka and all those classical streaming frameworks?
• Frameworks like Storm and Flink are oriented toward tuples, not time-ordered events
• We do not want to process everything
• A lot of events are needed on demand
• It is ok to lose some signals in favor of performance
• And we still have signals from the past
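The trade-off described above can be sketched as follows (illustrative Python with invented timestamps, not the actual pipeline): accept late-arriving signals "from the past" within a grace window, and drop the rest in favor of performance.

```python
from datetime import datetime, timedelta

# Late signals: with worldwide operation, an event's arrival time can lag
# its event time unpredictably. Accept late points within a grace window.
GRACE = timedelta(seconds=60)

base = datetime(2019, 6, 1, 12, 0, 0)
events = [  # (event_time, arrival_time) - all values made up
    (base + timedelta(seconds=5), base + timedelta(seconds=6)),    # on time
    (base + timedelta(seconds=40), base + timedelta(seconds=55)),  # 15 s late
    (base + timedelta(seconds=50), base + timedelta(seconds=130)), # too late
]

# Keep signals whose lateness fits the grace window; losing the rest
# is acceptable in exchange for bounded processing delay.
accepted = [(et, at) for et, at in events if at - et <= GRACE]
dropped = len(events) - len(accepted)
print(len(accepted), dropped)
```

The size of the grace window is exactly the "how long should this window be" question from the previous slide: a longer window loses fewer signals but delays every result.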
33. Why InfluxDB v2.0
• Flux
• Better isolation
• Central storage for metrics, events, and traces
• Same streaming paradigm
• There is no mismatch between metaquerying and querying
34. Taking a higher-level picture
• Finding anomalies at a lower level
• Tracing
• Event logs
• Finding regularities between them
• Building a topology
• We can call it AIOps as well
35. OpenTracing
• Tracing is a higher-resolution view of your system’s dynamics
• Distributed tracing can show you unknown-unknowns
• It reduces the Investigation part of the Incident Timeline
• There is a good OSS implementation, Jaeger
• InfluxDB v2.0 is a supported backend storage
36. Jaeger with InfluxDB v2.0 as a backend storage
• Real prod case
• Approx. 8,000 traces every minute
• Performance issue with a limitation on I/O ops connections
• Bursts of context switches at the kernel level
37. Impact on a particular execution flow
• The DB query time is quite constant
• Processing time in the normal case: 1-3 ms
• After a process context switch: more than 40 ms
38. Flux
• Multi-source joining
• Same functional composition paradigm
• Easy to test hypotheses
• You can combine metrics, event logs, and traces
• Data transformation based on conditions
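As a rough illustration of what multi-source joining buys you (a Python sketch with made-up data, not actual Flux), joining a metric stream with a deployment event log lets you test a hypothesis such as "latency jumped after the rollout":

```python
from datetime import datetime, timedelta

# Two sources, as a join would see them: a latency metric stream and a
# deployment event log (timestamps, values, and names are all invented).
t0 = datetime(2019, 6, 1, 12, 0)
latency_ms = [(t0 + timedelta(minutes=i), 3.0 if i < 30 else 42.0)
              for i in range(60)]
deployments = [(t0 + timedelta(minutes=30), "api-v2 rollout")]

# Attach the most recent preceding event to each metric point
# (an as-of join on time), so every sample carries deployment context.
def last_event_before(ts):
    prior = [(ets, name) for ets, name in deployments if ets <= ts]
    return prior[-1][1] if prior else None

joined = [(ts, ms, last_event_before(ts)) for ts, ms in latency_ms]

# Compare average latency before vs. after the deployment event.
before = [ms for _, ms, ev in joined if ev is None]
after = [ms for _, ms, ev in joined if ev == "api-v2 rollout"]
print(sum(before) / len(before), sum(after) / len(after))
```

In Flux the same idea is a join between a metrics bucket and an events bucket followed by a conditional transformation; the sketch only shows the shape of the reasoning.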
41. • Let’s check the relations between them
• Looks more like a stationary time series
• Easier to model
42. Random Walk
• Processes have a lot of random factors
• Random Walk modelling
• X(t) = X(t-1) + Er(t)
• Er(t) = X(t) - X(t-1)
• A stationary time series is very easy to model
• No need for statistical models
• Just a reservoir with variance
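The random-walk idea above can be sketched in a few lines of Python (synthetic data): differencing the walk recovers the stationary increments Er(t) = X(t) - X(t-1), and a simple variance "reservoir" over those increments is enough to judge whether a new step is anomalous, with no statistical model fitting.

```python
import random
import statistics

random.seed(7)

# A random walk: X(t) = X(t-1) + Er(t), with i.i.d. Gaussian steps.
steps = [random.gauss(0, 1) for _ in range(500)]
walk = [0.0]
for e in steps:
    walk.append(walk[-1] + e)

# Differencing recovers the increments: Er(t) = X(t) - X(t-1).
# The walk itself wanders, but its increments are stationary.
increments = [b - a for a, b in zip(walk, walk[1:])]

# "Just a reservoir with variance": keep only the spread of the
# increments and flag steps that fall far outside it.
sigma = statistics.pstdev(increments)

def step_is_anomalous(prev: float, cur: float, k: float = 4.0) -> bool:
    return abs(cur - prev) > k * sigma

print(step_is_anomalous(walk[-1], walk[-1] + 0.5),
      step_is_anomalous(walk[-1], walk[-1] + 10.0))
```

In production the reservoir would be updated incrementally as data streams in; this sketch just shows why the differenced series, unlike the raw walk, admits such a trivially simple detector.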
49. • It is all about semantics
• Datacenters, sites, services
• Graph topology based on time-series data
50. Timetrix
• As a lot of people from different companies are involved in it
• We decided to open source the core engine
• Integrations specific to domain companies can be added easily
• We plan to launch in Q3/Q4 2019
• The core engine is written in Java
• Great kudos to the bonitoo.io team for the great drivers
Virtualization, containerization, and orchestration frameworks are responsible for providing computational resources and handling failures, which creates an abstraction layer over hardware and networking.