What are algorithms? How can I build a machine learning model? In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training such models and deploying them is either operationally prohibitive or outright impossible for them. At Amazon, we created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorisation machines for recommendations, and time-series forecasting. This talk will discuss those algorithms, explain where and how they can be used, and walk through our design choices.
ARCHITECTING INFLUXENTERPRISE FOR SUCCESS | InfluxData
In this session, you will learn how to architect your own InfluxEnterprise clusters to be performant and resilient, whether in a single data center or spread across multiple data centers.
How to Introduce Telemetry Streaming (gNMI) in Your Network with SNMP with Te... | InfluxData
This document provides an overview of introducing network telemetry using streaming protocols like gNMI with Telegraf. It discusses gNMI as a streaming telemetry protocol, using Telegraf to collect metrics from network devices via gNMI and SNMP, and how to normalize and enrich the collected data through Telegraf processors before outputting to a time-series database. It also includes a demo of collecting interface counters from devices supporting gNMI and SNMP, and processing the data in Telegraf.
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H... | InfluxData
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
TPC-DS performance evaluation for JAQL and PIG queries - Andrii Vozniuk, Serg... | Andrii Vozniuk
Our final presentation for the Cloud Data Management course at EPFL in 2012, taught by Anastasia Ailamaki and Christoph Koch.
We have compared the performance of PIG and JAQL when executing translated TPC-DS queries.
Note: it is highly probable that by now the results are outdated; the presentation is mainly of historical value.
Relevant papers:
1) Dean, Jeffrey, and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters.
2) Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. 2008. Pig latin: a not-so-foreign language for data processing.
3) Kevin S. Beyer, Vuk Ercegovac, Rainer Gemulla, Andrey Balmin, Mohamed Y. Eltabakh, Carl-Christian Kanne, Fatma Özcan, Eugene J. Shekita. 2011. Jaql: A Scripting Language for Large Scale Semistructured Data Analysis
A talk given by Julian Hyde at Flink Forward, Berlin, on 2016/09/12.
Streaming is necessary to handle data rates and latency, but SQL is unquestionably the lingua franca of data. Is it possible to combine SQL with streaming, and if so, what does the resulting language look like? Apache Calcite is extending SQL to include streaming, and Apache Flink is using Calcite to support both regular and streaming SQL. In this talk, Julian Hyde describes streaming SQL in detail and shows how you can use streaming SQL in your application. He also describes how Calcite’s planner optimizes queries for throughput and latency.
Beyond unit tests: Deployment and testing for Hadoop/Spark workflows | DataWorks Summit
As a Hadoop developer, do you want to quickly develop your Hadoop workflows? Do you want to test your workflows in a sandboxed environment similar to production? Do you want to write unit tests for your workflows and add assertions on top of it?
In just a few years, the number of users writing Hadoop/Spark jobs at LinkedIn has grown from tens to hundreds, and the number of jobs running every day has grown from hundreds to thousands. With the ever-increasing number of users and jobs, it becomes crucial to reduce the development time for these jobs. It is also important to test these jobs thoroughly before they go to production.
We’ve tried to address these issues by creating a testing framework for Hadoop/Spark jobs. The testing framework enables the users to run their jobs in an environment similar to the production environment and on the data which is sampled from the original data. The testing framework consists of a test deployment system, a data generation pipeline to generate the sampled data, a data management system to help users manage and search the sampled data and an assertion engine to validate the test output.
In this talk, we will discuss the motivation behind the testing framework before deep diving into its design. We will further discuss how the testing framework is helping the Hadoop users at LinkedIn to be more productive.
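The framework described above is internal to LinkedIn, but the general idea of running a job against sampled data and asserting on its output can be sketched with an ordinary PySpark unit test. The `dedupe_events` transformation and column names below are hypothetical stand-ins, not part of the talk.

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Small local session standing in for the sandboxed test environment
    return SparkSession.builder.master("local[2]").appName("workflow-tests").getOrCreate()

def dedupe_events(df):
    """Transformation under test: keep one row per event_id."""
    return df.dropDuplicates(["event_id"])

def test_dedupe_removes_duplicates(spark):
    # Tiny hand-built input standing in for the framework's sampled data
    df = spark.createDataFrame(
        [("e1", "click"), ("e1", "click"), ("e2", "view")],
        ["event_id", "type"],
    )
    assert dedupe_events(df).count() == 2  # assertion-engine analogue
```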
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch... | Flink Forward
DTW (Dynamic Time Warping) is a well-known method for finding patterns within a time series. It can find a pattern even when the data are distorted. It can be used to detect sales trends, defects in machine signals in industry, patterns in electrocardiograms in medicine, DNA…
Most implementations are very slow, but a highly efficient open source implementation (best paper at SIGKDD 2012) exists in C. It can easily be ported to other languages, such as Java, so that it can then be used in Flink.
We present the slight modifications we made so that it can be used with Flink at even greater scale to return the top-k best matches on past data or streaming data.
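For readers unfamiliar with DTW, a minimal, unoptimized dynamic-programming version looks like the sketch below; the UCR-style pruning from the SIGKDD 2012 implementation referenced above is deliberately omitted.

```python
import math

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW with squared-difference cost and no pruning."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # advance in series a only
                                 cost[i][j - 1],      # advance in series b only
                                 cost[i - 1][j - 1])  # advance in both (match)
    return math.sqrt(cost[n][m])

# A distorted series still matches its pattern closely
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))
```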
SignalFx: Making Cassandra Perform as a Time Series Database | DataStax Academy
SignalFx ingests, processes, runs analytics against, and ultimately stores massive numbers of time series streaming in parallel into our service, which provides an analytics-based monitoring platform for modern applications.
We chose to build our time series database (TSDB) on Cassandra for its read and write performance at high load. This presentation will go over our evolution of optimizations to squeeze the most performance out of the TSDB to date and some steps we'll be taking in the future.
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013 | Nick Galbreath
This document discusses the care and feeding of large scale Graphite installations. It begins with introductions and then discusses Graphite components like carbon-cache, carbon-aggregator, carbon-relay and StatsD. It covers Graphite storage, installation, documentation, middleware, backups, monitoring and the web UI. It provides tips on tuning, debugging and visualizing metrics in Graphite.
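For context, getting a metric into such an installation is as simple as writing to carbon's plaintext protocol; the host name and metric path below are placeholders.

```python
import socket
import time

CARBON_HOST, CARBON_PORT = "graphite.example.com", 2003  # default carbon plaintext port

def send_metric(path, value, timestamp=None):
    # Carbon plaintext protocol: "metric.path value unix_timestamp\n"
    line = f"{path} {value} {int(timestamp or time.time())}\n"
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

send_metric("app.web01.requests.count", 42)
```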
Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data... | Flink Forward
http://flink-forward.org/kb_sessions/apache-beam-a-unified-model-for-batch-and-streaming-data-processing/
Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business, and consumers of these datasets have detailed requirements for latency, cost, and completeness. Apache Beam (incubating) defines a new data processing programming model that evolved from more than a decade of experience within Google, including MapReduce, FlumeJava, MillWheel, and Cloud Dataflow. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open-source (e.g., Apache Flink, Apache Spark, et al.) and proprietary (e.g., Google Cloud Dataflow). This talk will cover the basics of Apache Beam, touch on its evolution, describe main concepts in the programming model, and compare with similar systems. We’ll go from a simple scenario to a relatively complex data processing pipeline, and finally demonstrate execution of that pipeline on multiple runtimes.
Realtime Risk Management Using Kafka, Python, and Spark Streaming by Nick Evans | Spark Summit
This document discusses using Apache Kafka, Python, and Spark Streaming for real-time risk management of credit card transactions. It outlines how Spark Streaming allows analyzing large volumes of event data in real-time to identify risky transactions that require closer review. It describes the architecture of using Kafka to stream event data to Spark Streaming for processing, and how the receiverless approach improves on processing data from offsets in Kafka. Examples show how Spark Streaming can be used to filter transactions by risk level and output the results to a case management system. The document concludes by discussing opportunities to improve the system through time-windowed aggregations, machine learning, monitoring, and hiring.
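The talk predates Structured Streaming, but the filter-by-risk idea translates directly. The sketch below is a rough modern analogue, assuming a hypothetical `transactions` Kafka topic and `risk_score` field, and it requires the spark-sql-kafka connector package on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("realtime-risk").getOrCreate()

schema = StructType([
    StructField("card_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("risk_score", DoubleType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "transactions")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Route only high-risk transactions onward for closer review
risky = events.filter(col("risk_score") > 0.8)
risky.writeStream.format("console").outputMode("append").start().awaitTermination()
```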
How to Build a Telegraf Plugin by Noah Crowley | InfluxData
Telegraf is a plugin-driven server agent for collecting & reporting metrics and there are many plugins already written to source data from a variety of services and systems. However, there may be instances where you need to write your own plugin to source data from your particular systems. In this InfluxDays NYC 2019 session, Noah Crowley will provide you with the steps on how to write your own Telegraf plugin. Writing your own Telegraf plugin will require an understanding of the Go programming language.
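Writing a native plugin means Go, but a lighter-weight alternative worth knowing is Telegraf's exec input plugin, which runs any script that prints InfluxDB line protocol (with data_format = "influx"). The measurement, tag, and field names below are invented for illustration.

```python
#!/usr/bin/env python3
"""Collector script intended to be run by Telegraf's inputs.exec plugin."""
import random
import time

# Stand-ins for real readings from your particular system
temperature = 20 + random.random() * 5
humidity = 40 + random.random() * 10

# Line protocol: measurement,tag=value field=value,field=value timestamp_ns
ts = time.time_ns()
print(f"environment,room=server-rack temperature={temperature:.2f},humidity={humidity:.2f} {ts}")
```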
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx... | Flink Forward
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation efficiency. Unfortunately, state-of-the-art systems for approximate computing, such as BlinkDB, ApproxHadoop, primarily target batch analytics, where the input data remains unchanged during the course of sampling. Thus, they are not well-suited for stream analytics. In this talk, we will present the design of StreamApprox, a Flink-based stream analytics system for approximate computing. StreamApprox implements an online stratified reservoir sampling algorithm in Apache Flink to produce approximate output with rigorous error bounds.
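The core building block, reservoir sampling, is small enough to show in a few lines. StreamApprox stratifies the stream and keeps one such reservoir per sub-stream; the sketch below shows only the plain, unstratified algorithm.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform random sample of size k from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)           # fill the reservoir first
        else:
            j = rng.randint(0, i)         # inclusive upper bound
            if j < k:
                sample[j] = item          # replace with decreasing probability
    return sample

print(reservoir_sample(range(1_000_000), 10))
```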
Virtual training: Intro to the TICK Stack and InfluxEnterprise | InfluxData
In this webinar, we will provide an introduction to the components of the TICK Stack and a review of the features of InfluxEnterprise and InfluxCloud. We also demo how to install the TICK Stack.
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink | Flink Forward
Flink provides fault tolerance guarantees through checkpointing and recovery mechanisms. Checkpoints take consistent snapshots of distributed state and data, while barriers mark checkpoints in the data flow. This allows Flink to recover jobs from failures and resume processing from the last completed checkpoint. Flink also implements high availability by persisting metadata like the execution graph and checkpoints to Apache Zookeeper, enabling a standby JobManager to take over if the active one fails.
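From the application side, enabling this checkpointing machinery is a one-liner. A minimal PyFlink sketch, with an illustrative interval and a trivial job, might look like the following.

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.enable_checkpointing(10_000)  # draw a consistent snapshot every 10 s (barrier interval in ms)

# Trivial pipeline so the snippet runs; real jobs put their operators here
env.from_collection([1, 2, 3]).map(lambda x: x * 2).print()
env.execute("checkpointed-job")
```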
Why is building a big data platform hard? What are the key aspects involved in providing a "Serverless" experience for data folks? And how does Databricks solve infrastructure problems and provide the "Serverless" experience?
Building Modern Data Pipelines for Time Series Data on GCP with InfluxData by... | InfluxData
In this InfluxDays NYC 2019 talk, you will get an overview of the Google data pipelines and some use-cases for infrastructure monitoring and IoT (Google). In addition, we will share some common solutions that can be deployed on GCP including using InfluxDB time series database for Kubernetes Monitoring and IoT.
Creating and Using the Flux SQL Datasource | Katy Farmer | InfluxData
This talk introduces the SQL data source for Flux. It will start with examples of using data from MySQL or Postgres with time series data from InfluxDB. It will then go over the details of how the SQL data source was created.
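A rough sketch of the pattern, using the InfluxDB Python client to run a Flux script that pulls reference data from Postgres via sql.from() and joins it with time series data; the connection strings, bucket, and column names are hypothetical.

```python
from influxdb_client import InfluxDBClient

flux = '''
import "sql"

meta = sql.from(
  driverName: "postgres",
  dataSourceName: "postgresql://user:pass@localhost/assets?sslmode=disable",
  query: "SELECT sensor_id, location FROM sensor_metadata"
)

readings = from(bucket: "telemetry")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "temperature")

join(tables: {meta: meta, data: readings}, on: ["sensor_id"])
'''

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    for table in client.query_api().query(flux):
        for record in table.records:
            print(record.values)
```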
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel... | Dan Halperin
Apache Beam is a unified programming model for efficient and portable data processing pipelines. It provides abstractions like PCollections, sources/readers, ParDo, GroupByKey, side inputs, and windowing that hide complexity and allow runners to optimize efficiency. Beam supports both batch and streaming workloads on different distributed processing backends. It gives runners control over bundle size, splitting, and triggering to make tradeoffs between latency, throughput, and efficiency based on workload and cluster resources. This allows the same pipeline to be executed efficiently in different contexts without changes to the user code.
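A toy Beam pipeline in the Python SDK shows a few of those abstractions (a PCollection, windowing, a per-key combine); the element values and the 60-second window are arbitrary choices for illustration.

```python
import time
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

now = time.time()

with beam.Pipeline() as p:  # DirectRunner by default; the same code runs on Flink, Spark, or Dataflow
    (p
     | "Create" >> beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])
     | "Stamp" >> beam.Map(lambda kv: TimestampedValue(kv, now))  # attach event timestamps
     | "Window" >> beam.WindowInto(FixedWindows(60))              # 60-second fixed windows
     | "SumPerKey" >> beam.CombinePerKey(sum)
     | "Print" >> beam.Map(print))
```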
The increasing availability of mobile phones with embedded GPS devices and sensors has spurred the use of vehicle telematics in recent years. Telematics provides detailed and continuous information about a vehicle, such as its location, speed, and movement. Vehicle telematics can be further linked with other spatial data to provide context for understanding driving behaviors at a detailed level. However, the collection of high-frequency telematics data results in huge volumes of data that must be processed efficiently. The raw sensor and GPS data must be properly pre-processed and transformed to extract signal relevant to downstream processes. In addition, driving behavior often depends on the spatial context, so the analysis of telematics must be contextualized using spatial and real-time traffic data.
Our talk covers the promises and challenges of telematics data. We present a framework for large-scale telematics data analysis using Apache big data tools (Hadoop, Hive, Spark, Kafka, etc.). We discuss common techniques to load and transform telematics data. We then present how to use machine learning on telematics data to derive insights about driving safety.
Speakers
Yanwei Zhang, Senior Data Scientist II, Uber
Neil Parker, Senior Software Engineer, Uber
Kapacitor - Real Time Data Processing Engine | Prashant Vats
Kapacitor is a native data processing engine. It can process both stream and batch data from InfluxDB. It lets you plug in your own custom logic or user-defined functions to process alerts with dynamic thresholds. Key Kapacitor Capabilities:
-Alerting
-ETL (Extraction, Transformation and Loading)
-Action Oriented
-Streaming Analytics
-Anomaly Detection
Kapacitor uses a DSL (Domain Specific Language) called TICKscript to define tasks.
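As a taste of what that looks like, the sketch below registers a stream task through Kapacitor's HTTP API with a small TICKscript embedded as a string; the task id, database, and alert threshold are made up, and it assumes a Kapacitor instance on its default port 9092.

```python
import requests

tickscript = """
stream
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: "usage_idle" < 10)
        .log('/tmp/high_cpu_alerts.log')
"""

task = {
    "id": "high_cpu_alert",          # hypothetical task name
    "type": "stream",
    "dbrps": [{"db": "telegraf", "rp": "autogen"}],
    "script": tickscript,
    "status": "enabled",
}

resp = requests.post("http://localhost:9092/kapacitor/v1/tasks", json=task, timeout=10)
resp.raise_for_status()
print(resp.json().get("id"), "created")
```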
Modern data systems don't just process massive amounts of data; they need to do it very fast. Using fraud detection as a convenient example, this session will include best practices on how to build real-time data processing applications using Apache Kafka. We'll explain how Kafka makes real-time processing almost trivial, discuss the pros and cons of the famous lambda architecture, help you choose a stream processing framework, and even talk about deployment options.
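In its simplest form, such an application is just a consume-score-produce loop. The sketch below uses the kafka-python client; the topic names and the toy scoring rule are placeholders, not anything from the session.

```python
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "transactions",                              # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    txn = message.value
    # Toy rule standing in for a real fraud model
    if txn.get("amount", 0) > 10_000:
        producer.send("suspected-fraud", txn)    # hypothetical output topic
```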
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1OKo5FN.
Danny Yuan discusses how stream processing is used in Uber's real-time system to solve a wide range of problems, including but not limited to real-time aggregation and prediction on geospatial time series, data migration, monitoring and alerting, and extracting patterns from data streams. Yuan also presents the architecture of the stream processing pipeline. Filmed at qconsf.com.
Danny Yuan is a software engineer in Uber. He's currently working on streaming systems for Uber's logistics platform. Prior to joining Uber, he worked on building Netflix's cloud platform. His work includes predictive autoscaling, distributed tracing service, real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix's low-latency crypto services.
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w... | Data Con LA
This talk explores deploying a series of small and large batch and streaming pipelines locally, to Spark and Flink clusters and to Google Cloud Dataflow services to give the audience a feel for the portability of Beam, a new portable Big Data processing framework recently submitted by Google to the Apache foundation. This talk will look at how the programming model handles late arriving data in a stream with event time, windows, and triggers.
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study | MongoDB
The document summarizes an IOT ETL performance case study where the author collected water and electric meter data and loaded it into a database. The initial load of over 90 million documents from a 10GB file into a MongoDB database took over 4 hours. The author then redesigned the data schema, splitting it into hourly documents to improve query performance. This reduced the processing time to just 3 minutes and the data size to 13MB. The key lessons were that changing the data schema and using batch writes with multiple workers can dramatically improve ETL and query performance.
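The hourly-bucket idea is easy to reproduce with pymongo: upsert one document per meter per hour and fill in per-minute readings with batched writes. The collection and field names here are invented for illustration, not taken from the case study.

```python
from datetime import datetime, timezone
from pymongo import MongoClient, UpdateOne

col = MongoClient("mongodb://localhost:27017").metering.readings_hourly

def bucket_update(meter_id, ts, value):
    hour = ts.replace(minute=0, second=0, microsecond=0)
    return UpdateOne(
        {"meter_id": meter_id, "hour": hour},           # one document per meter per hour
        {"$set": {f"minutes.{ts.minute}": value},       # slot the reading into its minute
         "$inc": {"count": 1, "sum": value}},           # keep pre-aggregates for fast queries
        upsert=True,
    )

ops = [bucket_update("meter-42", datetime.now(timezone.utc), 3.7)]
col.bulk_write(ops, ordered=False)                      # batch writes, usable from multiple workers
```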
Lecture 2- Practical AD and DA Conveters (Online Learning).pptx | HamzaJaved306957
This document provides information about a lecture on practical analog-to-digital (A/D) and digital-to-analog (D/A) converters. It begins with background resources and then describes the contents and learning outcomes of the course. The breakdown of topics is presented along with an outline of the lecture on practical A/D converters. Key points include the components and operation of practical ADCs, sources of non-idealities, and specifications used to characterize performance including SNR, SINAD, ENOB, and SFDR.
Learn How To Use The #1 DevOps Open Source Time Series DB Platform for Metrics & Events (Time Series Data).
Presentation used in Udemy training: https://www.udemy.com/course/influxdb-time-series-database/?referralCode=09D0B30F92258262D4C6
If you're looking to set up a system to store your metrics in (e.g. app/server metrics), or you need to store & manage other time series, then this course is for you! InfluxDB is currently the #1 time series database (according to db-engines). More and more companies are moving their time series data into a database that is really fit for this purpose, which makes it a really good skill to have.
InfluxDB is an open-source database optimized for fast, high-availability storage and retrieval of time series data. InfluxDB is great for operations monitoring, application metrics, and real-time analytics. InfluxDB is the time series database in the TICK stack, and this technology is rising, and so is the need for this knowledge in the job market. It's a super useful tool to have on your tool belt as a DevOps engineer or as an IT professional in general. In this course we will touch all important topics without the need for any prior knowledge.
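To give a flavor of what storing a metric looks like, here is a minimal write with the InfluxDB 2.x Python client; the URL, token, org, and bucket are placeholders.

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    point = (Point("cpu")                  # measurement
             .tag("host", "server01")      # indexed tag
             .field("usage_percent", 42.5))
    write_api.write(bucket="metrics", record=point)
```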
DCCP evaluation for SIP signaling (ICT4M) | Agus Awaludin
This document discusses evaluating the performance of using the Datagram Congestion Control Protocol (DCCP) for SIP signaling compared to the traditional UDP. It describes developing a DCCP agent and SIP traffic for the NS-2 network simulator to simulate SIP call setup over DCCP and UDP. The simulation results show that DCCP has lower call drop rates than UDP and less variation in call setup delays, indicating that DCCP may be preferable to UDP as a transport for SIP signaling.
Real-time in the real world: DIRT in production | bcantrill
This document discusses the challenges of building and debugging DIRT (data-intensive real-time) applications in production. It provides examples from the mobile push-to-talk app Voxer, which is described as a canonical DIRT app. Specific issues covered include application restarts inducing latency bubbles, dropped TCP connections causing latency outliers, and identifying sources of slow disk I/O. Tools like DTrace are highlighted as being essential for instrumentation and problem diagnosis in DIRT apps.
Reconsider TCPdump for Modern Troubleshooting | Avi Networks
Are you tired of troubleshooting with TCPdump? The Avi Vantage Platform is here to help. Learn how you can reconsider your decades-old CPU-intensive logging tools – and gain intuitive, real-time analytics, faster time-to-resolution, modern SSL / TLS encryption, and (most importantly) happy IT teams focused on delivering applications.
Watch this Avi webinar to learn:
- Why TCPdump should be your tool of last resort
- How headers compressed with HTTP/2, PFS, and distributed systems have rendered certain tools useless
- How you can replace TCPdump with intelligent logs and analytics
- How to future proof your troubleshooting tools with HTTP/3, TLS 1.3, containers and Kubernetes
Watch on-demand here https://www.networkworld.com/resources/form?placement_id=de4979d3-4f46-498e-8285-2bdad91ca3fb&brand_id=512
Redis For Distributed & Fault Tolerant Data Plumbing Infrastructure | Redis Labs
This document summarizes Altizon's offerings including the Datonis and MInt platforms, which are used to process large volumes of IoT event data. It describes how Redis is used within Datonis for caching metadata and licensing information to speed up event ingestion, and for storing streaming data to enable real-time analytics and rule processing. The document then outlines the design of a data pipeline that aggregates IoT data from Datonis into time-based buckets and pushes it to external data stores on a scheduled basis using Redis for coordination and caching.
This document summarizes aggregation systems for big data. It discusses how aggregations turn raw data into condensed, human-understandable information through techniques like time series, top-k, cardinality, and quantiles. It also covers abstraction concepts like monoids that allow different data types to be aggregated. Finally, it outlines common components of online incremental aggregation systems and provides examples like Twitter's Summingbird and Google's Mesa that compute aggregations incrementally in scalable key-value stores.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix use the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ... | Landon Robinson
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
This document provides an overview of digital signal processors (DSPs). It defines a DSP as an integrated circuit designed for high-speed data manipulation used in applications such as audio, communications, and image processing. The document discusses how DSPs work by converting analog signals to digital signals and processing them. It explains that DSPs are needed because they can perform multiplication and division faster than general-purpose processors. The rest of the document details the architecture of DSPs, examples of DSP chip families like TMS320, and how instruction pipelining is implemented on the TMS320C54X DSP processor.
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020 | Redis Labs
The document discusses rate limiting and metering using Redis. It begins by introducing rate limiting and metering and why Redis is well-suited for these tasks. It then covers different Redis data structures that can be used, such as lists, hashes, sorted sets and strings. Common Redis commands for counting, setting keys and checking time to live are also presented. Different rate limiting design patterns and anti-patterns are described, including fixed window, sliding window and token bucket approaches. Finally, resources for further information are provided.
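A minimal fixed-window limiter, one of the patterns mentioned above, needs only INCR and EXPIRE; the key naming and limits below are illustrative.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(api_key, limit=100, window_seconds=60):
    """Fixed-window rate limiting: count requests per key per time window."""
    window = int(time.time() // window_seconds)
    key = f"rate:{api_key}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # let each window's counter clean itself up
    return count <= limit

print(allow_request("customer-123"))
```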
Cloud Security Monitoring and Spark Analytics | amesar0
This document summarizes a presentation about Threat Stack's use of Spark analytics to process security event data from their cloud monitoring platform. Key points:
- Threat Stack uses Spark to perform rollups and aggregations on streaming security event data from their customers' cloud environments to detect threats and monitor compliance.
- The event data is consumed from RabbitMQ by an "Event Writer" process and written to S3 in batches, where it is then processed by Spark jobs running every 10 minutes.
- Spark analytics provides scalable rollups of event counts and other metrics that are written to Postgres. This replaced less scalable homegrown solutions and Elasticsearch facets.
- Ongoing work includes optimizing
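A stripped-down version of that kind of scheduled rollup job in PySpark might look like the following; the S3 paths, schema, and 10-minute window are assumptions, and the real pipeline writes its results to Postgres rather than Parquet.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, window

spark = SparkSession.builder.appName("event-rollups").getOrCreate()

# Batches of security events previously written to S3 by the Event Writer
events = (spark.read.json("s3a://example-bucket/security-events/dt=2016-01-01/*.json.gz")
          .withColumn("event_time", col("event_time").cast("timestamp")))

rollups = (events
           .groupBy(window(col("event_time"), "10 minutes"),
                    col("organization_id"),
                    col("event_type"))
           .agg(count("*").alias("event_count")))

rollups.write.mode("append").parquet("s3a://example-bucket/event-rollups/")
```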
This document summarizes a presentation about Siemens' solution to analyze sensor data from gas turbines for predictive maintenance. The solution loads billions of sensor readings from turbines into Hadoop and integrates it with a Teradata data warehouse. It uses a period data type and semantic views to transform the sparse sensor data into aligned time series that are easier to analyze. This allows monitoring all turbine data historically rather than a small sample, facilitating detection of abnormal trends for improved operations and maintenance.
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring | Databricks
The Spark Listener interface provides a fast, simple and efficient route to monitoring and observing your Spark application - and you can start using it in minutes. In this talk, we'll introduce the Spark Listener interfaces available in core and streaming applications, and show a few ways in which they've changed our world for the better at SpotX. If you're looking for a "Eureka!" moment in monitoring or tracking of your Spark apps, look no further than Spark Listeners and this talk!
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In... | Amazon Web Services
“Infrastructure as Code” has changed not only how we think about configuring infrastructure, but about the infrastructure itself. AWS has been at the core of this movement, enabling your infrastructure teams to benefit from software engineering best practices such as CI/CD, automated testing, and repeatable deployments. Now that you have mastered the art of managing your infrastructure as code, it’s time to leverage these same lessons for monitoring and metrics. In this session, we dive into how you can leverage tooling such as AWS, Terraform, and Datadog to programmatically define your monitoring so that you can scale your organizational observability along with your infrastructure, and attain consistency from local development all the way through production.
Session sponsored by Datadog, Inc.
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met... | Databricks
This talk is about methods and tools for troubleshooting Spark workloads at scale and is aimed at developers, administrators and performance practitioners. You will find examples illustrating the importance of using the right tools and right methodologies for measuring and understanding performance, in particular highlighting the importance of using data and root cause analysis to understand and improve the performance of Spark applications. The talk has a strong focus on practical examples and on tools for collecting data relevant for performance analysis. This includes tools for collecting Spark metrics and tools for collecting OS metrics. Among others, the talk will cover sparkMeasure, a tool developed by the author to collect Spark task metrics and SQL metrics data, tools for analysing I/O and network workloads, tools for analysing CPU usage and memory bandwidth, tools for profiling CPU usage and for Flame Graph visualization.
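sparkMeasure also has a Python front end; a rough sketch of wrapping a workload with its stage-level metrics collection (assuming the matching spark-measure JVM package is available to the session) looks like this.

```python
from pyspark.sql import SparkSession
from sparkmeasure import StageMetrics  # pip install sparkmeasure

spark = SparkSession.builder.appName("perf-probe").getOrCreate()
stage_metrics = StageMetrics(spark)

stage_metrics.begin()
spark.range(0, 10_000_000).selectExpr("sum(id)").show()   # workload under measurement
stage_metrics.end()

stage_metrics.print_report()  # aggregated task/stage metrics: CPU time, shuffle, GC, etc.
```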
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow Ecosystem | InfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
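As a small, self-contained illustration of the columnar format both products build on, pyarrow can hold a table in Arrow memory, run a vectorized kernel over a column, and round-trip it through Parquet; the data here is invented.

```python
from datetime import datetime

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

table = pa.table({
    "time": [datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 1), datetime(2024, 1, 1, 0, 2)],
    "host": ["a", "a", "b"],
    "usage": [0.61, 0.72, 0.43],
})

print(pc.mean(table["usage"]))        # vectorized compute kernel over a column
pq.write_table(table, "cpu.parquet")  # persist to Parquet, the on-disk format paired with Arrow
print(pq.read_table("cpu.parquet").num_rows)
```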
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu... | InfluxData
Bevi are the creators of smart water dispensers which empower people to choose their desired beverage — flat or sparkling, their desired flavor and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting of their entire ecosystem — including the hardware and software. They are using InfluxDB to collect sensor data in real-time remotely from their internet-connected machines about their status and activity — i.e., flavor and CO2 levels, water temp, filter status, etc. They are using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDB | InfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decisions
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base | InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO Base - a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose-built time series database. They are collecting more than 100K different metrics at various frequencies — some every 5 seconds, others only every 1-2 minutes. They have reduced overall IT spend by 50% and collect 2x the amount of data at 20x frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced the TCO, and is now able to create added-value services: forecasting, monitoring, predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipelines IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING Stack | InfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (i.e. turbines and generators) and increase safety practices (i.e. buildings, construction sites). Join this webinar to learn best practices from industrial IoT SME's.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected by everyone — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
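A bare-bones version of the M-and-I part of that stack, subscribing to an MQTT topic and writing each reading into InfluxDB, might look like the sketch below; the topic layout, bucket, and credentials are assumptions, and it targets the paho-mqtt 1.x client API.

```python
import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

influx = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = influx.write_api(write_options=SYNCHRONOUS)

def on_message(client, userdata, msg):
    # Assumed topic convention: sensors/<device>/temperature with a numeric payload
    device = msg.topic.split("/")[1]
    point = Point("temperature").tag("device", device).field("value", float(msg.payload))
    write_api.write(bucket="iot", record=point)

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)            # Mosquitto broker
client.subscribe("sensors/+/temperature")
client.loop_forever()
```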
Meet the Founders: An Open Discussion About Rewriting Using Rust | InfluxData
The document is an agenda for a discussion between the CTO and founder of Ockam, Mrinal Wadhwa, and the CTO and founder of InfluxData, Paul Dix, about rewriting products using the Rust programming language. It includes an introduction of the founders, an overview of the discussion topics like why they decided to rewrite in Rust and the challenges they faced, how they got their engineers comfortable with Rust, tips they learned in the process, benefits gained from moving to Rust, and how their communities responded to the switch.
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security options, and more custom configuration options to best suit customers’ workload requirements. Join this webinar to learn more about InfluxDB Cloud, and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB | InfluxData
Many developers and DevOps engineers have become aware of using their observability data to gain greater insights into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
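For orientation, instrumenting a Python service with OpenTelemetry and exporting over OTLP (to a collector that could, for example, forward to InfluxDB) follows the shape below; the service name, endpoint, and span attributes are placeholders.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.items", 3)   # the work being traced goes here
```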
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali... | InfluxData
American Metal Processing Company ("AMP") is the largest commercial rotary heat treat facility in the US, with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures a consistent, uniform heat treatment process compared to traditional batch- or belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electric cars. They collect sensor data about their EV's control systems using a time series platform. During races, they collect IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races, which helps the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team is able to analyze data in Marple to understand what to do better next time. By using Marple + InfluxDB, the team is able to collect, share, and analyze high-frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with their language of choice and pull IoT sensor data from 70+ different single-board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDKs, and they are able to transform the data in transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why InfluxDB was crucial for fast data collection and for providing a single source of truth for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data for longer than the solutions' defaults. InfluxDB is recommended for these customers, and this presentation covers the key considerations for moving to InfluxDB versus the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx’s solution and its easy implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
Two new features are coming to Flux that add flexibility and functionality to your data workflow: polymorphic labels and dynamic types. This session walks through these new features and shows how they work.
This document outlines the schedule for Day 2 of InfluxDays 2022, an event hosted by InfluxData. The schedule includes sessions on building developer experience, how developers like to work, an overview of the InfluxDB developer console and API, demos of client libraries and the InfluxDB v2 API, tips for getting involved in the InfluxDB community and university, use cases for networking monitoring, crypto/fintech, monitoring/observability, and IIoT, and closing thoughts. Recordings of all sessions will be made available to registered attendees by November 7th. Upcoming events include advanced Flux training in London and resources through the community forums, Slack channel, and online university.
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...InfluxData
This document contains the agenda for Day 2 of InfluxDays 2022, which includes:
- Welcome and introductory remarks from Zoe Steinkamp and Jay Clifford of InfluxData.
- Fireside chats and presentations on building great developer experiences, how developers like to work, and use cases for InfluxDB from companies like Tesla, InfluxData, and others.
- Sessions on the InfluxDB developer console, APIs, client libraries, getting involved in the community, accelerating time to awesome with InfluxDB University, and tips for analyzing IoT data with InfluxDB.
- Closing thoughts from Zoe Steinkamp and Jay Clifford.
The document summarizes the agenda and sessions for Day 1 of InfluxDays 2022. It includes sessions on InfluxDB data collection, scripting languages like Flux, the InfluxDB time series engine, tasks, storage, and a closing discussion. The agenda involves talks from InfluxData employees on building applications with real-time data, navigating the developer experience, solving problems, the InfluxDB platform, community, education, use cases in crypto/fintech and IIoT, and tips/tricks for analysis.
Securing BGP: Operational Strategies and Best Practices for Network Defenders...APNIC
Md. Zobair Khan, Network Analyst and Technical Trainer at APNIC, presented 'Securing BGP: Operational Strategies and Best Practices for Network Defenders' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
EASY TUTORIAL OF HOW TO USE CiCi AI BY: FEBLESS HERNANE Febless Hernane
Cici AI simplifies tasks like writing and research with its user-friendly platform. Users sign up, input queries, customize responses, and edit content as needed. It offers efficient saving and exporting options, making it ideal for enhancing productivity through AI assistance.
10 Conversion Rate Optimization (CRO) Techniques to Boost Your Website’s Perf...Web Inspire
What is CRO?
Conversion Rate Optimization, or CRO, is the process of enhancing your website to increase the percentage of visitors who take a desired action. This could be anything from purchasing a product to signing up for a newsletter. Essentially, CRO is about making your website more effective in turning visitors into customers.
Why is CRO Important?
CRO is crucial because it directly impacts your bottom line. A higher conversion rate means more customers and revenue without needing to increase your website traffic. Plus, a well-optimized site improves user experience, which can lead to higher customer satisfaction and loyalty.
Honeypots Unveiled: Proactive Defense Tactics for Cyber Security, Phoenix Sum...APNIC
Adli Wahid, Senior Internet Security Specialist at APNIC, delivered a presentation titled 'Honeypots Unveiled: Proactive Defense Tactics for Cyber Security' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
Decentralized Justice in Gaming and EsportsFederico Ast
Discover how Kleros is transforming the landscape of dispute resolution in the gaming and eSports industry through the power of decentralized justice.
This presentation, delivered by Federico Ast, CEO of Kleros, explores the innovative application of blockchain technology, crowdsourcing, and incentivized mechanisms to create fair and efficient arbitration processes.
Key Highlights:
- Introduction to Decentralized Justice: Learn about the foundational principles of Kleros and how it combines blockchain with crowdsourcing to develop a novel justice system.
- Challenges in Traditional Arbitration: Understand the limitations of conventional arbitration methods, such as high costs and long resolution times, particularly for small claims in the gaming sector.
- How Kleros Works: A step-by-step guide on the functioning of Kleros, from the initiation of a smart contract to the final decision by a jury of peers.
- Case Studies in eSports: Explore real-world scenarios where Kleros has been applied to resolve disputes in eSports, including issues like cheating, governance, player behavior, and contractual disagreements.
- Practical Implementation: Detailed walkthroughs of how disputes are handled in eSports tournaments, emphasizing speed, cost-efficiency, and fairness.
- Enhanced Transparency: The role of blockchain in providing an immutable and transparent record of proceedings, ensuring trust in the resolution process.
- Future Prospects: The potential expansion of decentralized justice mechanisms across various sectors within the gaming industry.
For more information, visit kleros.io or follow Federico Ast and Kleros on social media:
• Twitter: @federicoast
• Twitter: @kleros_io
Network Security and Cyber Laws (Complete Notes) for B.Tech/BCA/BSc. ITSarthak Sobti
Network Security and Cyber Laws
Detailed Course Content
Unit 1: Introduction to Network Security
- Introduction to Network Security
- Goals of Network Security
- ISO Security Architecture
- Attacks and Categories of Attacks
- Network Security Services & Mechanisms
- Authentication Applications: Kerberos, X.509 Directory Authentication Service
Unit 2: Application Layer Security
- Security Threats and Countermeasures
- SET Protocol
- Electronic Mail Security
- Pretty Good Privacy (PGP)
- S/MIME
- Transport Layer Security: Secure Socket Layer & Transport Layer Security
- Wireless Transport Layer Security
Unit 3: IP Security and System Security
- Authentication Header
- Encapsulating Security Payloads
- System Security: Intruders, Intrusion Detection System, Viruses
- Firewall Design Principles
- Trusted Systems
- OS Security
- Program Security
Unit 4: Introduction to Cyber Law
- Cyber Crime, Cyber Criminals, Cyber Law
- Object and Scope of the IT Act: Genesis, Object, Scope of the Act
- E-Governance and IT Act 2000
- Legal Recognition of Electronic Records
- Legal Recognition of Digital Signatures
- Use of Electronic Records and Digital Signatures in Government and its Agencies
- IT Act in Detail
- Basics of Network Security: IP Addresses, Port Numbers, and Sockets
- Hiding and Tracing IP Addresses
- Scanning: Traceroute, Ping Sweeping, Port Scanning, ICMP Scanning
- Fingerprinting: Active and Passive Email
Unit 5: Advanced Attacks
- Different Kinds of Buffer Overflow Attacks: Stack Overflows, String Overflows, Heap and Integer Overflows
- Internal Attacks: Emails, Mobile Phones, Instant Messengers, FTP Uploads, Dumpster Diving, Shoulder Surfing
- DOS Attacks: Ping of Death, Teardrop, SYN Flooding, Land Attacks, Smurf Attacks, UDP Flooding
- Hybrid DOS Attacks
- Application-Specific Distributed DOS Attacks
OPTIMIZING THE TICK STACK
2. Agenda: New Practitioners Track
WORKSHOP AGENDA
8:00 AM – 9:00 AM Breakfast
9:00 AM – 10:00 AM Installing the TICK Stack and Your First Query Noah Crowley
10:00 AM – 10:50 AM Chronograf and Dashboarding David Simmons
10:50 AM – 11:20 AM Break
11:20 AM – 12:10 PM Writing Queries (InfluxQL and TICK) Noah Crowley
12:10 PM – 1:10 PM Lunch
1:10 PM – 2:00 PM Architecting InfluxEnterprise for Success Dean Sheehan
2:00 PM – 2:10 PM Break
2:10 PM – 3:00 PM Optimizing the TICK Stack Dean Sheehan
3:10 PM – 4:00 PM Downsampling Data Michael DeSa
4:00 PM Happy Hour
3. Dean Sheehan, Senior Director, Pre and Post Sales
Optimizing the TICK Stack
• What shape is your data?
• Ingest optimization
• Query considerations
• Offloading stream processing
5. The Line Protocol
• Self-describing data
– Points are written to InfluxDB using the Line Protocol, which has the following format:
<measurement>[,<tag-key>=<tag-value>] [<field-key>=<field-value>] [unix-nano-timestamp]
– This provides extremely high flexibility as new metrics are identified for collection into InfluxDB. New metric to capture? Just send it to InfluxDB. It’s that easy.
cpu_load,hostname=server02,az=us_west usage_user=24.5,usage_idle=15.3 1234567890000000
Measurement: cpu_load | Tag set: hostname=server02,az=us_west | Field set: usage_user=24.5,usage_idle=15.3 | Timestamp: 1234567890000000
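As an illustration (not from the original deck), here is a minimal sketch of writing the same point with the influxdb 1.x Python client; the host, port, and the "telemetry" database name are assumptions:

# Minimal sketch: write the cpu_load point above with the influxdb 1.x Python client.
# Host, port, and the "telemetry" database are assumptions for this example.
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='telemetry')
client.write_points([{
    "measurement": "cpu_load",
    "tags": {"hostname": "server02", "az": "us_west"},
    "fields": {"usage_user": 24.5, "usage_idle": 15.3},
}])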
6. DON'T ENCODE DATA INTO THE MEASUREMENT NAME
• Measurement names like:
cpu.server-5.us-west value=2 1444234982000000000
cpu.server-6.us-west value=4 1444234982000000000
mem-free.server-6.us-west value=2500 1444234982000000000
• Encode that information as tags:
cpu,host=server-5,region=us-west value=2 1444234982000000000
cpu,host=server-6,region=us-west value=4 1444234982000000000
mem-free,host=server-6,region=us-west value=2500 1444234982000000
7. What if my plugin sends data like that to InfluxDB?
Write something that sits between your plugin and InfluxDB to sanitize the data, OR use one of our write plugins.
Example - Telegraf’s Graphite input plugin takes input like:
sensu.metric.net.server0.eth0.rx_packets 461295119435 1444234982
sensu.metric.net.server0.eth0.tx_bytes 1093086493388480 1444234982
sensu.metric.net.server0.eth0.rx_bytes 1015633926034834 1444234982
…parses it with the following template…
["sensu.metric.* ..measurement.host.interface.field"]
…and the following points in line protocol hit the database:
net,host=server0,interface=eth0 rx_packets=461295119435 1444234982
net,host=server0,interface=eth0 tx_bytes=1093086493388480 1444234982
net,host=server0,interface=eth0 rx_bytes=1015633926034834 1444234982
9. DON’T OVERLOAD TAGS
• BAD:
cpu,server=localhost.us-west value=2 1444234982000000000
cpu,server=localhost.us-east value=3 1444234982000000000
• GOOD: Separate out into different tags:
cpu,host=localhost,region=us-west value=2 1444234982000000000
cpu,host=localhost,region=us-east value=3 1444234982000000000
10. DON’T USE THE SAME NAME FOR A FIELD AND A TAG
• BAD: This significantly complicates queries.
login,user=admin user=2342,success=1 1444234982000
SELECT user::field, user::tag FROM login
• GOOD: Differentiate the names somehow:
login,role=admin user=2342,success=1 1444234982000
11. DON'T USE TOO FEW TAGS
• BAD:
cpu,region=us-west host="server1",value=4,temp=2 1444234982000
cpu,region=us-west host="server2",value=1,other=14 1444234982000
• Problems you might run into:
– Fields are not indexed, so queries with field conditions have to scan every point.
– GROUP BY <field> is not valid; you can only GROUP BY <tag>.
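One way to fix this (an illustration consistent with the earlier slides, not shown in the deck) is to promote host to a tag so it can be filtered and grouped efficiently:
cpu,region=us-west,host=server1 value=4,temp=2 1444234982000
cpu,region=us-west,host=server2 value=1,other=14 1444234982000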
12. DON'T CREATE TOO MANY LOGICAL CONTAINERS (DATABASES OR MEASUREMENTS)
Or rather, don’t write to too many databases:
• Dozens of databases should be fine
• Hundreds might be okay
• Thousands probably aren't without careful design
Too many databases leads to more open files, more query iterators in RAM, and more shards expiring. Expiring shards have a non-trivial RAM and CPU cost to clean up the indices.
13. The Last Write Wins!
• InfluxDB only stores one value for a given series
• ‘Given’ series meaning what?
– {Measurement, TagSet, Timestamp}
• If you send in another entry for the same {M, TS, T}
– The result is the union of the previous and new field values
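For example (an illustration, not from the slides), writing these two lines for the same measurement, tag set, and timestamp:
cpu,host=server01 usage_user=10 1444234982000000000
cpu,host=server01 usage_idle=90 1444234982000000000
results in a single stored point carrying both fields (usage_user=10, usage_idle=90). Writing usage_user again for that same series and timestamp overwrites the earlier value, hence "last write wins".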
15. Internet of Things – City Air Quality
• There are 10k sensor units that measure smog, carbon dioxide, lead, and sulfur dioxide
• The sensors sit at different locations throughout a city
• Sensor units send measurements to InfluxDB every 10 seconds
• IoT people like to think in hertz; that would be 0.1 Hz
16. Sensor data
• zip_code: zip code of the sensor location
• city: name of the city
• lat: latitude of the sensor
• lng: longitude of the sensor
• device_id: UUID of the device
• smog_level: smog level measurement
• co2_ppm: CO2 parts-per-million measurement
• lead: atmospheric lead level measurement
• so2_level: sulfur dioxide level measurement
18. Solution
• Why would it be a bad idea to make lat or lng a tag instead of a field?
– Numeric property: we probably care about doing math on lat and lng. That can only work if they are fields.
20. Solution
• Why would it be a good idea to make lat or lng a tag instead of a field?
– We probably care about filtering or grouping by lat and lng. Filters are faster with tags, and only tags are valid for grouping.
– If our devices don't move, lat and lng are dependent tags on device_id. Storing them as tags won't increase series cardinality.
• Keep in mind that you can’t do any of the numeric computations on tags
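As an illustration (not from the deck): a numeric aggregation such as
SELECT mean(lat) FROM pollutants WHERE time > now() - 1d
is only meaningful when lat is a field (Schema 1 below); aggregations work on fields, not tags. With lat stored as a tag (Schema 2 below), you instead gain the ability to write GROUP BY lat.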
21. The following queries are important
SELECT median(lead) FROM pollutants
WHERE time > now() - 1d GROUP BY city
SELECT mean(co2_ppm) FROM pollutants
WHERE time > now() - 1d AND city='sf' GROUP BY device_id
SELECT max(smog_level) FROM pollutants
WHERE time > now() - 1d AND city='nyc' GROUP BY zipcode
SELECT min(so2_level) FROM pollutants
WHERE time > now() - 1d AND city='nyc' GROUP BY zipcode
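A minimal sketch (an illustration, not from the deck) of running the first query above with the influxdb 1.x Python client; the connection details and the "air_quality" database name are assumptions:

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='air_quality')
result = client.query(
    "SELECT median(lead) FROM pollutants WHERE time > now() - 1d GROUP BY city"
)
# Iterate the series grouped under a given city tag value.
for point in result.get_points(tags={'city': 'sf'}):
    print(point['time'], point['median'])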
23. Schema 1 for Pollutants
measurement: pollutants
tags: city device_id zipcode
fields: lat lng smog_level co2_ppm lead so2_level
Examples in Line Protocol
pollutants,city=richmond,device_id=12,zipcode=23221 lat=37.5333,lng=77.4667,smog_level=2.4,co2_ppm=404i,lead=2.3,so2_level=3i 142309324834700
pollutants,city=bozeman,device_id=37,zipcode=59715 lat=45.6778,lng=111.0472,smog_level=0.9,co2_ppm=398i,lead=1.3,so2_level=1i 142309324834700
24. Schema 2 for Pollutants
measurement: pollutants
tags: lat lng city device_id zipcode
fields: smog_level co2_ppm lead so2_level
Examples in Line Protocol
pollutants,city=richmond,device_id=12,zipcode=23221,lat=37.5333,lng=77.4667 smog_level=2.4,co2_ppm=404i,lead=2.3,so2_level=3i 142309324834700
pollutants,city=bozeman,device_id=37,zipcode=59715,lat=45.6778,lng=111.0472 smog_level=0.9,co2_ppm=398i,lead=1.3,so2_level=1i 142309324834700
25. Queries and data on disk
• Shards have a duration
• Queries that touch more shards run slower
• Aim for a shard duration such that queries are typically answered from few (ideally one) shards
• We recommend configuring the shard group duration such that:
– It is two times your longest typical query’s time range
– Each shard group has at least 100,000 points
– Each shard group has at least 1,000 points per series
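For example (a sketch; the database and policy names here are made up), if your longest typical query spans one day, a two-day shard group duration can be set when the retention policy is created:
CREATE RETENTION POLICY "one_month" ON "air_quality" DURATION 30d REPLICATION 1 SHARD DURATION 2d DEFAULT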
26. Time Series Index
• You have a choice
– In-memory index
– Memory-mapped (on-disk) index – the Time Series Index (TSI)
• The in-memory index has limits
– How much memory do you have?
– Needs a rebuild on restart
• The Time Series Index uses disk
– A little slower in some situations
– Upside: you typically have far more disk than memory, and restarts are faster
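On InfluxDB 1.x the index type is selected in the [data] section of influxdb.conf; a sketch of opting into TSI:
[data]
  index-version = "tsi1"
Existing shards can then be converted offline with the influx_inspect buildtsi utility.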
27. Optimizing ingestion
• Batch your writes if possible
– Something like 5,000 points per batch
• Be mindful of the thundering herd
– A boatload of data arriving from hundreds of sources in the same second
– Add jitter to spread those writes out
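A minimal sketch (an illustration under assumed names; the sensor feed below is a stub) of batching writes and adding jitter with the influxdb 1.x Python client:

import random
import time
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='telemetry')
BATCH_SIZE = 5000

def collect_points():
    # Stub generator standing in for a real sensor feed (an assumption for this sketch).
    while True:
        yield {
            "measurement": "pollutants",
            "tags": {"city": "sf", "device_id": str(random.randint(1, 10000))},
            "fields": {"co2_ppm": random.randint(380, 450)},
        }

buffer = []
for point in collect_points():
    buffer.append(point)
    if len(buffer) >= BATCH_SIZE:
        time.sleep(random.uniform(0.0, 0.5))  # jitter, so many writers don't all hit the same second
        client.write_points(buffer)           # one batched write instead of 5,000 single writes
        buffer = []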
29. Query considerations
• Time bound
– We’re a time series database after all
• Query large, return small
– Human consumers don’t work well with large amounts of data
– Machine consumers may cope, but the more work you can push down to InfluxDB, the better
• Use tags
– This comes back to thinking about your schema early on
• Can’t win them all
30. Offload stream processing
• You can clearly run queries periodically to look for worrying situations
– And generate alarms in your calling code
• Be respectful of the DB engine; it is working hard to store your data and answer important queries
• Some ‘queries’ might be better handled by observing and operating on the stream of data InfluxDB sees
• Kapacitor ‘subscribes’ to InfluxDB
31. Other pearls
• Query limit configuration
• max-concurrent-queries
• max-select-point
• max-select-series
• Queries
– Reminder: time series queries should always include a time bound
• Backfilling data
– Insert data ordered by timestamp/tags, newest to oldest
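On InfluxDB 1.x these query limits live in the [coordinator] section of influxdb.conf; a sketch with illustrative values (0 means unlimited):
[coordinator]
  max-concurrent-queries = 10
  max-select-point = 10000000
  max-select-series = 100000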