Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
How a Particle Accelerator Monitors Scientific Experiments Using InfluxDB | InfluxData
European XFEL is the creator of the strongest X-ray beam in the world. Its 3.4 km underground X-ray free-electron laser tunnel is used by researchers from around the world. Scientists use the facility to map atomic details of viruses, film chemical reactions, and study processes in the interiors of planets. Discover how European XFEL uses InfluxDB to monitor its scientific experiments and research.
In this webinar, Alessandro Silenzi will dive into:
European XFEL’s approach to empowering the worldwide community to push the boundaries of science
The evolution of their data management solution — from homegrown to InfluxDB
How a time series platform is used to analyze and validate experiment data
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx... | Flink Forward
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between output accuracy and computational efficiency. Unfortunately, state-of-the-art systems for approximate computing, such as BlinkDB and ApproxHadoop, primarily target batch analytics, where the input data remains unchanged during the course of sampling, so they are not well suited for stream analytics. In this talk, we will present the design of StreamApprox, a Flink-based stream analytics system for approximate computing. StreamApprox implements an online stratified reservoir sampling algorithm in Apache Flink to produce approximate output with rigorous error bounds.
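The sampling core of that design is small enough to sketch outside Flink. Below is an illustrative Python version of per-stratum reservoir sampling; the stratum key, reservoir size, and input format are assumptions, and StreamApprox's actual Flink operator additionally tracks per-stratum weights to derive its error bounds.

```python
import random
from collections import defaultdict

def stratified_reservoir_sample(stream, key_fn, k):
    """Keep a uniform reservoir of up to k items per stratum.

    stream  -- iterable of items arriving one at a time
    key_fn  -- maps an item to its stratum (e.g. a sensor id)
    k       -- reservoir size per stratum
    """
    reservoirs = defaultdict(list)   # stratum -> sampled items
    seen = defaultdict(int)          # stratum -> items observed so far
    for item in stream:
        stratum = key_fn(item)
        seen[stratum] += 1
        if len(reservoirs[stratum]) < k:
            reservoirs[stratum].append(item)
        else:
            # Replace an existing sample with probability k / seen
            j = random.randrange(seen[stratum])
            if j < k:
                reservoirs[stratum][j] = item
    return reservoirs, seen

# Example: approximate per-stratum means from the samples
samples, counts = stratified_reservoir_sample(
    ((i % 3, float(i)) for i in range(10_000)),
    key_fn=lambda item: item[0],
    k=100)
approx_means = {s: sum(v for _, v in items) / len(items) for s, items in samples.items()}
print(approx_means)
```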
Development and Applications of Distributed IoT Sensors for Intermittent Conn... | InfluxData
What do electric power sensing IoT devices, large area electric field surveys and an array with hundreds of data channels have in common? They’re all built using an IoT stack fueled by InfluxDB and designed to run in environments of intermittent network connectivity.
In the environments where U.S. Soldiers operate, network connectivity is not ensured due to jamming, intermittent 4G signals, or paperwork. To address these issues, the United States Army Research Laboratory runs InfluxDB both in the cloud and on the IoT device. When connectivity is available, the most recent data are replicated to the cloud, with historical data replicated when possible. This allows them to design products that can leverage the cloud but aren't tied to it. As a result, they have been able to develop electric power monitors for installations and microgrids, strap sensors to vehicles for large area surveys, and combine sensors into arrays.
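A minimal sketch of that store-and-forward pattern, assuming an InfluxDB 2.x-style HTTP write endpoint on both the device and in the cloud (the URLs, token, bucket names, and batch size are placeholders, not ARL's actual configuration):

```python
import time
import requests

EDGE_WRITE = "http://localhost:8086/api/v2/write?org=edge&bucket=sensors&precision=s"
CLOUD_WRITE = "https://cloud.example.com/api/v2/write?org=lab&bucket=sensors&precision=s"
HEADERS = {"Authorization": "Token PLACEHOLDER_TOKEN"}

pending = []  # line-protocol records not yet replicated to the cloud

def record(measurement, field, value):
    """Always write locally first, then queue the point for cloud replication."""
    line = f"{measurement} {field}={value} {int(time.time())}"
    requests.post(EDGE_WRITE, data=line, headers=HEADERS, timeout=2)
    pending.append(line)

def replicate():
    """When connectivity is available, push the most recent data first."""
    while pending:
        batch = pending[-500:]  # newest points first
        try:
            resp = requests.post(CLOUD_WRITE, data="\n".join(batch),
                                 headers=HEADERS, timeout=5)
            resp.raise_for_status()
        except requests.RequestException:
            return  # offline again; keep the backlog and retry later
        del pending[-len(batch):]  # only drop what was confirmed sent
```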
Flink Forward Berlin 2017: Dr. Radu Tudoran - Huawei Cloud Stream Service in ... | Flink Forward
Huawei Cloud Stream Service uses Flink internally as its job execution engine, whereas Kinesis and the Alibaba Stream Compute service use Storm. By the end of this year we will support running Flink on Kubernetes and Mesos in the cloud, along with CEP on SQL and other features. The presentation will show how to create a serverless cloud service from zero, how to provide streaming features with Flink, and how to operate the service with quantization and visualization (by collecting YARN/Flink/OS metrics in real time). The service was built from scratch in only about three months.
Jorge de la Cruz [Veeam Software] | RESTful API – How to Consume, Extract, St... | InfluxData
This document provides an overview of consuming, extracting, storing, and visualizing data from a RESTful API with InfluxDB and Grafana. It introduces RESTful APIs and their components, then details how to make requests using HTTP methods. Next, it covers using Bash shell scripts with JQ to parse JSON responses and send data to InfluxDB for storage. Finally, it demonstrates how to build dashboards in Grafana to visualize time series data from InfluxDB for monitoring and analytics.
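The deck's examples use Bash and JQ; the same consume-parse-store loop can be sketched in Python instead (the REST endpoint, JSON field names, and bucket below are hypothetical placeholders):

```python
import time
import requests

API_URL = "https://api.example.com/v1/jobs"  # hypothetical REST endpoint
WRITE_URL = "http://localhost:8086/api/v2/write?org=demo&bucket=rest&precision=s"
HEADERS = {"Authorization": "Token PLACEHOLDER_TOKEN"}

# 1. Consume: GET the resource and parse the JSON response
jobs = requests.get(API_URL, timeout=10).json()

# 2. Extract: convert each record to InfluxDB line protocol
now = int(time.time())
lines = [
    f'backup_jobs,name={job["name"]} duration={job["durationSec"]},result="{job["result"]}" {now}'
    for job in jobs
]

# 3. Store: write the batch to InfluxDB; Grafana can then query it for dashboards
requests.post(WRITE_URL, data="\n".join(lines), headers=HEADERS, timeout=10).raise_for_status()
```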
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt... | Flink Forward
Data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: with its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has been part of redefining what stream processing can do. By now, Apache Flink powers some of the largest open source data stream processing pipelines. In this keynote, we will look at the evolution of stream processing and Apache Flink during the last year, and at what we believe will be the next wave of stream processing applications. We show how the Flink community and users evolved, what use cases are coming up, and how new and upcoming features in Flink are making new types of applications possible. We will also discuss common challenges that companies are facing when adopting stream processing, and how we can help companies rapidly adopt and roll out stream processing company-wide.
InfluxDB 2.0: Dashboarding 101 by David G. Simmons | InfluxData
InfluxDB 2.0 has some new dashboarding and querying capabilities that will make using a time series database even easier. This InfluxDays NYC 2019 presentation by David G. Simmons (Senior Developer Evangelist at InfluxData) walks you through how to set up your first dashboard.
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa... | Flink Forward
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever increasing interest. As much as distributed applications see increased use enabled by Mesos, Mesos also sees increasing use due to a growing ecosystem of well integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time high scale data processing and allows users to deal with low-latency streaming analytical workloads on Mesos. In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink’s distributed architecture can be modeled as a Mesos framework, and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to easily run on DC/OS.
Why Architecting for Disaster Recovery is Important for Your Time Series Data... | InfluxData
Time series data at Capital One consists of infrastructure, application, and business process metrics. The combination of these metrics is what internal stakeholders rely on for observability, which allows them to deliver better service and uptime for their customers, so protecting this critical data with a proven and tested recovery plan is not a “nice to have” but a “must have.”
In this talk, IT staff members Saravanan Krisharaju, Rajeev Tomer, and Karl Daman share how they built a fault-tolerant solution based on InfluxEnterprise and AWS that collects and stores metrics and events. They added machine learning, which uses the collected time series to model predictions that are then written back into the InfluxDB time series database for real-time access. The Capital One team shares the journey they took to architect and build this solution, as well as to plan and execute their disaster recovery plan.
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen | InfluxData
In this InfluxDays NYC 2019 talk by Gunnar Aasen (Manager of Partner Engineering at InfluxData), you will get an overview of the AWS Container Monitoring Stack as well as how you can use InfluxDB on AWS for container monitoring. This session will include a demo of the solution.
Keystone Data Pipeline manages several thousand Flink pipelines with variable workloads. These pipelines are simple routers which consume from Kafka and write to one of three sinks. In order to alleviate our operational overhead, we've implemented autoscaling for our routers. Autoscaling has reduced our resource usage by 25% - 45% (varying by region and time) and has reduced our on-call burden. This talk will take an in-depth look at the mathematics, algorithms, and infrastructure details for implementing autoscaling of simple pipelines at scale. It will also discuss future work for autoscaling complex pipelines.
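The scaling decision itself boils down to a small calculation. The sketch below is an illustrative version, not Netflix's actual algorithm: it derives a target parallelism from the observed input rate and a measured per-task capacity, with a utilization target and a dampening threshold to avoid oscillation.

```python
import math

def target_parallelism(incoming_rate, per_task_capacity, current,
                       utilization_target=0.6, min_tasks=1, max_tasks=512):
    """Pick a parallelism that keeps each task below the target utilization.

    incoming_rate     -- messages/sec arriving at the router (e.g. from Kafka offsets)
    per_task_capacity -- messages/sec one task can process at 100% utilization
    current           -- the pipeline's current parallelism
    """
    needed = incoming_rate / (per_task_capacity * utilization_target)
    proposed = max(min_tasks, min(max_tasks, math.ceil(needed)))
    # Dampen small oscillations: only rescale if the change is at least 10%
    if abs(proposed - current) / max(current, 1) < 0.10:
        return current
    return proposed

print(target_parallelism(incoming_rate=90_000, per_task_capacity=5_000, current=20))  # -> 30
```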
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir... | InfluxData
Dean will provide practical tips and techniques learned from helping hundreds of customers deploy InfluxDB and InfluxDB Enterprise. This includes hardware and architecture choices, schema design, configuration setup, and running queries.
Why is building a big data platform hard? What are the key aspects involved in providing a "serverless" experience for data teams? And how does Databricks solve the infrastructure problems and provide that "serverless" experience?
Principles in Data Stream Processing | Matthias J Sax, Confluent | HostedbyConfluent
Data stream processing is, for many of us, a new paradigm with which you process data and build applications. In this talk, we will take you on a journey through the theoretical foundations of stream processing and discuss the underlying principles and unique problems that need to be addressed. What actually is a data stream anyway? And how do I use it? How do streams relate to application state and when do I use the one or the other?
ksqlDB and Kafka Streams are both, at their core, designed to help build stream processing applications, and we will explain how stream processing principles are reflected in the design of each system and what trade-offs were chosen (and - more importantly! - why). Finally, we take a look into the future at how the stream processing space, and in particular ksqlDB and Kafka Streams, may evolve over the next few years as we outline extensions and improvements to the underlying conceptual model. So, bring your thinking hats and notepads and prepare to learn WHY these systems are the way they are!
Flink Forward SF 2017: James Malone - Make The Cloud Work For You | Flink Forward
You should spend your time using the powerful Apache Flink ecosystem to get value from your data, not on your data processing infrastructure. Cloud environments can help you with this problem by providing managed services and infrastructure. Since Google Cloud Dataproc, Google's managed service to power the Apache big data ecosystem, runs Flink, you can easily combine the benefits of cloud with your Flink data pipelines. With new support for Flink and long-running streaming jobs, we will show you how you can set up a cluster and a streaming job in less than three minutes.
Processing 70Tb Of Genomics Data With ADAM And Toil | Spark Summit
This document discusses analyzing large genomic datasets with ADAM and Toil. It summarizes the sequencing and analysis process, and how ADAM implemented on Spark can provide horizontal scalability and speedups of 30-50x over traditional tools. Toil is introduced as a pipeline system for massive genomic workflows that can run on thousands of nodes and is resilient to failures. Results show ADAM produces equivalent variants to GATK while being 3.5x faster and 4x cheaper.
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch... | Flink Forward
DTW (Dynamic Time Warping) is a well-known method for finding patterns within a time series. It can find a pattern even if the data are distorted. It can be used to detect sales trends, defects in machine signals in industry, patterns in electrocardiograms in medicine, DNA…
Most implementations are very slow, but a very efficient open source implementation (SIGKDD 2012 best paper) exists in C. It can easily be ported to other languages, such as Java, so that it can then be used in Flink.
We present the slight modifications we made so that it can be used with Flink at even greater scale to return the top-k best matches on past data or streaming data.
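For reference, the textbook O(n·m) dynamic-programming form of DTW fits in a few lines of Python; the fast SIGKDD 2012 implementation adds lower bounds, early abandoning, and other pruning on top of this basic idea.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A shifted/stretched pattern still matches closely:
print(dtw_distance([0, 1, 2, 3, 2, 1, 0], [0, 0, 1, 2, 3, 2, 1, 0]))  # -> 0.0
```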
Espresso: LinkedIn's Distributed Data Serving Platform (Talk) | Amy W. Tang
This talk was given by Swaroop Jagadish (Staff Software Engineer @ LinkedIn) at the ACM SIGMOD/PODS Conference (June 2013). For the paper written by the LinkedIn Espresso Team, go here:
http://www.slideshare.net/amywtang/espresso-20952131
In Data Engineer’s Lunch #41: Pygrametl, we discussed pygrametl, a Python ETL tool, to close out our series on ETL tools.
Accompanying Blog: https://blog.anant.us/data-engineers-lunch-41-pygrametl
Accompanying YouTube: https://youtu.be/YiPuJyYLXxs
Flink Forward Berlin 2017: Francesco Versaci - Integrating Flink and Kafka in... | Flink Forward
High-throughput DNA sequencing is a key data acquisition technology which enables dozens of important applications, from oncology to personalized diagnostics. We extended work presented last year to port additional portions of the standard genomics data processing pipeline to Flink. Our Flink-based processor consists of two distinct specialized modules (reader and writer) that are loosely linked via Kafka streams, thus allowing for easy composability and integration into already existing Hadoop workflows. To extend our work we had to manage the dynamic creation and detection of the data streams: the set of output files is not known in advance by the writer, which learns it at run time. Particular care had to be taken to handle the finite nature of the genomic streams: since we use some already existing Hadoop output formats, we had to properly handle the flow of end-of-stream markers through Flink and Kafka in order to have the final output files correctly finalized.
What are algorithms? How can I build a machine learning model? In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training such models and deploying them is either operationally prohibitive or outright impossible for them. At Amazon, we created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorization machines for recommendations, and time-series forecasting. This talk will discuss those algorithms, where and how they can be used, and our design choices.
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La... | InfluxData
In this InfluxDays NYC 2019 session, Richard Laskey from the Wayfair Storefront team will share their monitoring best practices using InfluxEnterprise. These efforts are critical and help improve the user experience by driving forward site-wide improvements, establishing best practices, and driving change through many different teams.
COOL WAYS TO GET STARTED
Join us for a live InfluxDB training to learn how to easily ingest at scale in a matter of seconds to help you build powerful time series based applications. Join our 45-minute demos with experts who will showcase key InfluxDB features and answer questions live from the audience.
After attending this training, attendees will be able to:
Use sample data sets to try out various visualization options
Utilize the available data ingestion methods to construct a data pipeline to InfluxDB
Leverage Notebooks to collaborate with team members
Gain best practices for InfluxDB, Telegraf and Flux
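As a taste of the ingestion and query steps covered in the training, here is a minimal sketch using the influxdb-client Python package against a local InfluxDB 2.x instance; the bucket, measurement, and token are placeholders.

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="PLACEHOLDER_TOKEN", org="demo")

# Ingest: write a single point (Telegraf, client libraries, and CSV uploads are alternatives)
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket="training",
                record=Point("cpu").tag("host", "edge-1").field("usage", 42.0))

# Query: run a Flux query and print the results
tables = client.query_api().query(
    'from(bucket: "training") |> range(start: -1h) '
    '|> filter(fn: (r) => r._measurement == "cpu")')
for table in tables:
    for rec in table.records:
        print(rec.get_time(), rec.get_field(), rec.get_value())
```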
Intro to open source observability with grafana, prometheus, loki, and tempo(... | LibbySchulze
This document provides an introduction to open source observability tools including Grafana, Prometheus, Loki, and Tempo. It summarizes each tool and how they work together. Prometheus is introduced as a time series database that collects metrics. Loki is described as a log aggregation system that handles logs at scale without high costs. Tempo is explained as a tracing system that allows tracing from logs, metrics, and between services. The document emphasizes that these tools can be run together to gain observability across an entire system from logs to metrics to traces.
Some notes about Spark Streaming's positioning given the current players: Beam, Flink, Storm et al. Helpful if you have to choose a streaming engine for your project.
Stateful stream processing with Apache Flink | Knoldus Inc.
Nowadays, many stream processing applications have sophisticated business logic, strict correctness guarantees, high performance and low latency requirements, and must be fault-tolerant while maintaining terabytes of state. There are many stream processing frameworks available in the market which help businesses write robust stateful stream processing applications.
In this session, we will talk about Apache Flink, a distributed stream processor with intuitive and expressive APIs to implement stateful stream processing applications. It can efficiently run such applications at a large scale in a fault-tolerant manner. In this session, we will see in detail what stateful stream processing is and how Flink takes it on. We'll also get to know how the checkpointing mechanism works in Flink.
Burst data retrieval after 50k GPU Cloud run | Igor Sfiligoi
We ran a 50k GPU multi-cloud simulation to support the IceCube science. This talk provided an overview of what happened to the associated data.
Presented at the Internet2 booth at SC19.
Accelerating Astronomical Discoveries with Apache Spark | Databricks
Our research group is investigating how to leverage Apache Spark (batch, streaming & real-time) to analyse current and future data sets in astronomy. Among the future large experiments, the Large Synoptic Survey Telescope (LSST) will soon start collecting terabytes of data per observation night, and the efficient processing and analysis of both real-time and historical data remains a major challenge. In this talk we will present the main challenges and explore the latest developments tailored for big data problems in astronomy.
On the one hand we designed a new Data Source API extension to natively manipulate telescope images and astronomical tables within Apache Spark. We then extended the functionalities of the Apache Spark SQL module to ease the manipulation of 3D data sets and perform efficient queries: partitioning, data sets join and cross-match, nearest neighbors search, spatial queries, and more.
On the other hand we are using the new possibilities offered by Structured Streaming APIs in recent Apache Spark versions to enable real-time decisions by rapidly accessing and analysing the alerts sent by telescopes every night. Given the unprecedented precision of the next generation of telescopes, the streams will consist of millions of alerts per night, and relying on Structured Streaming is a guarantee of not missing the latest black hole event in a sea of data! We will also share active learning developments used on top to improve real-time event selection and classification for the LSST telescope.
You will walk away with an understanding of modern challenges in astronomy, an appreciation of some beautiful night skies, and a sense of how Apache Spark can help push the frontiers of science further!
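To make the cross-match idea concrete, here is a heavily simplified PySpark sketch: it buckets both catalogs into declination zones and joins within a zone, using a flat-sky coordinate cut rather than AXS's actual algorithm or a proper spherical distance; the file paths and column names are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("catalog-crossmatch").getOrCreate()

# Two object catalogs with (ra, dec) columns in degrees; paths are placeholders.
cat_a = spark.read.parquet("catalog_a.parquet")
cat_b = spark.read.parquet("catalog_b.parquet")

# Bucket objects into declination zones so the join has a partitionable key.
zone_height = 1.0 / 60.0  # 1 arcmin zones
cat_a = cat_a.withColumn("zone", F.floor(F.col("dec") / zone_height))
cat_b = cat_b.withColumn("zone", F.floor(F.col("dec") / zone_height))

# Candidate pairs share a zone; keep pairs within a small coordinate separation.
radius = 1.0 / 3600.0  # 1 arcsec match radius, in degrees
pairs = (cat_a.alias("a").join(cat_b.alias("b"), on="zone")
         .where((F.abs(F.col("a.dec") - F.col("b.dec")) < radius) &
                (F.abs(F.col("a.ra") - F.col("b.ra")) < radius)))
pairs.show(10)
```

A real cross-match would also probe neighboring zones and use the great-circle separation; this sketch only illustrates why zone-based partitioning keeps the join tractable at catalog scale.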
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ... | Frank Wuerthwein
- The document describes running a GPU burst simulation for IceCube astrophysics research across 50,000 NVIDIA GPUs in multiple cloud platforms globally, achieving 350 petaflops for 2 hours.
- IceCube detects high-energy neutrinos to study violent astrophysical events by observing the interactions of neutrinos within a cubic kilometer of Antarctic ice instrumented with sensors.
- The GPU burst simulation campaign helped improve IceCube's ability to reconstruct neutrino direction and energy and identify astrophysical sources through multi-messenger astrophysics.
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic... | Igor Sfiligoi
NRP Engagement webinar: Description of the 380 PFLOP32s, 51k GPU multi-cloud burst using HTCondor to run the IceCube photon propagation simulation.
Presented January 27th, 2020.
This is the keynote talk fkw gave at CloudNet 2020. It covers all three cloudbursts we did. As of early 2021, slides 26ff are still the most detailed documentation of the 3rd cloudburst. This material will be covered in a future conference paper.
In this video from ChefConf 2014 in San Francisco, Cycle Computing CEO Jason Stowe outlines the biggest challenge facing us today, Climate Change, and suggests how Cloud HPC can help find a solution, including ideas around Climate Engineering, and Renewable Energy.
"As proof points, Jason uses three use cases from Cycle Computing customers, including from companies like HGST (a Western Digital Company), Aerospace Corporation, Novartis, and the University of Southern California. It’s clear that with these new tools that leverage both Cloud Computing, and HPC – the power of Cloud HPC enables researchers, and designers to ask the right questions, to help them find better answers, faster. This all delivers a more powerful future, and means to solving these really difficult problems."
Watch the video presentation: http://insidehpc.com/2014/09/video-hpc-cluster-computing-64-156000-cores/
Crash course on data streaming (with examples using Apache Flink) | Vincenzo Gulisano
These are the slides I used for a crash course (4 hours) on data streaming. They contain both theory/research aspects and examples based on the Apache Flink DataStream API.
This document summarizes a lecture on file systems and performance. It discusses the read/write process for magnetic disks involving seek time, rotational latency, and transfer time. Typical numbers for these parameters in magnetic disks are provided. Flash/SSD memory is also discussed as an alternative storage technology with advantages like low latency, no moving parts, and high throughput but also drawbacks like limited endurance. The document introduces concepts from queueing theory that can help analyze the performance of I/O systems, like modeling request arrival and service times as probabilistic distributions. Key metrics like response time and throughput are discussed for evaluating I/O performance.
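The access-time arithmetic summarized above is easy to make concrete; the numbers below are typical illustrative values rather than figures taken from the lecture.

```python
# Average time to service one random 4 KB read on a magnetic disk (illustrative numbers)
seek_ms = 4.0                           # average seek time
rpm = 7200
rotational_ms = 0.5 * 60_000 / rpm      # half a revolution on average, ~4.17 ms
transfer_ms = 4 / (100 * 1024) * 1000   # 4 KB at 100 MB/s, ~0.04 ms

service_ms = seek_ms + rotational_ms + transfer_ms
print(f"service time ~ {service_ms:.2f} ms per random 4 KB read")  # ~8.2 ms, i.e. ~120 IOPS

# Simple M/M/1 queueing estimate: response time grows sharply as utilization approaches 1
utilization = 0.8
response_ms = service_ms / (1 - utilization)
print(f"response time at 80% utilization ~ {response_ms:.1f} ms")  # ~41 ms
```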
"Building and running the cloud GPU vacuum cleaner"Frank Wuerthwein
This talk, describing the "Largest Cloud Simulation in History" (Jensen Huang at SC19), was given at the MAGIC meeting on Dec. 4th 2019. MAGIC stands for "Middleware and Grid Interagency Cooperation", and is a group within NITRD. Current federal agencies that are members of MAGIC include DOC, DOD, DOE, HHS, NASA, and NSF.
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach... | NETWAYS
How to store billions of time series points and access them within a few milliseconds? Chronix!
Chronix is a young but mature open source project that allows one, for example, to store about 15 GB (CSV) of time series in 238 MB with average query times of 21 ms. Chronix is built on top of Apache Solr, a bulletproof distributed NoSQL database with impressive search capabilities. In this code-intense session we show how Chronix achieves its efficiency in both respects: by means of an ideal chunking, by selecting the best compression technique, by enhancing the stored data with (pre-computed) attributes, and by specialized query functions.
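The chunk-and-compress idea is easy to see in a few lines of Python. This is only a schematic of the approach: the chunk size, JSON serialization, and the particular pre-computed attributes are illustrative and do not reflect Chronix's actual record format.

```python
import gzip
import json

def build_records(points, chunk_size=1000):
    """Split a (timestamp, value) series into compressed chunks with pre-computed attributes."""
    records = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        values = [v for _, v in chunk]
        records.append({
            # attributes stored alongside the blob so queries can skip decompression
            "start": chunk[0][0],
            "end": chunk[-1][0],
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            # the raw points, serialized and compressed; Chronix indexes such blobs in Solr
            "data": gzip.compress(json.dumps(chunk).encode()),
        })
    return records

series = [(t, float(t % 60)) for t in range(10_000)]
records = build_records(series)
print(len(records), "chunks,", sum(len(r["data"]) for r in records), "compressed bytes")
```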
A Fast and Efficient Time Series Storage Based on Apache Solr | QAware GmbH
OSDC 2016, Berlin: Talk by Florian Lautenschlager (@flolaut, Senior Software Engineer at QAware)
Abstract: How to store billions of time series points and access them within a few milliseconds? Chronix! Chronix is a young but mature open source project that allows one for example to store about 15 GB (csv) of time series in 238 MB with average query times of 21 ms. Chronix is built on top of Apache Solr a bulletproof distributed NoSQL database with impressive search capabilities. In this code-intense session we show how Chronix achieves its efficiency in both respects by means of an ideal chunking, by selecting the best compression technique, by enhancing the stored data with (pre-computed) attributes, and by specialized query functions.
Chronix: A fast and efficient time series storage based on Apache Solr | Florian Lautenschlager
Chronix is a fast and efficient time series storage system based on Apache Solr. It can store large amounts of time-correlated data objects, like 68 billion data objects from sensor data collected over a year, using only 32GB of disk space and retrieving data within milliseconds. It achieves this through compressing time series data into chunks and storing the compressed chunks and associated attributes in records within Apache Solr. Chronix provides specialized time series aggregations and analyses through its query language to enable common time series operations like aggregations, trend analysis, and outlier detection.
Astronomical Data Processing on the LSST Scale with Apache Spark | Databricks
The next decade promises to be exciting for both astronomy and computer science with a number of large-scale astronomical surveys in preparation. One of the most important ones is the Large Synoptic Survey Telescope, or LSST. LSST will produce the first ‘video’ of the deep sky in history by continually scanning the visible sky and taking one 3.2 giga-pixel image every 20 seconds. In this talk we will describe LSST’s unique design and how its image processing pipeline produces catalogs of astronomical objects. To process and quickly cross-match catalog data we built AXS (Astronomy Extensions for Spark), a system based on Apache Spark. We will explain its design and what is behind its great cross-matching performance.
Solar System Processing with LSST: A Status Update | Mario Juric
An update for the LSST Solar System Science Collaboration on the work in progress on data products and software needed to support the Solar System science. Delivered at DPS 2017 meeting.
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ... | Igor Sfiligoi
- IceCube is a neutrino observatory that detects high-energy neutrinos from astrophysical sources to study violent cosmic events. It uses over 5000 optical sensors buried in Antarctic ice to detect neutrinos.
- A cloud burst was performed using over 50,000 GPUs across multiple cloud providers worldwide to simulate photon propagation through ice for IceCube data analysis. This was the largest cloud simulation ever and demonstrated the ability to burst at exascale.
- The simulation helped improve IceCube's neutrino detection and pointing resolution to identify the first known source of high-energy neutrinos, a blazar, demonstrating IceCube's potential for multi-messenger astrophysics.
Round Table Introduction: Analytics on 100 TB+ catalogs | Mario Juric
Introductory slides to spark the discussion at the MSDSE 2017 round table on tools enabling data management and analytics of 10-100 TB catalogs, using a specific astronomy problem as a case study.
The Matsu Project - Open Source Software for Processing Satellite Imagery Data | Robert Grossman
The Matsu Project is an Open Cloud Consortium project that is developing open source software for processing satellite imagery data using Hadoop, OpenStack and R.
Chronix Time Series Database - The New Time Series Kid on the Block | QAware GmbH
Chronix is a time series database that can efficiently store billions of time series data points in a small amount of disk space and retrieve data within milliseconds. It works by splitting time series into fixed-size chunks, compressing the chunks, and storing the compressed chunks and associated metadata in Solr/Lucene records. Chronix provides common time series aggregations, transformations, and analyses through its API. The developers tuned Chronix's performance by evaluating different compression techniques and chunk sizes on real-world time series data. Chronix outperformed other time series databases in storage needs and query speeds in their tests.
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow Ecosystem | InfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
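For a concrete taste of the Arrow-plus-Parquet pattern the panel discusses, here is a minimal pyarrow sketch; the column names are invented and no InfluxDB or Dremio internal schema is implied.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a columnar, in-memory Arrow table
table = pa.table({
    "time": pa.array([1_700_000_000, 1_700_000_010, 1_700_000_020], type=pa.timestamp("s")),
    "sensor": ["a", "a", "b"],
    "value": [21.5, 21.7, 19.9],
})

# Persist to Parquet and read back only the columns a query needs
pq.write_table(table, "samples.parquet")
roundtrip = pq.read_table("samples.parquet", columns=["time", "value"])
print(roundtrip.schema)
```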
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu... | InfluxData
Bevi is the creator of smart water dispensers which empower people to choose their desired beverage (flat or sparkling), flavor, and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting across their entire ecosystem — including the hardware and software. They are using InfluxDB to collect sensor data in real time, remotely, from their internet-connected machines about their status and activity — e.g., flavor and CO2 levels, water temperature, filter status, etc. They are using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDB | InfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decisions
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base | InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO Base - a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose-built time series database. They are collecting more than 100K different metrics at various frequencies — some every 5 seconds, others only every 1-2 minutes. They have reduced overall IT spend by 50% and collect 2x the amount of data at 20x the frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced TCO, and is now able to create added-value services: forecasting, monitoring, and predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipeline IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING Stack | InfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (e.g., turbines and generators), and increase safety practices (e.g., buildings, construction sites). Join this webinar to learn best practices from industrial IoT SMEs.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected everywhere — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
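To make the MQTT-to-InfluxDB leg of the MING stack concrete, here is a minimal sketch (assuming paho-mqtt 2.x and the influxdb-client Python package; the topic, bucket, and credentials are hypothetical, and in a Node-RED deployment this glue would normally be a flow rather than a script).

```python
# Minimal sketch: subscribe to an MQTT topic on Mosquitto and write each
# numeric payload into InfluxDB. Topic, bucket, and credentials are hypothetical.
import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

influx = InfluxDBClient(url="http://localhost:8086", token="my-token", org="factory")
write_api = influx.write_api(write_options=SYNCHRONOUS)

def on_connect(client, userdata, flags, reason_code, properties):
    client.subscribe("factory/temperature")  # hypothetical sensor topic

def on_message(client, userdata, msg):
    # Payload is assumed to be a plain numeric reading, e.g. "21.7".
    point = Point("temperature").tag("topic", msg.topic).field("value", float(msg.payload))
    write_api.write(bucket="iiot", record=point)

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```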
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
The document is an agenda for a discussion between the CTO and founder of Ockam, Mrinal Wadhwa, and the CTO and founder of InfluxData, Paul Dix, about rewriting products using the Rust programming language. It includes an introduction of the founders, an overview of the discussion topics like why they decided to rewrite in Rust and the challenges they faced, how they got their engineers comfortable with Rust, tips they learned in the process, benefits gained from moving to Rust, and how their communities responded to the switch.
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security and custom configuration options to best suit their workload requirements. Join this webinar to learn more about InfluxDB Cloud and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
Many developers and DevOps engineers have become aware of the value of using their observability data to gain greater insight into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
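As a rough illustration of the "one library" idea, here is a sketch (assuming the opentelemetry-sdk and OTLP exporter packages, plus an OpenTelemetry Collector on localhost configured to forward metrics to InfluxDB; the meter, instrument, and attribute names are hypothetical).

```python
# Minimal sketch: emit a metric via the OpenTelemetry SDK and export it over
# OTLP to a collector, which can then forward it to InfluxDB.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4317")  # collector endpoint (assumed)
)
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)

meter = metrics.get_meter("ci-pipeline")                   # hypothetical meter name
build_counter = meter.create_counter("builds_completed")   # hypothetical instrument

# Record one data point; attributes become dimensions on the metric.
build_counter.add(1, {"branch": "main", "status": "success"})

provider.shutdown()  # flush pending metrics before the script exits
```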
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
American Metal Processing Company ("AMP") is the largest commercial rotary heat treat facility in the US, with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures a consistent, uniform heat treatment process compared to traditional batch- or belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electric cars. They collect sensor data about their EV's control systems using a time series platform. During races, they collect IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races, which helps the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team is able to analyze the data in Marple to understand what to do better next time. By using Marple + InfluxDB, the team is able to collect, share, and analyze high-frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
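To give a feel for the building blocks named above (this is an illustration of Arrow, DataFusion, and Parquet in general, not of InfluxDB's internal code), here is a sketch using the datafusion Python bindings to run SQL directly over a Parquet file; the file and column names are hypothetical.

```python
# Illustrative sketch: SQL over a columnar Parquet file via DataFusion,
# the same open source components the new InfluxDB engine builds on.
# "cpu_metrics.parquet" and its columns are hypothetical.
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_parquet("cpu", "cpu_metrics.parquet")

df = ctx.sql("""
    SELECT host, avg(usage) AS avg_usage
    FROM cpu
    GROUP BY host
    ORDER BY avg_usage DESC
""")
print(df.to_pandas())
```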
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with the language of choice and pull IoT sensor data from 70+ different single board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDKs, and they are able to transform the data in transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why using InfluxDB was crucial for fast data collection and for providing a single source of truth for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data for longer than the Solutions' default retention. These customers are recommended to use InfluxDB, and this presentation will cover the key considerations for moving to InfluxDB vs. the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx’s solution and its easy implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
Two new features are coming to Flux that add flexibility and functionality to your data workflow — polymorphic labels and dynamic types. This session walks through these new features and shows how they work.
This document outlines the schedule for Day 2 of InfluxDays 2022, an event hosted by InfluxData. The schedule includes sessions on building developer experience, how developers like to work, an overview of the InfluxDB developer console and API, demos of client libraries and the InfluxDB v2 API, tips for getting involved in the InfluxDB community and university, use cases for networking monitoring, crypto/fintech, monitoring/observability, and IIoT, and closing thoughts. Recordings of all sessions will be made available to registered attendees by November 7th. Upcoming events include advanced Flux training in London and resources through the community forums, Slack channel, and online university.
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...InfluxData
This document contains the agenda for Day 2 of InfluxDays 2022, which includes:
- Welcome and introductory remarks from Zoe Steinkamp and Jay Clifford of InfluxData.
- Fireside chats and presentations on building great developer experiences, how developers like to work, and use cases for InfluxDB from companies like Tesla, InfluxData, and others.
- Sessions on the InfluxDB developer console, APIs, client libraries, getting involved in the community, accelerating time to awesome with InfluxDB University, and tips for analyzing IoT data with InfluxDB.
- Closing thoughts from Zoe Steinkamp and Jay Clifford, as well as
The document summarizes the agenda and sessions for Day 1 of InfluxDays 2022. It includes sessions on InfluxDB data collection, scripting languages like Flux, the InfluxDB time series engine, tasks, storage, and a closing discussion. The agenda involves talks from InfluxData employees on building applications with real-time data, navigating the developer experience, solving problems, the InfluxDB platform, community, education, use cases in crypto/fintech and IIoT, and tips/tricks for analysis.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
What do a Lego brick and the XZ backdoor have in common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that they are both building blocks, or dependencies of creative and software projects. The reality is that a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, open standards and formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training activities. She previously worked on LibreOffice migrations and training courses for various public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname deneb_alpha).
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe | InfluxDays Virtual Experience NA 2020
1. Angelo Fausti & Frossie Economou, Vera C Rubin Observatory
How InfluxDB is helping us in our quest to make the deepest, widest image of the universe
5. Space is in a state of flux
• Comets and asteroids vary in position
• (Super)novae, variable stars vary in brightness
• Galaxies vary in age
• Dark energy varies in, uh, spacetime? maybe?
Image: Subaru HSC colour composite of COSMOS field, NAOJ
6. How to understand the changing universe in 5 [not very] easy steps (xkcd 1522)
9. Step 2: Build a large but nimble telescope
Media: Rubin Observatory
← 8.4-meter continuous-surface primary-tertiary mirror
10. Step 3: Haul everything up a mountain
Media: Rubin Observatory
Yes, there’s Internet. No, you can’t count on it.
12. Step 4: Observe the sky relentlessly for 10 years; issue 10M alerts every night
Media: Rubin Observatory
• “All” sky 2x per week
• 60 seconds to produce alerts
• 10-year images: 0.5 EB
• Final DB size: 15 PB
Legacy Survey of Space & Time (LSST) observing cadence simulation
13. Step 5: Get people (also a data centre or three), write software, wait for 2022
Media: Rubin Observatory
And get yourself a data centre or three…
All our own code is 💯% open source:
github.com/lsst
github.com/lsst-sqre
14. Photo: Wil O’Mullane, ~ Oct 2019
We’ll hang out on #influxdays-virtual for more Q&A (@frossie, @afausti)
Over to Angelo
15. How InfluxDB Helps Vera C. Rubin Observatory Make the Deepest, Widest Image of the Universe
InfluxDays North America, November 2020
Frossie Economou, Technical Manager for Data Management, Vera C. Rubin Observatory
Angelo Fausti, Software Engineer, Vera C. Rubin Observatory
24. Problems with our in-house solution
● A relational DB is not optimized for time series data
● Stuck with predefined dashboards and visualizations
● Limited exploratory analysis capabilities
● Our in-house development didn’t scale
● Use time more wisely: adopt an existing solution instead of (re)inventing our own
25. Adopting a TSDB, which one?
[Chart: DB-Engines ranking scores over time, log(Score) vs. Time (Years), https://db-engines.com/en/ranking]
26. “If it takes more than three days to get it working, it is not the right solution for you.” (Frossie Economou)
27. Why InfluxDB?
● It is more than a TSDB, it is an innovative solution
● Open source software and community
● InfluxDB: efficient store for time series + InfluxQL and Flux languages
● Chronograf: post-defined visualizations
● Kapacitor: foster collaborative conversation (Slack)
28. InfluxDB schema design
Results from the Data Release Production pipeline:
● Measurement groups the results of the pipeline
● Timestamp is the time when the pipeline run finishes
● Tags are metadata associated with the pipeline run
● Fields are the metrics measured by the pipeline
29. First the Tags, then the Series
drp,dataset=HSC,tract=509,filter=g {fields} timestamp
For each combination of tag values, there’s a new series.
filter is the name of the optical filter used at the telescope at a given time.
A tract identifies a region in the sky (*).
(*) https://pipelines.lsst.io/modules/lsst.skymap
30. Example of a Series
drp,dataset=HSC,tract=509,filter=g
Example field set for one point: AM1: 6.42357, AM2: 6.48177, AM3: 4.62033 (time = run ID)
Each point in a series contains the set of metrics measured by the pipeline run, and the results are grouped by the pipeline name.
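A minimal sketch of what writing one such point could look like with the influxdb-client Python package (the measurement, tags, and field values are taken from the example above; the bucket, URL, token, and org are hypothetical):

```python
# Minimal sketch: write one verification point using the schema above.
# Measurement "drp", tags dataset/tract/filter, and fields AM1..AM3 come
# from the slides; bucket and credentials are hypothetical.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

point = (
    Point("drp")                 # measurement: groups results of the pipeline
    .tag("dataset", "HSC")       # tags: metadata about the pipeline run
    .tag("tract", "509")
    .tag("filter", "g")
    .field("AM1", 6.42357)       # fields: metrics measured by the run
    .field("AM2", 6.48177)
    .field("AM3", 4.62033)
)

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="rubin") as client:
    client.write_api(write_options=SYNCHRONOUS).write(bucket="verification", record=point)

# Every distinct (dataset, tract, filter) combination becomes its own series.
```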
49. A preview of operations
[Diagram of sites: US Data Facility (Urbana, IL; project staff access; RP 10yr), TestStand (Tucson, AZ), Summit (Cerro Pachon, Chile; restricted access; RP ~30 days), TestStand and Chilean Data Facility (La Serena, Chile), <10MB/s raw stream between sites]
51. Data Aggregation in Kafka with Faust
https://kafka-aggregator.lsst.io
Faust agents compute summary statistics on non-overlapping windows of N seconds.
Data reduction factor R ~ 10
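The kafka-aggregator project linked above is the authoritative implementation; purely as a toy illustration of the same idea (a Faust agent writing into a non-overlapping, i.e. tumbling, windowed table), here is a sketch with hypothetical topic and field names.

```python
# Toy sketch of tumbling-window aggregation with Faust (not the actual
# kafka-aggregator code). Topic name and record fields are hypothetical.
import faust

app = faust.App("toy-aggregator", broker="kafka://localhost:9092")

class Reading(faust.Record):
    sensor: str
    value: float

readings_topic = app.topic("readings", value_type=Reading)

# Non-overlapping (tumbling) 10-second windows, keyed by sensor name.
counts = app.Table("reading-counts", default=int).tumbling(10.0, expires=300.0)

@app.agent(readings_topic)
async def aggregate(stream):
    async for reading in stream.group_by(Reading.sensor):
        counts[reading.sensor] += 1  # a real aggregator would also track sum/min/max

if __name__ == "__main__":
    app.main()
```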
52. What’s next
● Migration to InfluxDB 2.0
○ Conversation with InfluxData design team about Annotations in 2.0
○ Flux training for the Observatory Staff
○ Flux Tasks for downsampling and trend analysis
● Rubin Observatory Interim Data Facility on Google Cloud
● Project transition from Construction to Operations is happening
○ New opportunities for using InfluxDB
● Self-monitoring
● Scalability as we load more data, RPs, etc.
53. Learn more…
● Vera C. Rubin Observatory
● Data Processing
● Verification Framework
● Engineering and Facilities Database
● Kafka Aggregator
● Rubin Science Platform
● Rubin Technical Documentation