This document discusses the components and architecture of InfluxDB IOx for replication, durability, and subscriptions. It describes the write buffer, how writes are routed and distributed across shards, replication between buffers to ensure durability, and how subscriptions are handled for querying data.
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...InfluxData
InfluxDB IOx Tech Talks
This talk presents a design of a distributed database system that splits data to gain query performance. The talk will define four main properties of data splitting: sharding, partitioning, sorting, and encoding; and then delve into examples to show their impacts on query performance.
Catalogs - Turning a Set of Parquet Files into a Data SetInfluxData
InfluxDB IOx Tech Talks
Placing a Parquet file into an object store serves as a simple data persistence format. However, storing data into multiple files enabling upserts, deletions, format upgrades, metadata management, and consistency checks at scale requires some form of a catalog that manages these files. In this talk we will explore the requirements for a catalog for InfluxDB IOx, prior art from the Parquet ecosystem, and the proposed solution.
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
Zstandard is a fast compression algorithm which you can use in Apache Spark in various way. In this talk, I briefly summarized the evolution history of Apache Spark in this area and four main use cases and the benefits and the next steps:
1) ZStandard can optimize Spark local disk IO by compressing shuffle files significantly. This is very useful in K8s environments. It’s beneficial not only when you use `emptyDir` with `memory` medium, but also it maximizes OS cache benefit when you use shared SSDs or container local storage. In Spark 3.2, SPARK-34390 takes advantage of ZStandard buffer pool feature and its performance gain is impressive, too.
2) Event log compression is another area to save your storage cost on the cloud storage like S3 and to improve the usability. SPARK-34503 officially switched the default event log compression codec from LZ4 to Zstandard.
3) Zstandard data file compression can give you more benefits when you use ORC/Parquet files as your input and output. Apache ORC 1.6 supports Zstandardalready and Apache Spark enables it via SPARK-33978. The upcoming Parquet 1.12 will support Zstandard compression.
4) Last, but not least, since Apache Spark 3.0, Zstandard is used to serialize/deserialize MapStatus data instead of Gzip.
There are more community works to utilize Zstandard to improve Spark. For example, Apache Avro community also supports Zstandard and SPARK-34479 aims to support Zstandard in Spark’s avro file format in Spark 3.2.0.
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
Query Processing in InfluxDB IOx
InfluxDB IOx Query Processing: In this talk we will provide an overview of Query Execution in IOx describing how once data is ingested that it is queryable, both via SQL and Flux and InfluxQL (via storage gRPC APIs).
PostgreSQL is a very popular and feature-rich DBMS. At the same time, PostgreSQL has a set of annoying wicked problems, which haven't been resolved in decades. Miraculously, with just a small patch to PostgreSQL core extending this API, it appears possible to solve wicked PostgreSQL problems in a new engine made within an extension.
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...InfluxData
InfluxDB IOx Tech Talks
This talk presents a design of a distributed database system that splits data to gain query performance. The talk will define four main properties of data splitting: sharding, partitioning, sorting, and encoding; and then delve into examples to show their impacts on query performance.
Catalogs - Turning a Set of Parquet Files into a Data SetInfluxData
InfluxDB IOx Tech Talks
Placing a Parquet file into an object store serves as a simple data persistence format. However, storing data into multiple files enabling upserts, deletions, format upgrades, metadata management, and consistency checks at scale requires some form of a catalog that manages these files. In this talk we will explore the requirements for a catalog for InfluxDB IOx, prior art from the Parquet ecosystem, and the proposed solution.
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
Zstandard is a fast compression algorithm which you can use in Apache Spark in various way. In this talk, I briefly summarized the evolution history of Apache Spark in this area and four main use cases and the benefits and the next steps:
1) ZStandard can optimize Spark local disk IO by compressing shuffle files significantly. This is very useful in K8s environments. It’s beneficial not only when you use `emptyDir` with `memory` medium, but also it maximizes OS cache benefit when you use shared SSDs or container local storage. In Spark 3.2, SPARK-34390 takes advantage of ZStandard buffer pool feature and its performance gain is impressive, too.
2) Event log compression is another area to save your storage cost on the cloud storage like S3 and to improve the usability. SPARK-34503 officially switched the default event log compression codec from LZ4 to Zstandard.
3) Zstandard data file compression can give you more benefits when you use ORC/Parquet files as your input and output. Apache ORC 1.6 supports Zstandardalready and Apache Spark enables it via SPARK-33978. The upcoming Parquet 1.12 will support Zstandard compression.
4) Last, but not least, since Apache Spark 3.0, Zstandard is used to serialize/deserialize MapStatus data instead of Gzip.
There are more community works to utilize Zstandard to improve Spark. For example, Apache Avro community also supports Zstandard and SPARK-34479 aims to support Zstandard in Spark’s avro file format in Spark 3.2.0.
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
Query Processing in InfluxDB IOx
InfluxDB IOx Query Processing: In this talk we will provide an overview of Query Execution in IOx describing how once data is ingested that it is queryable, both via SQL and Flux and InfluxQL (via storage gRPC APIs).
PostgreSQL is a very popular and feature-rich DBMS. At the same time, PostgreSQL has a set of annoying wicked problems, which haven't been resolved in decades. Miraculously, with just a small patch to PostgreSQL core extending this API, it appears possible to solve wicked PostgreSQL problems in a new engine made within an extension.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Apache Arrow is designed to make things faster. Its focused on speeding communication between systems as well as processing within any one system. In this talk I'll start by discussing what Arrow is and why it was built. This will include covering an overview of the key components, goals, vision and current state. I’ll then take the audience through a detailed engineering review of how we used Arrow to solve several problems when building the Apache-Licensed Dremio product. This will include talking about Arrow performance characteristics, working with Arrow APIs, managing memory, sizing Arrow vectors, and moving data between processes and/or nodes. We’ll also review several code examples of specific data processing implementations and how they interact with Arrow data. Lastly we’ll spend a short amount of time on what’s next for Arrow. This will be a highly technical talk targeted towards people building data infrastructure systems and complex workflows.
The Parquet Format and Performance Optimization OpportunitiesDatabricks
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
Parquet is a very popular column based format. Spark can automatically filter useless data using parquet file statistical data by pushdown filters, such as min-max statistics. On the other hand, Spark user can enable Spark parquet vectorized reader to read parquet files by batch. These features improve Spark performance greatly and save both CPU and IO. Parquet is the default data format of data warehouse in Bytedance. In practice, we find that parquet pushdown filters work poorly resulting in reading too much unnecessary data for statistical data has no discrimination across parquet row groups(column data is out of order when writing to parquet files by ETL jobs).
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
What if you could get the simplicity, convenience, interoperability, and storage niceties of an old-fashioned CSV with the speed of a NoSQL database and the storage requirements of a gzipped file? Enter Parquet.
At The Weather Company, Parquet files are a quietly awesome and deeply integral part of our Spark-driven analytics workflow. Using Spark + Parquet, we’ve built a blazing fast, storage-efficient, query-efficient data lake and a suite of tools to accompany it.
We will give a technical overview of how Parquet works and how recent improvements from Tungsten enable SparkSQL to take advantage of this design to provide fast queries by overcoming two major bottlenecks of distributed analytics: communication costs (IO bound) and data decoding (CPU bound).
Maximizing performance via tuning and optimizationMariaDB plc
Ensuring that your end users get the performance they expect from your system requires an organized approach to performance management. This session will cover the planning and measurement necessary to ensure satisfied customers, and will also include tips and tricks learned from MariaDB’s years of supporting many of the most demanding installations in the world.
In this presentation, I take a deep dive into the InfluxDB open source storage engine. More than just a single storage engine, InfluxDB is two engines in one: the first for time series data and the second, an index for metadata. I'll delve into the optimizations for achieving high write throughput, compression and fast reads for both the raw time series data and the metadata.
This ppt was used by Devrim at pgDay Asia 2017. He talked about some important facts about WAL - Transaction Logs or xlogs in PostgreSQL. Some of these can really come handy on a bad day
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
Flight SQL is a revolutionary new open database protocol designed for modern architectures. Key features in Flight SQL include a columnar-oriented design and native support for parallel processing of data partitions. This talk will go over how these new features can push SQL query throughput beyond existing standards such as ODBC.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to setup and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
Welcome to New Swift: Library Evolution & LSP SupportG ABHISEK
SWIFT 5.0 has been around for some time. We have got new powers like ABI and Module Stability and Library Evolution support along with it making our lives simpler. Let us discuss these in depth about the possibilities that they bring along with them. We would also discuss the proposal that will make SWIFT touch new heights in the near future.
Meta/Facebook's database serving social workloads is running on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depends a lot on RocksDB. Not just MyRocks, but also we have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Apache Arrow is designed to make things faster. Its focused on speeding communication between systems as well as processing within any one system. In this talk I'll start by discussing what Arrow is and why it was built. This will include covering an overview of the key components, goals, vision and current state. I’ll then take the audience through a detailed engineering review of how we used Arrow to solve several problems when building the Apache-Licensed Dremio product. This will include talking about Arrow performance characteristics, working with Arrow APIs, managing memory, sizing Arrow vectors, and moving data between processes and/or nodes. We’ll also review several code examples of specific data processing implementations and how they interact with Arrow data. Lastly we’ll spend a short amount of time on what’s next for Arrow. This will be a highly technical talk targeted towards people building data infrastructure systems and complex workflows.
The Parquet Format and Performance Optimization OpportunitiesDatabricks
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
Parquet is a very popular column based format. Spark can automatically filter useless data using parquet file statistical data by pushdown filters, such as min-max statistics. On the other hand, Spark user can enable Spark parquet vectorized reader to read parquet files by batch. These features improve Spark performance greatly and save both CPU and IO. Parquet is the default data format of data warehouse in Bytedance. In practice, we find that parquet pushdown filters work poorly resulting in reading too much unnecessary data for statistical data has no discrimination across parquet row groups(column data is out of order when writing to parquet files by ETL jobs).
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
What if you could get the simplicity, convenience, interoperability, and storage niceties of an old-fashioned CSV with the speed of a NoSQL database and the storage requirements of a gzipped file? Enter Parquet.
At The Weather Company, Parquet files are a quietly awesome and deeply integral part of our Spark-driven analytics workflow. Using Spark + Parquet, we’ve built a blazing fast, storage-efficient, query-efficient data lake and a suite of tools to accompany it.
We will give a technical overview of how Parquet works and how recent improvements from Tungsten enable SparkSQL to take advantage of this design to provide fast queries by overcoming two major bottlenecks of distributed analytics: communication costs (IO bound) and data decoding (CPU bound).
Maximizing performance via tuning and optimizationMariaDB plc
Ensuring that your end users get the performance they expect from your system requires an organized approach to performance management. This session will cover the planning and measurement necessary to ensure satisfied customers, and will also include tips and tricks learned from MariaDB’s years of supporting many of the most demanding installations in the world.
In this presentation, I take a deep dive into the InfluxDB open source storage engine. More than just a single storage engine, InfluxDB is two engines in one: the first for time series data and the second, an index for metadata. I'll delve into the optimizations for achieving high write throughput, compression and fast reads for both the raw time series data and the metadata.
This ppt was used by Devrim at pgDay Asia 2017. He talked about some important facts about WAL - Transaction Logs or xlogs in PostgreSQL. Some of these can really come handy on a bad day
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
Flight SQL is a revolutionary new open database protocol designed for modern architectures. Key features in Flight SQL include a columnar-oriented design and native support for parallel processing of data partitions. This talk will go over how these new features can push SQL query throughput beyond existing standards such as ODBC.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
VictoriaLogs Preview - Aliaksandr Valialkin
* Existing open source log management systems
- ELK (ElasticSearch) stack: Pros & Cons
- Grafana Loki: Pros & Cons
* What is VictoriaLogs
- Open source log management system from VictoriaMetrics
- Easy to setup and operate
- Scales vertically and horizontally
- Optimized for low resource usage (CPU, RAM, disk space)
- Accepts data from Logstash and Fluentbit in Elasticsearch format
- Accepts data from Promtail in Loki format
- Supports stream concept from Loki
- Provides easy to use yet powerful query language - LogsQL
* LogsQL Examples
- Search by time
- Full-text search
- Combining search queries
- Searching arbitrary labels
* Log Streams
- What is a log stream?
- LogsQL examples: querying log streams
- Stream labels vs log labels
* LogsQL: stats over access logs
* VictoriaLogs: CLI Integration
* VictoriaLogs Recap
Welcome to New Swift: Library Evolution & LSP SupportG ABHISEK
SWIFT 5.0 has been around for some time. We have got new powers like ABI and Module Stability and Library Evolution support along with it making our lives simpler. Let us discuss these in depth about the possibilities that they bring along with them. We would also discuss the proposal that will make SWIFT touch new heights in the near future.
Introduction to TokuDB v7.5 and Read Free ReplicationTim Callaghan
TokuDB v7.5 introduced Read Free Replication, allowing MySQL slaves to run with virtually no read IO. This presentation discusses how Fractal Tree indexes work, what they enable in TokuDB, and they allow TokuDB to uniquely offer this replication innovation.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced + consumed asynchronously from services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce + consume events
- How we ensure that events can be understood by all parts of our system (Some that are implemented in other programming languages like PHP and Python) and how we handle evolving event payload data.
Reduced instruction set computing, or RISC (pronounced 'risk', /ɹɪsk/), is a CPU design strategy based on the insight that a simplified instruction set provides higher performance when combined with a microprocessor architecture capable of executing those instructions using fewer microprocessor cycles per instruction.
by Yves Goeleven
Microsoft Azure Storage Services provides a scalable and reliable storage service in the cloud for blobs, tables and queues.
It’s the foundation that many other services build on top, like Virtual Machines and StorSimple.
In this session I will provide you with more insight in how this service works internally so that you can use these insights in your designs.
Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high-performance.
Select configuration parameters and deployment topologies essential to achieve higher throughput and low latency across the pipeline are discussed. Lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100GB data under 25 minutes is discussed.
A technical overview on Kafka including key terminologies like topic, partition, offset, Kafka architecture (zookeeper, broker and replication). Also covers Kafka and IT team analogy. Read the article @ https://www.linkedin.com/pulse/kafka-technical-overview-sylvester-daniel/
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Leaute (Confluence)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka - the Definitive Guide”, "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Xavier Leaute is One of the first engineers to Confluent team, Xavier is responsible for analytics infrastructure, including real-time analytics in KafkaStreams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Similar to InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxDB IOx (20)
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow EcosystemInfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...InfluxData
Bevi are the creators of smart water dispensers which empower people to choose their desired beverage — flat or sparkling, their desired flavor and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting of their entire ecosystem — including the hardware and software. They are using InfluxDB to collect sensor data in real-time remotely from their internet-connected machines about their status and activity — i.e., flavor and CO2 levels, water temp, filter status, etc. They a7re using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDBInfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decision
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO Base - a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose built time series database. They are collecting more than 100K different metrics at various frequencies — some are collected every 5 seconds to only every 1-2 minutes. THey have reduced overall IT spend by 50% and collect 2x the amount of data at 20x frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced the TCO, and is now able to create added-value services: forecast, monitoring, predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipelines IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING StackInfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (i.e. turbines and generators) and increase safety practices (i.e. buildings, construction sites). Join this webinar to learn best practices from industrial IoT SME's.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected by everyone — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
Rust is a systems programming language designed for high performance, type safety, and concurrency. According to Stack Overflow’s annual survey in 2022, Rust is the most loved language with 87% of developers saying they want to continue using it. The same survey also reported that nearly 20% of developers aren’t currently using Rust, but want to start developing using it.
Ockam’s suite of programming libraries, command line tools, and managed cloud services enable developers to orchestrate end-to-end encryption. InfluxDB is the purpose-built time series database developed to handle time series data for IoT, monitoring, and real-time analytics. Ockam was originally developed using C, and InfluxDB was originally written using Go; both solutions have been completely rewritten in Rust. Discover why two founders decided to rewrite their developer tools using Rust, and gain insight into the strategy beforehand and the entire process.
Join this live panel as Mrinal Wadhwa and Paul Dix dive into:
Their approach to rewriting a project in Rust
How to build and train engineering teams
Tips and tricks learned along the way - pitfalls to look out for!
Join this webinar as there will be a live discussion with Q&A
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security options, and more custom configuration options to best suit customers’ workload requirements. Join this webinar to learn more about InfluxDB Cloud, and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
Many developers and DevOps engineers have become aware of using their observability data to gain greater insights into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
American Metal Processing Company ("AMP") is the US' largest commercial rotary heat treat facility with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures consistent, uniform heat treatment process vs. traditional batch-or-belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electrics cars. They collect sensor data about their EV's control systems using a time series platform. During races, they are collecting IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races which help the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team are able to analyze data in Marple to understand what to do better next time. By using Marple + InfluxDB, their team are able to collect, share and analyze high frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with the language of choice and pull IoT sensor data from 70+ different single board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDK's, and they are able to transform the data in-transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why using InfluxDB was crucial for them for fast data collection and providing single-sources of truths for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data longer than the Solutions default to. These customers are recommended to use InfluxDB, and this presentation will cover the key considerations for moving to InfluxDB vs the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx’s solution and its easy implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
Two new features are coming to Flux that add flexibility
and functionality to your data workflow—polymorphic
labels and dynamic types. This session walks through
these new features and shows how they work.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
8. The Write Buffer
• Buffer assigns
clock value &
writer id
(unique)
• Makes request
to peers
• Note the
request from
Router to Buffer
still open
9. The Write Buffer
• Buffer 2 takes
the write,
updates its
clock to max
• Buffer 2
responds with
its clock value
• Buffer 1
updates its
clock to the
max of the two
10. The Write Buffer
• Buffer 1 sends
success
response to
routing layer
• Routing layer
responds to the
client (after all
shards succeed)