Intro to Kapacitor for Alerting and Anomaly DetectionInfluxData
In this session you’ll get detailed overview of Kapacitor, InfluxDB’s native data processing engine. The session will cover how to install, configure and build custom TICKscripts enable alerting and anomaly detection.
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
"During development and automated tests, it is common to create Kafka clusters from scratch and run workloads against those short-lived clusters. Starting a Kafka broker typically takes several seconds, and those seconds add up to precious time and resources.
How about spinning up a Kafka broker in less than 0.2 seconds with less memory overhead? In this session, we will talk about kafka-native, which leverages GraalVM native image for compiling Kafka broker to native executable using Quarkus framework. After going through some implementation details, we will focus on how it can be used in a Docker container with Testcontainers to speed up integration testing of Kafka applications. We will finally discuss some current caveats and future opportunities of a native-compiled Kafka for cloud-native production clusters."
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Speaker: Jun Rao, VP of Apache Kafka and Co-founder of Confluent
The controller is the brain of Apache Kafka®. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
In this talk, Jun will outline the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker. Jun will then describe recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Jun Rao is the co-founder of Confluent, a company that provides a streaming data platform on top of Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM's Almaden research datacenter, where he conducted research on database and distributed systems. Jun is the PMC chair of Apache Kafka and a committer of Cassandra. He writes at https://cnfl.io/blog-jun-rao.
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides a transactions, upserts, deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-box support for streaming data from event systems into lake storage in near real-time.
In this talk, we will walk through an end-end use case for change data capture from a relational database, starting with capture changes using the Pulsar CDC connector and then demonstrate how you can use the Hudi deltastreamer tool to then apply these changes into a table on the data lake. We will discuss various tips to operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects including a native Hudi/Pulsar connector and Hudi tiered storage.
Why Architecting for Disaster Recovery is Important for Your Time Series Data...InfluxData
Time Series data at Capital One consists of Infrastructure, Application, and Business Process Metrics. The combination of these metrics are what the internal stakeholders rely on for observability which allows them to deliver better service and uptime for their customers, so protecting this critical data with a proven and tested recovery plan is not a “nice to have” but a “must have.”
In this talk, the members of IT staff, Saravanan Krisharaju, Rajeev Tomer, and Karl Daman will share how they built a fault-tolerant solution based on InfluxEnterprise and AWS that collects and stores metrics and events. They added to this, Machine Learning, which uses the collected time series to model predictions which are then brought back into InfluxDB time series database for real-time access. This Capital One team shares the journey they took to architect and build this solution as well as plan and execute on their disaster recovery plan.
Intro to Kapacitor for Alerting and Anomaly DetectionInfluxData
In this session you’ll get detailed overview of Kapacitor, InfluxDB’s native data processing engine. The session will cover how to install, configure and build custom TICKscripts enable alerting and anomaly detection.
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
"During development and automated tests, it is common to create Kafka clusters from scratch and run workloads against those short-lived clusters. Starting a Kafka broker typically takes several seconds, and those seconds add up to precious time and resources.
How about spinning up a Kafka broker in less than 0.2 seconds with less memory overhead? In this session, we will talk about kafka-native, which leverages GraalVM native image for compiling Kafka broker to native executable using Quarkus framework. After going through some implementation details, we will focus on how it can be used in a Docker container with Testcontainers to speed up integration testing of Kafka applications. We will finally discuss some current caveats and future opportunities of a native-compiled Kafka for cloud-native production clusters."
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Speaker: Jun Rao, VP of Apache Kafka and Co-founder of Confluent
The controller is the brain of Apache Kafka®. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
In this talk, Jun will outline the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker. Jun will then describe recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Jun Rao is the co-founder of Confluent, a company that provides a streaming data platform on top of Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM's Almaden research datacenter, where he conducted research on database and distributed systems. Jun is the PMC chair of Apache Kafka and a committer of Cassandra. He writes at https://cnfl.io/blog-jun-rao.
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides a transactions, upserts, deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-box support for streaming data from event systems into lake storage in near real-time.
In this talk, we will walk through an end-end use case for change data capture from a relational database, starting with capture changes using the Pulsar CDC connector and then demonstrate how you can use the Hudi deltastreamer tool to then apply these changes into a table on the data lake. We will discuss various tips to operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects including a native Hudi/Pulsar connector and Hudi tiered storage.
Why Architecting for Disaster Recovery is Important for Your Time Series Data...InfluxData
Time Series data at Capital One consists of Infrastructure, Application, and Business Process Metrics. The combination of these metrics are what the internal stakeholders rely on for observability which allows them to deliver better service and uptime for their customers, so protecting this critical data with a proven and tested recovery plan is not a “nice to have” but a “must have.”
In this talk, the members of IT staff, Saravanan Krisharaju, Rajeev Tomer, and Karl Daman will share how they built a fault-tolerant solution based on InfluxEnterprise and AWS that collects and stores metrics and events. They added to this, Machine Learning, which uses the collected time series to model predictions which are then brought back into InfluxDB time series database for real-time access. This Capital One team shares the journey they took to architect and build this solution as well as plan and execute on their disaster recovery plan.
Work With Federal Agencies? Here's What You Should Know About FedRAMP Assessm...Schellman & Company
FedRAMP is the federal government's risk and security assessment program for cloud-based services as part of the cloud-first initiative, and is designed to make the assessment process more efficient by providing a "do once, use many times" framework.
If you work with or want to work with federal agencies, your organization will need to be FedRAMP compliant.
On this webinar, you will:
• Learn the background and overview of the FedRAMP program
• Take a deep dive of the assessment process
• Discover the benefits and challenges companies experience during the assessment process
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...InfluxData
Robinhood is democratizing the financial systems by offering commission-free investing and trading with the use of your phone or desktop. As exciting as that sounds to the outside world, internally, the team at Robinhood must understand the different risk vectors and build engineering solutions to mitigate these risks. In this talk, Allison will talk about how they build a real-time risk monitoring system with InfluxDB and Faust, an open-source Python stream processing library. She will review the architecture behind the system which will involve both the time series anomaly detection part (InfluxDB) and the real-time stream processing part (Faust/Kafka).
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index instead of having to generate and supply external training sets. The focus will be on extensions of the Lucene Classification module that come in Lucene 6.0 and the Lucene Classification module's incorporation into Solr 6.1. These extensions will allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields etc. The Solr ClassificationUpdateProcessor will be explored and how to use it including basic and advanced features like multi class support and classification context filtering. The presentation will include practical examples and real world use cases.
When it comes to designing, building, and operating mission-critical data centers, simple is better. Data centers are faster and cheaper to build, and more reliable with lower total cost of ownership (TCO), when they start with a dramatically simplified design and build process that incorporates these elements: reference designs, pre-fabricated/modular architecture, partners that bring comprehensive capabilities into play, and coordinated planning around software, operations, and service. What you will learn: The top 5 weaknesses of today’s design/bid/build approach A simplified approach using reference designs and pre-fab products can preserve Day 1 capital and improve speed to market An integrated solution provider (Design, Build, Hardware and Operations) can optimize CapEx, OpEx and TCO.
There are many factors in the data center that are driving the new data center design considerations. This slideshare discusses several of the trends in the data center and covers several solutions to implement.
Getting Started: Intro to Telegraf - July 2021InfluxData
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
Join this training as Samantha Wang dives into:
Types of Telegraf plugins (i.e. input, output, aggregator and processor)
Specific plugins including Execd input plugins and the Starlark processor plugin
How to install and start using Telegraf
From a Time-Series Database to a Key Operational Technology for the EnterpriseInfluxData
It is important for a time-series database to provide efficient storage, aggregation, and query of time-series data. Elevating a time-series database to a core operational technology of the enterprise, however, involves so much more. In this talk, I will explore important considerations that take a time-series database from being just another data-store, to a critical infrastructure for operational intelligence. I will explore challenges related to time, change management, incorporating asset models from the business domain, and integration with other operational technologies, like streaming-data platforms.
Performance Engineering Masterclass: Efficient Automation with the Help of SR...ScyllaDB
Henrik Rexed from Dynatrace walks through how to measure, validate and visualize these SLOs using Prometheus, an open observability platform, to provide concrete examples. Next, you learn how to automate your deployment using Keptn, a cloud-native event-based life-cycle orchestration framework. Discover how it can be used for multi-stage delivery, remediation scenarios, and automating production tasks.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
Watch this talk here: https://www.confluent.io/online-talks/best-practices-for-streaming-iot-data-with-MQTT-and-apache-kafka-on-demand
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges.
In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
This face to face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0 announced on March 8th, 2016 by the Apache Software Foundation (link), marks a new era of Big Data analytics and in particular Real-Time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiples verticals such as: Financial Services, Healthcare, Advertisement, Oil and Gas, Retail and Telecommunications.
In this talk, you learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Power of Splunk Search Processing Language (SPL)Splunk
This session will unveil the power of the Splunk Search Processing Language (SPL). See how to use Splunk's simple search language for searching and filtering through data, charting statistics and predicting values, converging data sources and grouping transactions, and finally data science and exploration. We'll begin with basic search commands and build up to more powerful advanced tactics to help you harness your Splunk Fu!
Collaborating with Direct Spend Suppliers in the Life Sciences Industry - 56556SAP Ariba Live 2018
Join this session to learn how a medical devices OEM collaborates with suppliers and contract manufacturing partners on key supply chain processes, including planning and product quality. Learn how to improve productivity and visibility. Hear about the collaboration and onboarding journey, wins, and lessons learned.
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...DevOps.com
Discover how Texas Instruments uses a time series database to gain better insights into their industrial operations. By collecting high level metrics and events, they are continuously improving productivity and are becoming data-driven. Learn how InfluxDB can result in cost savings and have a direct impact on performance by analyzing seasonality, trends and behaviors.
In this webinar, Michael Hinkle will cover:
Texas Instruments’ ability to innovate while creating electronic systems for multiple industries
How an organization’s machines, tools and processes can impact proficiency
InfluxDB’s impact on their production and quality assurance
Work With Federal Agencies? Here's What You Should Know About FedRAMP Assessm...Schellman & Company
FedRAMP is the federal government's risk and security assessment program for cloud-based services as part of the cloud-first initiative, and is designed to make the assessment process more efficient by providing a "do once, use many times" framework.
If you work with or want to work with federal agencies, your organization will need to be FedRAMP compliant.
On this webinar, you will:
• Learn the background and overview of the FedRAMP program
• Take a deep dive of the assessment process
• Discover the benefits and challenges companies experience during the assessment process
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
How Robinhood Built a Real-Time Anomaly Detection System to Monitor and Mitig...InfluxData
Robinhood is democratizing the financial systems by offering commission-free investing and trading with the use of your phone or desktop. As exciting as that sounds to the outside world, internally, the team at Robinhood must understand the different risk vectors and build engineering solutions to mitigate these risks. In this talk, Allison will talk about how they build a real-time risk monitoring system with InfluxDB and Faust, an open-source Python stream processing library. She will review the architecture behind the system which will involve both the time series anomaly detection part (InfluxDB) and the real-time stream processing part (Faust/Kafka).
This presentation will start by introducing how Apache Lucene can be used to classify documents using data structures that already exist in your index instead of having to generate and supply external training sets. The focus will be on extensions of the Lucene Classification module that come in Lucene 6.0 and the Lucene Classification module's incorporation into Solr 6.1. These extensions will allow you to classify at a document level with individual field weighting, numeric field support, lat/lon fields etc. The Solr ClassificationUpdateProcessor will be explored and how to use it including basic and advanced features like multi class support and classification context filtering. The presentation will include practical examples and real world use cases.
When it comes to designing, building, and operating mission-critical data centers, simple is better. Data centers are faster and cheaper to build, and more reliable with lower total cost of ownership (TCO), when they start with a dramatically simplified design and build process that incorporates these elements: reference designs, pre-fabricated/modular architecture, partners that bring comprehensive capabilities into play, and coordinated planning around software, operations, and service. What you will learn: The top 5 weaknesses of today’s design/bid/build approach A simplified approach using reference designs and pre-fab products can preserve Day 1 capital and improve speed to market An integrated solution provider (Design, Build, Hardware and Operations) can optimize CapEx, OpEx and TCO.
There are many factors in the data center that are driving the new data center design considerations. This slideshare discusses several of the trends in the data center and covers several solutions to implement.
Getting Started: Intro to Telegraf - July 2021InfluxData
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
Join this training as Samantha Wang dives into:
Types of Telegraf plugins (i.e. input, output, aggregator and processor)
Specific plugins including Execd input plugins and the Starlark processor plugin
How to install and start using Telegraf
From a Time-Series Database to a Key Operational Technology for the EnterpriseInfluxData
It is important for a time-series database to provide efficient storage, aggregation, and query of time-series data. Elevating a time-series database to a core operational technology of the enterprise, however, involves so much more. In this talk, I will explore important considerations that take a time-series database from being just another data-store, to a critical infrastructure for operational intelligence. I will explore challenges related to time, change management, incorporating asset models from the business domain, and integration with other operational technologies, like streaming-data platforms.
Performance Engineering Masterclass: Efficient Automation with the Help of SR...ScyllaDB
Henrik Rexed from Dynatrace walks through how to measure, validate and visualize these SLOs using Prometheus, an open observability platform, to provide concrete examples. Next, you learn how to automate your deployment using Keptn, a cloud-native event-based life-cycle orchestration framework. Discover how it can be used for multi-stage delivery, remediation scenarios, and automating production tasks.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
Watch this talk here: https://www.confluent.io/online-talks/best-practices-for-streaming-iot-data-with-MQTT-and-apache-kafka-on-demand
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges.
In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
This face to face talk about Apache Flink in Sao Paulo, Brazil is the first event of its kind in Latin America! It explains how Apache Flink 1.0 announced on March 8th, 2016 by the Apache Software Foundation (link), marks a new era of Big Data analytics and in particular Real-Time streaming analytics. The talk maps Flink's capabilities to real-world use cases that span multiples verticals such as: Financial Services, Healthcare, Advertisement, Oil and Gas, Retail and Telecommunications.
In this talk, you learn more about:
1. What is Apache Flink Stack?
2. Batch vs. Streaming Analytics
3. Key Differentiators of Apache Flink for Streaming Analytics
4. Real-World Use Cases with Flink for Streaming Analytics
5. Who is using Flink?
6. Where do you go from here?
Power of Splunk Search Processing Language (SPL)Splunk
This session will unveil the power of the Splunk Search Processing Language (SPL). See how to use Splunk's simple search language for searching and filtering through data, charting statistics and predicting values, converging data sources and grouping transactions, and finally data science and exploration. We'll begin with basic search commands and build up to more powerful advanced tactics to help you harness your Splunk Fu!
Collaborating with Direct Spend Suppliers in the Life Sciences Industry - 56556SAP Ariba Live 2018
Join this session to learn how a medical devices OEM collaborates with suppliers and contract manufacturing partners on key supply chain processes, including planning and product quality. Learn how to improve productivity and visibility. Hear about the collaboration and onboarding journey, wins, and lessons learned.
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...DevOps.com
Discover how Texas Instruments uses a time series database to gain better insights into their industrial operations. By collecting high level metrics and events, they are continuously improving productivity and are becoming data-driven. Learn how InfluxDB can result in cost savings and have a direct impact on performance by analyzing seasonality, trends and behaviors.
In this webinar, Michael Hinkle will cover:
Texas Instruments’ ability to innovate while creating electronic systems for multiple industries
How an organization’s machines, tools and processes can impact proficiency
InfluxDB’s impact on their production and quality assurance
While you could be tempted assuming data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like: "What happens if the entire datacenter fails?, or "How do I recover into a consistent state of data, so that applications can continue to run?" are not a all trivial to answer for Hadoop. Did you know that HDFS snapshots are handling open files not as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic as most customers are still in the "single use-case" or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned are not (yet) important. This talk first is introducing you to the overarching issue and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components and so on, to finally show you a viable approach using built-in tools. You will also learn not to take this topic lightheartedly and what is needed to implement and guarantee a continuous operation of Hadoop cluster based solutions.
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
Big data tools are challenging to combine into a larger application: ironically, big data applications themselves do not tend to scale very well. These issues of integration and data management are only magnified by increasingly large volumes of data. Apache Spark provides strong building blocks for batch processes, streams and ad-hoc interactive analysis. However, users face challenges when putting together a single coherent pipeline that could involve hundreds of transformation steps, especially when confronted by the need of rapid iterations. This talk explores these issues through the lens of functional programming. It presents an experimental framework that provides full-pipeline guarantees by introducing more laziness to Apache Spark. This framework allows transformations to be seamlessly composed and alleviates common issues, thanks to whole program checks, auto-caching, and aggressive computation parallelization and reuse.
From: DataWorks Summit Munich 2017 - 20170406
While you could be tempted assuming data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like: "What happens if the entire datacenter fails?, or "How do I recover into a consistent state of data, so that applications can continue to run?" are not a all trivial to answer for Hadoop. Did you know that HDFS snapshots are handling open files not as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic as most customers are still in the "single use-case" or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned are not (yet) important. This talk first is introducing you to the overarching issue and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components and so on, to finally show you a viable approach using built-in tools. You will also learn not to take this topic lightheartedly and what is needed to implement and guarantee a continuous operation of Hadoop cluster based solutions.
Apache Hadoop project, and the Hadoop ecosystem has been designed be extremely flexible, and extensible. HDFS, Yarn, and MapReduce combined have more that 1000 configuration parameters that allow users to tune performance of Hadoop applications, and more importantly, extend Hadoop with application-specific functionality, without having to modify any of the core Hadoop code.
In this talk, I will start with simple extensions, such as writing a new InputFormat to efficiently process video files. I will provide with some extensions that boost application performance, such as optimized compression codecs, and pluggable shuffle implementations. With refactoring of MapReduce framework, and emergence of YARN, as a generic resource manager for Hadoop, one can extend Hadoop further by implementing new computation paradigms.
I will discuss one such computation framework, that allows Message Passing applications to run in the Hadoop cluster alongside MapReduce. I will conclude by outlining some of our ongoing work, that extends HDFS, by removing namespace limitations of the current Namenode implementation.
From Pipelines to Refineries: Scaling Big Data ApplicationsDatabricks
Big data tools are challenging to combine into a larger application: ironically, big data applications themselves do not tend to scale very well. These issues of integration and data management are only magnified by increasingly large volumes of data.
Apache Spark provides strong building blocks for batch processes, streams and ad-hoc interactive analysis. However, users face challenges when putting together a single coherent pipeline that could involve hundreds of transformation steps, especially when confronted by the need of rapid iterations.
This talk explores these issues through the lens of functional programming. It presents an experimental framework that provides full-pipeline guarantees by introducing more laziness to Apache Spark. This framework allows transformations to be seamlessly composed and alleviates common issues, thanks to whole program checks, auto-caching, and aggressive computation parallelization and reuse.
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
The Hadoop ecosystem has improved real-time access capabilities recently, narrowing the gap with relational database technologies. However, gaps remain in the storage layer that complicate the transition to Hadoop-based architectures. In this session, the presenter will describe these gaps and discuss the tradeoffs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. The session also will cover Kudu (currently in beta), the new addition to the open source Hadoop ecosystem with outof-the-box integration with Apache Spark and Apache Impala (incubating), that achieves fast scans and fast random access from a single API.
Big Data Retrospective - STL Big Data IDEA Jan 2019Adam Doyle
Slides from the STL Big Data IDEA meeting from January 2019. The presenters discussed technologies to continue using, stop using, and start using in 2019.
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData
In this InfluxDays NYC 2019 presentation, InfluxData VP of Products Tim Hall and Sales Engineer Sam Dillard discuss architecture patterns with InfluxEnterprise time series platform. They cover an overview of InfluxEnterprise, features, ingestion and query rates, deployment examples, replication patterns, and general advice. Presentation highlights include InfluxEnterprise cluster architecture and how to determine if you're ready for adopting InfluxEnterprise.
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow EcosystemInfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...InfluxData
Bevi are the creators of smart water dispensers which empower people to choose their desired beverage — flat or sparkling, their desired flavor and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting of their entire ecosystem — including the hardware and software. They are using InfluxDB to collect sensor data in real-time remotely from their internet-connected machines about their status and activity — i.e., flavor and CO2 levels, water temp, filter status, etc. They a7re using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDBInfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decision
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO Base - a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose built time series database. They are collecting more than 100K different metrics at various frequencies — some are collected every 5 seconds to only every 1-2 minutes. THey have reduced overall IT spend by 50% and collect 2x the amount of data at 20x frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced the TCO, and is now able to create added-value services: forecast, monitoring, predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipelines IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING StackInfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (i.e. turbines and generators) and increase safety practices (i.e. buildings, construction sites). Join this webinar to learn best practices from industrial IoT SME's.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected by everyone — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
Rust is a systems programming language designed for high performance, type safety, and concurrency. According to Stack Overflow’s annual survey in 2022, Rust is the most loved language with 87% of developers saying they want to continue using it. The same survey also reported that nearly 20% of developers aren’t currently using Rust, but want to start developing using it.
Ockam’s suite of programming libraries, command line tools, and managed cloud services enable developers to orchestrate end-to-end encryption. InfluxDB is the purpose-built time series database developed to handle time series data for IoT, monitoring, and real-time analytics. Ockam was originally developed using C, and InfluxDB was originally written using Go; both solutions have been completely rewritten in Rust. Discover why two founders decided to rewrite their developer tools using Rust, and gain insight into the strategy beforehand and the entire process.
Join this live panel as Mrinal Wadhwa and Paul Dix dive into:
Their approach to rewriting a project in Rust
How to build and train engineering teams
Tips and tricks learned along the way - pitfalls to look out for!
Join this webinar as there will be a live discussion with Q&A
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security options, and more custom configuration options to best suit customers’ workload requirements. Join this webinar to learn more about InfluxDB Cloud, and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
Many developers and DevOps engineers have become aware of using their observability data to gain greater insights into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
American Metal Processing Company ("AMP") is the US' largest commercial rotary heat treat facility with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures consistent, uniform heat treatment process vs. traditional batch-or-belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electrics cars. They collect sensor data about their EV's control systems using a time series platform. During races, they are collecting IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races which help the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team are able to analyze data in Marple to understand what to do better next time. By using Marple + InfluxDB, their team are able to collect, share and analyze high frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with the language of choice and pull IoT sensor data from 70+ different single board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDK's, and they are able to transform the data in-transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why using InfluxDB was crucial for them for fast data collection and providing single-sources of truths for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data longer than the Solutions default to. These customers are recommended to use InfluxDB, and this presentation will cover the key considerations for moving to InfluxDB vs the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx’s solution and its easy implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
Two new features are coming to Flux that add flexibility
and functionality to your data workflow—polymorphic
labels and dynamic types. This session walks through
these new features and shows how they work.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improve Efficiencies
1. Texas Instruments – The Road to
Understanding Inefficiencies with
InfluxDB
Presentation by: Mike Hinkle
Date: 05-26-2020
1
2. 2
Table of Contents
• About Texas Instruments
• What we do
• About me
• Common terms (Tools, States, Modules, Metrics, etc..)
• Data discussion
• Examples of data based decisions
• Current challenges
• InfluxDB and our current use case
• Why InfluxDB?
• Current setup and metrics captured
• Examples of dashboards currently in use
• Example of Grafana alert using InfluxDB derivative() on signal/trace
• Issues encountered using InfluxDB & Grafana
• What’s next
3. 3
Texas Instruments – Brief Overview
• Texas Instruments was founded in 1951
• Previously ‘Geophysical Service Incorporated’
• Headquarters in Dallas, Texas
• Currently employees ~30k individuals World Wide
• Semiconductor Manufacturing Facilities located in:
• United States
• Germany
• Japan
• China
What we do:
Make calculators (just kidding.. We do, but this is a small part of our portfolio)
We Design, Manufacture, Test, Package, & Sell semiconductor devices
5. 5
Common terms used in presentation
• Wafer – Typically made of Silicon (Si)
• Wafers are traditionally started in
counts of 25 which are referred to
as a LOT
• Lots can have fewer than 25
wafers
• Example of a wafer map is shown
to the right
• Each small square is a single die
• Each die is an IC comprised of
transistors, resistors, etc..
• After mfg and testing, the wafers
are typically diced/cut and
packaged for final testing
6. 6
Common terms used in presentation – Cont.
• Tool – The equipment used to manufacture,
process & test semiconductors (wafers) i.e.:
• Furnaces (seen to right)
• Ion Implanters
• Epi Reactors
• Plasma Etchers
• Metrology (Rs, thickness, etc.)
• Testers
• State – Numeric or character code which
defines the current state of our tools
• PROD, 06T, 01, etc..
• Currently 515 state definitions in DFAB
• States mapped to multiple categories
• LYDOP Furnace Stack
(Phosphorus Doping)
7. 7
Common terms used in presentation – Cont.
• Modules – The division of common tools
or processes into teams and groups for
the purpose of manufacturing
semiconductors
• Diffusion
• Surface Prep.
• Implant
• Epi
• Plasma
• Photo
• MultiProbe & Testprobe
• Modules typically divided into toolsets
and processes and owned by Engineers
• Modules report out often on metrics
• Metrics
• Availability (Ao)
• Overall Equip. Utilization
• Fail Events (raw and norm.)
• Mean Time to Repair
• Mean Time Between Failure
• First Pass Success
• Cycle Time
• … the list goes on …
8. 8
Why is access to fast, accurate data important?
Common Data Based Decisions:
• Process adjustments (time, temperature, pressure, etc…)
• Process Stability (Cpk, UCL, LCL)
• Tool Stability (Ao, OEU, MTTR, MTBF)
• Track preventative maintenance (use based or time based)
• Troubleshooting
• Anomaly Detection and Interdiction
• Yield
• Planning
Inaccurate or misinterpreted data could lead to wafers being scrapped, yield loss,
customer deadlines not being met and ultimately losing valuable business.
9. 9
Current challenges
• We need immediate feedback if/when tool/process issues arise
• Failure to respond could equate to scrapped wafers or missing customer
deadlines and ultimately lost revenue
• Email & text notifications are a must, automated interdiction is the goal
• We need a more efficient means of extracting the relevant data for reporting
• Reduction of time preparing for meetings and presentations
• Most high-level metrics are temporal
• We need a system which can be used efficiently by non-CS/CE employees
• Most employees in our factories come from Electrical, Mechanical,
Chemical Engineering or Physics backgrounds (not full-stack devs.)
• Our primary job does not revolve around creating applications
• We need the ability to prototype and experiment
• Write access to RDBMS is not easy to come by
• Flat file systems are not efficient
10. 10
Why InfluxDB
• Familiarity from personal project
• Installation and setup took minutes
• Multitude of available clients
• NoSQL makes for minimal planning and easier prototyping
• SQL-esque: Do not need to learn a new query language (using InfluxQL)
• Fast query response
• Easier to handle edge cases (metrics split and aggregated across days, years)
• In-build math and forecasting functions (d/dt, integrate, Holt-Winters, etc..)
• Efficient hard disk memory usage
• Currently >1.5M points/records written per day
• Easy plugin for Grafana (experimenting w/ Chronograf now)
• Open Source edition is free (MIT License)
11. 11
Current installation setup
Setup:
• Single InfluxDB instance (OSS) installed on Linux VM
• TSM, WAL & Raft data all stored on expandable development mount
• Loader scripts executed via cronjob, query RDBMS and load InfluxDB
• Data is filtered, sanitized & calculated if possible during initial query
• Grafana server running on separate Linux VM instance
• Apache web server is reverse proxied to point to Grafana server on port 3000
Metrics being collected:
• OEU (Overall Equipment Utilization: [Time Testing Good Wafers / Total Time])
• Ao (Tool Availability: [Uptime / Total Time])
• Occurrences of States (States define the state of our tools, PM’s, PROD, etc..)
• Cycle Time (how long it takes for a wafer to be tested [device tag])
12. 12
Block diagram of current InfluxDB loading scheme
RDBMS
Linux Virtual Machine
• Runs Apache webserver
• Runs Grafana server
• Python cronjobs query
RDBMS, parse data and
write to InfluxDB
Linux Virtual Machine
• Runs InfluxDB instance
• TSM, WAL & Raft
Metadata stored on
expandable development
mount
RDBMS
RDBMS
InfluxDB
14. 14
Decomposing DFAB Probe’s OEU Time-Series
• Measurement
queried using
InfluxDB Python
client
• Mean(OEU) was
grouped by 1 hour
• No differencing or
transformations
were applied to TS
• Period length (T)
for seasonality
defined as 14 days
based on our shift
rotations
• Decomposition
performed using
stats-models
Python library
15. 15
Decomposing DFAB Probe’s OEU Time-Series
• Seasonality is my main interest since I am attempting to understand learned behaviors
• Trend reflects drop in Overall Equipment Utilization post COVID-19 stay-at-home order
• Modeling will need more work, this was a quick example of simple data exploration
Stay-at-home order
issued for Dallas
16. 16
Example of actual Grafana alert sent via email
• Not too concerned if our OEU
drops below our goal for a
short duration of time
• More concerning is if we drop
quickly (rate of change). This
could indicate a larger problem
• In the case seen to the left, an
application was failing to write
to a DB causing the tools to
lock up.
• IT needed to interdict on their
end, but we needed to notify
• InfluxDB’s DERIVATIVE()
function allows us to easily
trigger alerts for this use case
17. 17
Grafana Dashboard – States Counts (1 sample/minute)
• Stacked Barchart, colored by tool state
• Pie charts show the state distribution
• State occurrences (left)
• E10 occurrences (right)
• Single stat panels show how many testers
are in production and how many are down
18. 18
Grafana Dashboard – Ao Diffusion (1 sample/minute)
• Green Trace: Overall Tool Availability % (1m)
• Bottom plot shows my previous toolsets
(POCL & TEOS are tags/indexed)
19. 19
Issues encountered
• Could not write to InfluxDB via the Python client library
• Solution: http_proxy & https_proxy were set and sourced in runcom (.cshrc)
file. I needed to unset these system variables for the script to work as
expected
• Problems using DERIVATIVE() function
• Solution: For my use case, I needed to specify a WHERE clause with a
time constraint.
• Initial DB writes were not showing correct time’s when viewed in Grafana
• Solution: InfluxDB expects UTC time and our DB time stamps are stored
and displayed for ‘America/Chicago’ time
• No primitive for aggregating by month in InfluxQL (1d, 14d, 30d, 1m?)
• Solution: Flux can apparently handle this or I can add a Month tag to the
data for grouping.
20. 20
What’s next
• Prototype Dash app for single toolset (proof-of-concept)
• One-stop shopping for Toolset Health
• Tool stability high level
• Process stability high level
• Radar/spider chart for Cpk, etc..
• InfluxDB will handle data needs
• Work to model high level metrics across shifts for DFAB probe
• Fine-tune and practice decompositions
• Forecasting (planning, capacity, financial, etc..)
• ARIMA (Autoregressive Integrated Moving Average)
• Holt-Winters
• Presentation at work on time-series data and my use of InfluxDB/Grafana
• Open flood gates for new ideas or application
22. We look forward to bringing together our community of
developers in this new format to learn, interact, and share
tips and use cases.
8-9 June, 2020
Hands-On Flux Training
www.influxdays.com
23-24 June, 2020
Virtual Experience