In this talk we’ll learn about some of the biggest challenges beginner Flux users face. We’ll also learn about resources and tools that developers can take advantage of to become a Flux Pro.
Python and Oracle: allies for best of data management | Laurent Leturgez
In this presentation, I describe Python and how it can interact with Oracle Database and Oracle Cloud Infrastructure in various projects, from data visualisation to data science.
Building source code level profiler for C++.pdf | ssuser28de9e
1. The document describes building a source code level profiler for C++ applications. It outlines 4 milestones: logging execution time, reducing macros, tracking function hit counts, and call path profiling using a radix tree.
2. Key aspects discussed include using timers to log function durations, storing profiling data in a timed entry class, and maintaining a call tree using a radix tree with nodes representing functions and profiling data.
3. The goal is to develop a customizable profiler to identify performance bottlenecks by profiling execution times and call paths at the source code level.
A talk about data workflow tools in Metrics Monday Helsinki.
Both Custobar (https://custobar.com) and ŌURA (https://ouraring.com) are hiring talented developers. Contact me if you are interested in joining either company.
Transferring Software Testing Tools to Practice | Tao Xie
ACM SIGSOFT Webinar co-presented by Nikolai Tillmann (Microsoft), Judith Bishop (Microsoft Research), Pratap Lakshman (Microsoft), Tao Xie (University of Illinois at Urbana-Champaign) http://www.sigsoft.org/resources/webinars.html
CityLABS Workshop: Working with large tables | Enrico Daga
This document discusses working with large tables and big data processing. It introduces distributed computing as an approach to process large datasets by distributing data across multiple nodes and parallelizing operations. The document then outlines using Apache Hadoop and the MK Data Hub cluster to distribute data storage and processing. It demonstrates how to use tools like Hue, Hive, and Pig to analyze tabular data in a distributed manner at scale. Finally, hands-on examples are provided for computing TF-IDF statistics on the large Gutenberg text corpus.
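For readers unfamiliar with TF-IDF, here is a minimal single-machine sketch of the statistic. It does not reproduce the workshop's Hive or Pig jobs. The `tf_idf` function, the whitespace tokenizer, and the three-sentence corpus are assumptions made for illustration, and the weighting used is one common variant: term frequency times log(N / document frequency).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Tiny stand-in corpus; the workshop uses Project Gutenberg texts.
corpus = [
    "the whale swims in the sea",
    "the ship sails on the sea",
    "the captain hunts the whale",
]
scores = tf_idf(corpus)
```

A term that appears in every document, such as "the" here, gets a score of zero, while a term unique to one document scores highest. In a distributed setting the document-frequency count is the step that needs data from all nodes.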
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin | Databricks
The document discusses preparing data for machine learning by applying data quality techniques in Spark. It introduces concepts of data quality and machine learning data formats. The main part shows how to use User Defined Functions (UDFs) in Spark SQL to automate transforming raw data into the required formats for machine learning, making the process more efficient and reproducible.
This document summarizes an agenda for a Pinterest Engineering meeting. It includes discussions on mobile growth and monetization, deploying and shipping code. Specific topics that will be covered include scaling user education on mobile, growth strategies like user education, monetization through data, and how Pinterest deploys and ships code. Speakers will discuss mobile features, how user growth is driven through education, monetizing user data, and ensuring smooth code deployment.
This document discusses bridging big data and data science using scalable workflows. It describes how scientific workflows can integrate various data science tools and processes to analyze large datasets. Workflows allow standardized, programmable, and reproducible analysis at scale. Examples are provided of workflows developed at the San Diego Supercomputer Center for applications in bioinformatics, wildfire management, and other domains. The document advocates conceptualizing computational analyses as workflows to facilitate collaboration between data scientists and developers.
R is a programming language for statistical analysis and graphics. It is an open-source language developed by statisticians to allow for easy statistical analysis and visualization of data. The document provides an overview of R, discussing its origins, functionality, uses in data science, and popular packages and IDEs used with R. Examples are given of basic R syntax for vectors, matrices, data frames, plotting, and applying functions to data.
Rails is a great Ruby-based framework for producing web sites quickly and effectively. Here are a bunch of tips and best practices aimed at the Ruby newbie.
This document discusses the design of an intelligent assistant named Artemis to help security analysts with their workflow. It describes conducting user research to understand pain points of different security roles. The findings showed analysts lack context for alerts and have repetitive tasks. The document outlines requirements for Artemis, including understanding natural language, providing context-driven investigations, educating users, recommending next steps, and expediting collections using workflows. It provides examples of how Artemis could assist analysts and help optimize their time by automating common tasks. In the end, the goal is to design tools that better utilize analyst resources and are easier for a diverse range of users.
Value streammapping cascadiait2014-mceniry | Chris McEniry
This document provides an overview of value stream mapping as a Lean technique. It discusses how to map the current and future states of a value stream by visualizing the flow of materials and information from raw materials to the customer. Key aspects covered include identifying processes, work centers, inputs/outputs, and wait times to understand where waste exists. The document uses an example of storage provisioning to demonstrate mapping multiple levels from individual processes to the full value stream and planning improvements.
R is an open source programming language used for statistical analysis and graphics. It allows users to create objects like vectors, matrices, data frames and lists to manipulate and analyze data. RStudio is an integrated development environment for R that provides a user interface, debugging tools and package management. The document introduces key R concepts like data types, packages and resources for learning R. It also provides best practices for file management, naming conventions and version control when programming in R.
The document discusses various techniques for optimizing and scaling MongoDB deployments. It covers topics like schema design, indexing, monitoring workload, vertical scaling using resources like RAM and SSDs, and horizontal scaling using sharding. The key recommendations are to optimize the schema and indexes first before scaling, understand the workload, and ensure proper indexing when using sharding for horizontal scaling.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix uses the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, and how they set up and keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
This document introduces distributed computing and tools for processing large tabular data using the Big Data Cluster. It discusses how distributed computing allows tabular data to be replicated across nodes and computation to be parallelized. It then provides an overview of Hadoop and how the Big Data Cluster can be used with tools like Hue, Hive, and Pig to perform analytics on large datasets. Finally, it walks through an example of computing TF-IDF scores on a corpus of text documents from Project Gutenberg.
This document discusses reproducible research and provides guidance on key practices and tools to support reproducibility. It defines reproducibility as distributing all data, code, and tools required to reproduce published research results. Version control systems like Git allow researchers to track changes over time and collaborate more effectively. Tools like DMPTool can help researchers create data management plans and plan for long-term storage and sharing of research data and materials. R Markdown allows integrating human-readable text with executable code to produce reproducible reports and analyses.
Building Search & Recommendation Engines | Trey Grainger
In this talk, you'll learn how to build your own search and recommendation engine based on the open source Apache Lucene/Solr project. We'll dive into some of the data science behind how search engines work, covering multi-lingual text analysis, natural language processing, relevancy ranking algorithms, knowledge graphs, reflected intelligence, collaborative filtering, and other machine learning techniques used to drive relevant results for free-text queries. We'll also demonstrate how to build a recommendation engine leveraging the same platform and techniques that power search for most of the world's top companies. You'll walk away from this presentation with the toolbox you need to go and implement your very own search-based product using your own data.
InfluxDB is an open source time series database written in Go that stores metric data and performs real-time analytics. It has no external dependencies. InfluxDB stores data as time series with measurements, tags, and fields. Data is written using a line protocol and can be visualized using Grafana, an open source metrics dashboard.
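As an illustration of how a measurement, tags, and fields fit together in line protocol, the sketch below builds one line by hand. The helper names (`to_line`, `format_field`, `_escape`) and the sample point are hypothetical, and the escaping covers only commas, spaces, and equals signs. A real application would more likely use an official InfluxDB client library.

```python
def _escape(value, chars):
    """Backslash-escape each of the given characters in value."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build: measurement,tag=value field=value timestamp"""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys, a commonly recommended practice
        tag_key = _escape(key, ",= ")
        tag_value = _escape(str(tags[key]), ",= ")
        line += f",{tag_key}={tag_value}"
    pairs = []
    for key, value in fields.items():
        pairs.append(_escape(key, ",= ") + "=" + format_field(value))
    line += " " + ",".join(pairs)
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "cpu",
    {"host": "server01", "region": "us-west"},
    {"usage": 0.64, "cores": 4},
    1556813561098000000,
)
print(line)
# cpu,host=server01,region=us-west usage=0.64,cores=4i 1556813561098000000
```

Tags are indexed metadata stored as strings, while fields carry the typed values, which is why only field values go through `format_field`.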
Creating Developer-Friendly Docker Containers with Chaperone | Gary Wisniewski
The document discusses creating developer-friendly containers using Chaperone and the chaperone-baseimage family. Chaperone is a process manager that provides services like logging, cron jobs, and orderly shutdown within containers. The chaperone-baseimage images use Chaperone to provide three personalities for containers: closed, attached-data, and development. This allows developers to have a consistent environment to develop applications without understanding container internals. The development model mounts the container's infrastructure to the developer's local directory for easy editing of code and data outside the container.
Presentation for Harvard's ABCD Technology in Education group:
The Institute for Quantitative Social Science (IQSS) is a unique entity at Harvard - it combines research, software development, and specialized services to provide innovative solutions to research and scholarship problems at Harvard and beyond. I will talk about the software projects that IQSS is currently working on (Dataverse, Zelig, Consilience, and OpenScholar), including the research and development processes, the benefits provided to the Harvard community, and the impacts on research and scholarship.
Learn who is best suited to attend the full training, what prior knowledge you should have, and what topics the course covers. Cloudera Curriculum Developer Jesse Anderson will discuss the skills you will attain during the course and how they will help you make the most of your HBase deployment in development or production and prepare for the Cloudera Certified Specialist in Apache HBase (CCSHB) exam.
The document discusses different types of computing hardware, including servers, clusters, and the cloud. It describes how servers are similar to personal computers but are designed to run continuously and provide resources to other devices on a network. Clusters involve linking multiple servers together to efficiently run large jobs in parallel. Jobs are submitted to a head node which manages allocating resources and distributing work across nodes. This allows for faster processing of large tasks compared to independent servers.
InfluxData is excited to announce InfluxDB Clustered, the self-managed version of InfluxDB 3.0 with unparalleled flexibility, speed, performance, and scale. The evolution of InfluxDB Enterprise, InfluxDB Clustered is delivered as a collection of Kubernetes-based containers and services, which enables you to run and operate InfluxDB 3.0 where you need it, whether that's on-premises or in a private cloud environment. With this new enterprise offering, we’re excited to provide our customers with real-time queries, low-cost object storage, unlimited cardinality, and SQL language support – all with improved data access, support, and security! The newest version of InfluxDB was built on Apache Arrow, and through the open source ecosystem and integrations, extends the value of your time-stamped data.
Join this webinar to learn more about InfluxDB Clustered, and how to manage your large mission-critical workloads in the highly available database service offering!
In this webinar, Balaji Palani and Gunnar Aasen will dive into:
Key features of the new InfluxDB Clustered solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Best Practices for Leveraging the Apache Arrow Ecosystem | InfluxData
Apache Arrow is an open source project intended to provide a standardized columnar memory format for flat and hierarchical data. It enables more efficient analytics workloads for modern CPU and GPU hardware, which makes working with large data sets easier and cheaper.
InfluxData and Dremio are both members of the Apache Software Foundation (ASF). Dremio is a data lakehouse management service known for its scalability and capacity for direct querying across diverse data sources. InfluxDB is the purpose-built time series database, and InfluxDB 3.0 has a new columnar storage engine and uses the Arrow format for representing data and moving data to and from Parquet. Discover how InfluxDB and Dremio have advanced their solutions by relying on the Apache Arrow framework.
Join this live panel as Alex Merced and Anais Dotis-Georgiou dive into:
Advantages to utilizing the Apache Arrow ecosystem
Tips and tricks for implementing the columnar data structure
How developers can best utilize the ASF to innovate and contribute to new industry standards
Similar to Anais Dotis-Georgiou [InfluxData] | Becoming a Flux Pro | InfluxDays 2022
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu... | InfluxData
Bevi are the creators of smart water dispensers which empower people to choose their desired beverage — flat or sparkling, their desired flavor and temperature. Since 2014, Bevi users have saved more than 350 million bottles and cans. Their "smart" water coolers have prevented the extraction of 1.4 trillion oz of oil from Earth and have saved 21.7 billion grams of CO2 from the atmosphere.
Discover how Bevi uses a time series database to enable better predictive maintenance and alerting of their entire ecosystem, including the hardware and software. They are using InfluxDB to collect sensor data remotely and in real time from their internet-connected machines about their status and activity (flavor and CO2 levels, water temperature, filter status, etc.). They are using these metrics to improve their customer experience and continuously improve their sustainability practices. Gain tips and tricks on how to best utilize InfluxDB's schema-less design.
Join this webinar as Spencer Gagnon dives into:
Bevi's approach to reducing organizations' carbon footprint — they are saving 50K+ bottles and cans annually
Their entire system architecture — including InfluxDB Cloud, Grafana, Kafka, and DigitalOcean
The importance of using time-stamped data to extend the life of their machines
Power Your Predictive Analytics with InfluxDB | InfluxData
If you're using InfluxDB to store and manage your time series data, you're already off to a great start. But why stop there? In our upcoming webinar, we'll show you how to take your data analysis to the next level by building predictive analytics using a variety of tools and techniques.
We will demonstrate how to use Quix to create custom dashboards and visualizations that allow you to monitor your data in real-time. We'll also introduce you to Hugging Face, a powerful tool for building models that can predict future trends and identify anomalies. With these tools at your disposal, you'll be able to extract valuable insights from your data and make more informed decisions about the future. Don't miss out on this opportunity to improve your data analysis skills and take your business to the next level!
What you will learn:
Use InfluxDB to store and manage time series data
Utilize Quix and Hugging Face to build models, visualize trends, and identify anomalies
Extract valuable insights from your data
Improve your data analysis skills to make informed decisions
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base | InfluxData
Are you considering replacing your legacy data historian and moving your OT data to the cloud? Join this technical webinar to learn how to adopt InfluxDB and IO Base - a digital platform used to improve operational efficiencies!
Teréga Solutions are the creators of digital solutions used to improve energy efficiencies and to address decarbonization challenges. Their network includes 5,000+ km of gas pipelines within France; they aim to help France attain carbon neutrality by 2050. With these impressive goals in mind, Teréga has created IO-Base — the digital platform to improve industrial performance, and increase profitability. Creating digital twins for their clients allows them to collect data from all production sites and view it in real time, from anywhere and at any time.
Discover how Teréga uses InfluxDB, Docker, and AWS to monitor its gas and hydrogen pipeline infrastructure. They chose to replace their legacy data historian with InfluxDB — the purpose built time series database. They are collecting more than 100K different metrics at various frequencies — some are collected every 5 seconds to only every 1-2 minutes. THey have reduced overall IT spend by 50% and collect 2x the amount of data at 20x frequency! By using various industrial protocols (Modbus, OPC-UA, etc.), Teréga improved output, reduced the TCO, and is now able to create added-value services: forecast, monitoring, predictive maintenance.
Join this webinar as Thomas Delquié dives into:
Teréga's approach to modernizing fossil fuel pipelines IT systems while improving yields and safety
Their centralized methodology to collecting sensor, hardware, and network metrics
The importance of time series data and why they chose InfluxDB
Build an Edge-to-Cloud Solution with the MING StackInfluxData
FlowForge enables organizations to reliably deliver Node-RED applications in a continuous, collaborative, and secure manner. Node-RED is the popular, low-code programming solution that makes it easy to connect different services using a visual programming environment. InfluxData is the creator of InfluxDB, the purpose-built time series database run by developers at scale and in any environment in the cloud, on-premises, or at the edge.
Jump-start monitoring your industrial IoT devices and discover how to build an edge-to-cloud solution with the MING stack. The MING stack includes Mosquitto/MQTT, InfluxDB, Node-RED, and Grafana. This solution can be used to improve fleet management, enable predictive maintenance of industrial machines and power generation equipment (i.e. turbines and generators) and increase safety practices (i.e. buildings, construction sites). Join this webinar to learn best practices from industrial IoT SME's.
In this webinar, Robert Marcer and Jay Clifford dive into:
Best practices for monitoring sensor data collected by everyone — from the edge to the factory
Tips and tricks for using Node-RED and InfluxDB together
Demo — see Node-RED and InfluxDB live
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
The document is an agenda for a discussion between the CTO and founder of Ockam, Mrinal Wadhwa, and the CTO and founder of InfluxData, Paul Dix, about rewriting products using the Rust programming language. It includes an introduction of the founders, an overview of the discussion topics like why they decided to rewrite in Rust and the challenges they faced, how they got their engineers comfortable with Rust, tips they learned in the process, benefits gained from moving to Rust, and how their communities responded to the switch.
InfluxData is excited to announce the general availability of InfluxDB Cloud Dedicated! It is a fully managed time series database service running on cloud infrastructure resources that are dedicated to a single tenant. With this new offering, we’re excited to provide our customers with additional security options, and more custom configuration options to best suit customers’ workload requirements. Join this webinar to learn more about InfluxDB Cloud, and the new dedicated database service offering!
In this webinar, Balaji Palani and Gary Fowler will dive into:
Key features of the new InfluxDB Cloud Dedicated solution
Use cases for using the newest version of the purpose-built time series database
Live demo
During this 1-hour technical webinar, you’ll also get a chance to ask your questions live.
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
Many developers and DevOps engineers have become aware of using their observability data to gain greater insights into their infrastructure systems. InfluxDB is the purpose-built time series database used to collect metrics and gain observability into apps, servers, containers, and networks. Developers use InfluxDB to improve the quality and efficiency of their CI/CD pipelines. Start using InfluxDB to aggregate infrastructure and application performance monitoring metrics to enable better anomaly detection, root-cause analysis, and alerting.
This session will demonstrate how to record metrics, logs, and traces with one library — OpenTelemetry — and store them in one open source time series database — InfluxDB. Zoe will demonstrate how easy it is to set up the OpenTelemetry Operator for Kubernetes and to store and analyze your data in InfluxDB.
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
American Metal Processing Company ("AMP") is the US' largest commercial rotary heat treat facility with customers in the automotive, construction, military, and agriculture industries. They use their atmosphere-protected rotary retort furnaces to provide their clients with three primary hardening services: neutral hardening (quench and temper), carburizing, and carbonitriding.
This furnace style ensures consistent, uniform heat treatment process vs. traditional batch-or-belt-style furnaces; excels at processing high volumes of smaller parts with tight tolerances; and improves the strength and toughness of plain carbon steels. Discover why AMP’s use of Telegraf, InfluxDB, Node-RED, and Grafana allows them to gain 24/7 insights into their plant operations and metallurgical results. Learn how they use time-stamped data to gain accurate metrics about their consumables usage, furnace profiles, and machine status.
Join this webinar as Grant Pinkos dives into:
American Metal Processing's approach to heat treating in a digitized environment through connected systems
Their approach to collecting and measuring sensor data to enable predictive maintenance and improve product quality
Why they need a time series database for managing and analyzing vast amounts of time-stamped data
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
Delft University is the oldest and largest technical university in the Netherlands with 25,000+ students. Since 1999, they have had a team of students (undergraduate and graduate) designing, building, and racing cars, as part of the Formula Student worldwide competition. The competition has grown to include teams from 1K+ universities in 20+ countries. Students are responsible for all aspects of car manufacturing (research, construction, testing, developing, marketing, management, and fundraising). Delft University's team includes 90 students across disciplines.
Discover how Delft University's team uses Marple and InfluxDB to collect telemetry and sensor metrics while they develop, test, and race their electrics cars. They collect sensor data about their EV's control systems using a time series platform. During races, they are collecting IoT data about their batteries, accelerometer, gyroscope, tires, etc. The engineers are able to share important car stats during races which help the drivers tweak their driving decisions — all with the goal of winning. After races, the entire team are able to analyze data in Marple to understand what to do better next time. By using Marple + InfluxDB, their team are able to collect, share and analyze high frequency car data used to make their car faster at competitions.
Join this webinar as Robbin Baauw and Nero Vanbiervliet dive into:
Marple's approach to empowering engineers to organize, analyze, and visualize their data
Delft University's collaborative methodology to building and racing their Formula-style race car
How InfluxDB is crucial to their collaborative engineering and racing process
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
InfluxData is excited to announce the general availability of InfluxDB Cloud's new storage engine! It is a cloud-native, real-time, columnar database optimized for time series data. InfluxDB's rebuilt core was coded in Rust and sits on top of Apache Arrow and DataFusion. InfluxData's team picked Apache Parquet as the persistent format. In this webinar, Paul Dix and Balaji Palani will demonstrate key product features including the removal of cardinality limits!
They will dive into:
The next phase of the InfluxDB platform
How using Apache Arrow's ecosystem has improved InfluxDB's performance and scalability
Key features of InfluxDB Cloud's new core — including SQL native support
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
balena.io helps companies develop, deploy, update, and manage IoT devices. By using Linux containers and other cloud technologies, balena enables teams to quickly and easily build fleets of connected devices. Developers are able to use containers with the language of choice and pull IoT sensor data from 70+ different single board computers into balenaCloud. Discover how to use balena.io to automate your InfluxDB deployments at the edge!
During this one-hour session, experts from balena and InfluxData will demonstrate how to build and deploy your own air quality IoT solution. You will learn:
The fundamentals of IoT sensor deployment and management using balena.
How to use a time series platform to collect and visualize metrics from edge devices.
Tips and tricks to using balenaCloud to automate InfluxDB deployments and Telegraf configurations.
How to use InfluxDB's Edge Data Replication feature to collect sensor data and push it to InfluxDB Cloud for analysis.
No coding experience required, just a curiosity to start your own IoT adventure.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
RudderStack — the creators of the leading open source Customer Data Platform (CDP) — needed a scalable way to collect and store metrics related to customer events and processing times (down to the nanosecond). They provide their clients with data pipelines that simplify data collection from applications, websites, and SaaS platforms. RudderStack's solution enables clients to stream customer data in real time — they quickly deploy flexible data pipelines that send the data to the customer's entire stack without engineering headaches. Customers are able to stream data from any tool using their 16+ SDK's, and they are able to transform the data in-transit using JavaScript or Python. How does RudderStack use a time series platform to provide their customers with real-time analytics?
Join this webinar as Ryan McCrary dives into:
RudderStack's approach to streamlining data pipelines with their 180+ out-of-the-box integrations
Their data architecture including Kapacitor for alerting and Grafana for customized dashboards
Why using InfluxDB was crucial for them for fast data collection and providing single-sources of truths for their customers
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
Customers using ThingWorx and the Manufacturing Solutions often need to store property data longer than the Solutions default to. These customers are recommended to use InfluxDB, and this presentation will cover the key considerations for moving to InfluxDB vs the standard ThingWorx value streams. Join this session as Ward highlights ThingWorx’s solution and its easy implementation process.
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
Two new features are coming to Flux that add flexibility
and functionality to your data workflow—polymorphic
labels and dynamic types. This session walks through
these new features and shows how they work.
This document outlines the schedule for Day 2 of InfluxDays 2022, an event hosted by InfluxData. The schedule includes sessions on building developer experience, how developers like to work, an overview of the InfluxDB developer console and API, demos of client libraries and the InfluxDB v2 API, tips for getting involved in the InfluxDB community and university, use cases for networking monitoring, crypto/fintech, monitoring/observability, and IIoT, and closing thoughts. Recordings of all sessions will be made available to registered attendees by November 7th. Upcoming events include advanced Flux training in London and resources through the community forums, Slack channel, and online university.
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...InfluxData
This document contains the agenda for Day 2 of InfluxDays 2022, which includes:
- Welcome and introductory remarks from Zoe Steinkamp and Jay Clifford of InfluxData.
- Fireside chats and presentations on building great developer experiences, how developers like to work, and use cases for InfluxDB from companies like Tesla, InfluxData, and others.
- Sessions on the InfluxDB developer console, APIs, client libraries, getting involved in the community, accelerating time to awesome with InfluxDB University, and tips for analyzing IoT data with InfluxDB.
- Closing thoughts from Zoe Steinkamp and Jay Clifford, as well as
The document summarizes the agenda and sessions for Day 1 of InfluxDays 2022. It includes sessions on InfluxDB data collection, scripting languages like Flux, the InfluxDB time series engine, tasks, storage, and a closing discussion. The agenda involves talks from InfluxData employees on building applications with real-time data, navigating the developer experience, solving problems, the InfluxDB platform, community, education, use cases in crypto/fintech and IIoT, and tips/tricks for analysis.
Anais Dotis-Georgiou [InfluxData] | Becoming a Flux Pro | InfluxDays 2022
1.
2. Becoming a Flux Pro
Anais Dotis-Georgiou
Developer Advocate, InfluxData
3. Connect Learn Build
Hear from and meet developers from the InfluxDB Community
Be inspired by use cases from our partners and InfluxDB engineers
Learn best practices that will help you build great experiences for your projects
4. Becoming a Flux Pro
In this talk we’ll learn about some of the biggest challenges beginner Flux users face. We’ll also learn about resources and tools that developers can take advantage of to become a Flux Pro.
Anais Dotis-Georgiou, Lead Developer Advocate
Anais Dotis-Georgiou is a developer advocate at InfluxData with a passion for making data beautiful using data analytics, AI, and machine learning. She takes the data that she collects and does a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty.
5. Agenda
1. Understanding critical functions
2. Utilizing existing tools to expedite learning
3. Taking advantage of learning resources
6. From Series to Tables on Disk
measurement1 field1=1i,field2=1,field3="a"
measurement1 field1=1i,field2=2,field3="b"
7. Grouping
• Flux operates on streams of tables.
• Every table has a group key: a list of columns for which every row in the table has the same value. Tables are defined by their group keys.
• You can use Flux to combine or divide tables.
12. Table: before
Group key: _measurement, _field, location, _start, _stop. Not in group key: table, _value, _time.
table  _measurement         _field       _value  location      _start            _stop            _time
0      average_temperature  temperature  82.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time1
0      average_temperature  temperature  73.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time2
0      average_temperature  temperature  86.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time3
1      average_temperature  temperature  85.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time1
1      average_temperature  temperature  74.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time2
1      average_temperature  temperature  80.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time3
13. Table: group on measurement
Group key: _measurement. Not in group key: table, _field, _value, location, _start, _stop, _time.
table  _measurement         _field       _value  location      _start            _stop            _time
0      average_temperature  temperature  82.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time1
0      average_temperature  temperature  73.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time2
0      average_temperature  temperature  86.0    coyote_creek  rfc3339startTime  rfc3339stopTime  rfc3339time3
0      average_temperature  temperature  85.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time1
0      average_temperature  temperature  74.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time2
0      average_temperature  temperature  80.0    santa_monica  rfc3339startTime  rfc3339stopTime  rfc3339time3
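A query that regroups the data this way might look like the sketch below. The bucket name `example-bucket` and the one-hour time range are placeholders, not taken from the slides; the measurement and field names come from the table above.

```flux
// Hypothetical bucket name and time range; adjust to your own data.
from(bucket: "example-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "average_temperature" and r._field == "temperature")
    // By default each series (measurement + field + tag set) arrives as its own table.
    // Regrouping on _measurement alone merges the two location tables into one.
    |> group(columns: ["_measurement"])
    |> yield(name: "grouped")
```

After `group()`, only `_measurement` remains in the group key, so the rows for both locations end up in a single table.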
15. Table: mean
Group key: _measurement. Not in group key: table, _value.
table  _measurement         _value
0      average_temperature  80.0
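The result above can be reproduced without a bucket, as a rough sketch. It assumes the six sample values from the earlier slides and uses `array.from()` to build the input in memory. The timestamps are made-up placeholders standing in for `rfc3339time1` through `rfc3339time3`, and the `_start`/`_stop` columns are left out because no `range()` is applied.

```flux
import "array"

// Sample rows from the slides; the _time values are placeholders.
data =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _measurement: "average_temperature", _field: "temperature", location: "coyote_creek", _value: 82.0},
            {_time: 2022-01-01T00:01:00Z, _measurement: "average_temperature", _field: "temperature", location: "coyote_creek", _value: 73.0},
            {_time: 2022-01-01T00:02:00Z, _measurement: "average_temperature", _field: "temperature", location: "coyote_creek", _value: 86.0},
            {_time: 2022-01-01T00:00:00Z, _measurement: "average_temperature", _field: "temperature", location: "santa_monica", _value: 85.0},
            {_time: 2022-01-01T00:01:00Z, _measurement: "average_temperature", _field: "temperature", location: "santa_monica", _value: 74.0},
            {_time: 2022-01-01T00:02:00Z, _measurement: "average_temperature", _field: "temperature", location: "santa_monica", _value: 80.0},
        ],
    )

data
    // One table for the whole measurement, so mean() averages all six rows together.
    |> group(columns: ["_measurement"])
    // mean() keeps only the group key columns plus the aggregated _value.
    |> mean()
    |> yield(name: "mean")
```

The six values sum to 480.0, so the single output row carries a `_value` of 80.0, matching the slide. Grouping on `location` instead would produce one mean per location.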
16. Embracing experimentation
• Learning a new language involves embracing experimentation. Understanding happens through trial and error.
• Tools that facilitate experimentation:
  • InfluxDB UI
  • Flux extension for VS Code
29. How do I calculate the difference in temp across both locations?
30. Table: raw
Group key: _measurement, _field, generatorID, topic, host. Not in group key: table, _value, _time.
table  _measurement  _field       _value  generatorID  topic                           host                 _time
0      genData       temperature  190.0   generator1   emergency_generator/generator1  influxdata-roadshow  rfc3339time1
0      genData       temperature  195.0   generator1   emergency_generator/generator1  influxdata-roadshow  rfc3339time2
1      genData       temperature  200.0   generator2   emergency_generator/generator2  influxdata-roadshow  rfc3339time1
1      genData       temperature  210.0   generator2   emergency_generator/generator2  influxdata-roadshow  rfc3339time2
32. Table: pivoted
Group key: _measurement, _field, topic, host. Not in group key: table, generator1, generator2, _time.
table  _measurement  _field       generator1  generator2  topic                           host                 _time
0      genData       temperature  190.0       200.0       emergency_generator/generator1  influxdata-roadshow  rfc3339time1
0      genData       temperature  195.0       210.0       emergency_generator/generator1  influxdata-roadshow  rfc3339time2
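The slides do not show the pivot call itself, so the following is only a sketch of one way to get from the raw table to this shape. It rebuilds the four raw rows with `array.from()` using placeholder timestamps, and leaves out `topic` and `host` for brevity. The variable names `raw` and `pivoted` are illustrative.

```flux
import "array"

// Raw rows from the slide; the _time values are placeholders, topic and host are omitted.
raw =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _measurement: "genData", _field: "temperature", generatorID: "generator1", _value: 190.0},
            {_time: 2022-01-01T00:01:00Z, _measurement: "genData", _field: "temperature", generatorID: "generator1", _value: 195.0},
            {_time: 2022-01-01T00:00:00Z, _measurement: "genData", _field: "temperature", generatorID: "generator2", _value: 200.0},
            {_time: 2022-01-01T00:01:00Z, _measurement: "genData", _field: "temperature", generatorID: "generator2", _value: 210.0},
        ],
    )

pivoted =
    raw
        // generatorID must not be in the group key, otherwise each generator
        // stays in its own table and the rows cannot be lined up side by side.
        |> group(columns: ["_measurement", "_field"])
        // One output row per _time; each generatorID value becomes a column holding _value.
        |> pivot(rowKey: ["_time"], columnKey: ["generatorID"], valueColumn: "_value")

pivoted
    |> yield(name: "pivoted")
```

Each output row then holds a timestamp with `generator1` and `generator2` columns next to each other, which is what makes row-wise arithmetic with `map()` possible.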
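33. map
// `pivot` is assumed here to be a variable holding the pivoted stream of tables from the previous slide
difference = pivot
    |> map(fn: (r) => ({ r with difference: r.generator1 - r.generator2 }))
    |> yield(name: "difference")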
34. Table: pivoted
Group key: _measurement, _field, topic, host. Not in group key: table, generator1, generator2, difference, _time.
table  _measurement  _field       generator1  generator2  difference  topic                           host                 _time
0      genData       temperature  190.0       200.0       -10.0       emergency_generator/generator1  influxdata-roadshow  rfc3339time1
0      genData       temperature  195.0       210.0       -15.0       emergency_generator/generator1  influxdata-roadshow  rfc3339time2
// Use the math.abs() function to get the absolute difference (requires import "math" at the top of the script)
|> map(fn: (r) => ({ r with difference: math.abs(x: r.generator1 - r.generator2) }))
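Put together as a standalone script, the signed and absolute differences could be computed side by side as sketched below. The input is the pivoted data from the slides, rebuilt with `array.from()` and placeholder timestamps. The column name `absDifference` is an illustrative addition and does not appear in the talk.

```flux
import "array"
import "math"

// Pivoted rows from the slide; the _time values are placeholders.
pivoted =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, generator1: 190.0, generator2: 200.0},
            {_time: 2022-01-01T00:01:00Z, generator1: 195.0, generator2: 210.0},
        ],
    )

pivoted
    |> map(
        fn: (r) =>
            ({r with
                // Signed difference, as in the first map() example.
                difference: r.generator1 - r.generator2,
                // Absolute difference; math.abs() takes a float argument named x.
                absDifference: math.abs(x: r.generator1 - r.generator2),
            }),
    )
    |> yield(name: "difference")
```

For the two sample rows this gives differences of -10.0 and -15.0, with absolute differences of 10.0 and 15.0.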