Barbara Nelson [InfluxData] | Best Practices for Data Ingestion into InfluxDB | InfluxDays Virtual Experience London 2020

There are many ways to get your time series data into InfluxDB. You can use Telegraf plugins, a client API, or use Flux to upload your data. This talk will cover the pros and cons of each approach. We will then dive into Telegraf (our most flexible approach to loading data into InfluxDB, supporting over 200 plugins today). We will show you some great examples of how our customers are using Telegraf to retrieve, process and output time series data. If you are relatively new to Telegraf, you will come away with a better understanding of how to leverage Telegraf in your environment to seamlessly upload your data to InfluxDB. If you’ve been using Telegraf for a while, you may discover some new Telegraf capabilities that you weren’t aware of. You may even get inspired to contribute to Telegraf to make it even better.

  1. Data Ingestion into InfluxDB (Barbara Nelson)
  2. Why is data ingestion so important?
  3. Users get the most value when data can be processed
       • Data in its raw form has limited value.
       • Analyzing data transforms it into knowledge.
       • The less time a user spends getting data into InfluxDB, the more time they can spend on acquiring knowledge from the data.
       • Cloud 2 user survey: NPS doubled for users who had completed the data ingestion step and were able to start deriving value from their data.
  4. Data Ingestion Approaches
  5. Agent-based Push (Telegraf)
       [Diagram: Data Source -> Telegraf (input plug-in, output plug-in) -> InfluxDB]
  6. Telegraf
       • 200+ Telegraf plugins: input, output, processor, aggregator
       • Regular cadence of releases (quarterly feature release, monthly bug fix release)
       • Highly configurable, no coding required
       • Large community of contributors
  7. Client API
       [Diagram: Data Source -> Client API -> InfluxDB]
  8. Client API
       • 9 libraries: Python, C#, Java, Go, JavaScript/Node.js, Ruby, PHP, Scala, Kotlin
       • Handles batching, chunking, setting headers, etc.
       • Best approach when building custom applications
  9. Agentless Pull (Scrapers)
       [Diagram: InfluxDB (Flux .from()) pulls from Data Source]
  10. Agentless Pull (Scrapers)
       • Prometheus scraper (OSS only)
       • Flux *.from()
       • Doesn’t require an agent
  11. Native
       [Diagram: Data Source -> line protocol -> InfluxDB]
  12. Native/Ecosystem
       • Source system speaks line protocol
       • Examples: JMeter, NiFi, Vector, FluentD
       • Influx CLI CSV import
       • Quick and easy integration
  13. Telegraf
  14. Telegraf
       • Very popular open source project for collecting metrics from a wide variety of data sources and writing to a wide variety of data sinks
       • Database: Connect to data sources like MongoDB, MySQL, Redis, and others to collect and send metrics.
       • Systems: Collect metrics from cloud platforms, containers, and orchestrators.
       • IoT sensors: Collect critical stateful data (pressure levels, temp levels, etc.) from IoT sensors and devices.
       • Source at https://github.com/influxdata/telegraf
  15. Telegraf
       • Plugin-based architecture
       • Input: e.g. system, docker, kafka_consumer, prometheus, kubernetes, snmp, influxdb_listener (> 170 input plugins)
       • Processor: e.g. regex, dedup
       • Aggregator: e.g. minmax
       • Output: e.g. influxdb, file, kafka_producer, http (> 30 output plugins)
       • Telegraf will buffer data (up to a configurable memory limit)
       • Telegraf can batch data
  16. Adding your own plugin to Telegraf
       Telegraf is a single Go executable, so all plugins need to be closely reviewed to make sure they co-exist well within the Telegraf agent.
       1. Sign the CLA
       2. Create an issue to describe your plugin
       3. Submit a PR for your new plugin (following the guidelines)
       4. Respond to review feedback, update your PR
       5. Plugin will be added to a future Telegraf release (once review is complete)
  17. New in Telegraf 1.15: more lightweight extensions to the Telegraf agent
  18. External plugin, via the ExecD plugin
       • Plugin runs in its own process
       • Avoids the need for review by the Telegraf team
       • Supports the same API as an internal plugin
       • Can use for non-Go plugins
       • Can use for licensed software plugins
       • Can use for any type of plugin (input, output, processor, aggregator)
  19. External plugin architecture
  20. Sample ExecD plugin configuration

       [[inputs.execd]]
         command = ["telegraf-smartctl", "-d", "/dev/sda"]
         signal = "none"
         restart_delay = "10s"
         data_format = "influx"

  21. Starlark: lightweight processor plugin
       • Starlark (formerly known as Skylark) is intended for use as a configuration language
       • Starlark is a dialect of Python
       • Write your script and the Starlark plugin will execute it
       • Execution cannot access the file system, network, or system resources
  22. Sample Starlark plugin configuration

       [[processors.starlark]]
         source = '''
       def apply(metric):
           for k, v in metric.fields.items():
               if type(v) == "float":
                   metric.fields[k] = v * 10
           return metric
       '''

  23. Flux .from()
  24. Flux.from(): getting data from multiple sources
       • influxdb.from()
       • csv.from()
       • sql.from()
       • socket.from()
       • prometheus.scrape()
       • http.get()
       • bigtable.from() (experimental)
  25. Flux or Telegraf? It depends.
       [Diagram: InfluxDB using Flux .from() to pull from data sources, alongside several Telegraf agents collecting data at the edge]
  26. In summary
       Use any combination of:
       • Telegraf plugins
       • Client APIs
       • Native generation of line protocol
       • Flux.from()
  27. Thank you.
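
Because Starlark is a dialect of Python, the logic of the sample Starlark processor on slide 22 can be tried outside Telegraf. The sketch below is plain Python, not Telegraf code. The `Metric` class is a hypothetical stand-in for the metric object that Telegraf passes to `apply`, and `isinstance` replaces the Starlark check `type(v) == "float"`, since Python's `type()` returns a class rather than a string.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical stand-in for the metric Telegraf hands to apply()."""
    name: str
    tags: dict = field(default_factory=dict)
    fields: dict = field(default_factory=dict)


def apply(metric):
    # Same logic as the slide: multiply every float field by 10.
    # list() snapshots the items so values can be reassigned while looping.
    for k, v in list(metric.fields.items()):
        if isinstance(v, float):
            metric.fields[k] = v * 10
    return metric


sample = Metric("cpu", {"host": "a"}, {"usage": 1.5, "cores": 4, "state": "ok"})
result = apply(sample)
print(result.fields)
```

Only the float field `usage` is scaled; the integer and string fields pass through unchanged, which is the behavior the slide's script describes.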

Editor's Notes

  • Data can gain value by being correlated to other data – e.g. a normal temperature is in the range of 80-90 degrees. Anything above 95 degrees is unusual and should be acted upon.
  • Not necessarily InfluxDB on the right – could be any output plugin. InfluxDB is focus of my talk.
  • Mature OSS project, so there are many different input and output plugins suitable for many use cases. If there is a Telegraf plugin for your use case, you should use it, as it will be the most efficient way to get your data loaded.
  • Flux is our new functional query language. If you missed it, we had a two-day Flux workshop last week, as part of InfluxDays. Most suitable for cloud-based data sources, where you can extract the data directly, and don’t need to run an agent.
  • Scrapes directly from the InfluxDB process where Flux is running. Will cover in more detail later on.
  • The InfluxDB line protocol is a text-based format for writing points to the database.
  • Huge variety of use cases among our customers: one customer is using the system and sensor plugins to monitor thousands of IoT devices. Another customer is tracking COVID-19 data in South America, using the exec input plugin to pull data from public websites and then parsing the JSON to extract the relevant fields.
  • You can mix and match any combination – multiple input, multiple output etc. Very flexible to scale up as your data needs grow.
  • This has been the approach for extending Telegraf over the last few years. The process works well, but one disadvantage is that the Telegraf agent keeps on increasing in size, as we add more plugins. The review process can also be quite laborious, and is challenging for some of the newer plugins, where we may not be very familiar with the protocols, or may not have access to a test environment.
  • New approach in 1.15. Won’t be suitable for every plugin, but we believe it can be used for at least some plugins. We already have one example, a Twitter plugin, that runs as its own process. We could also see plugins that start as external processes and then get incorporated into Telegraf if they become popular. We might also take some of the older plugins and externalize them, to reduce the size of the Telegraf agent. It’s early days and we haven’t decided on any of these. For now, we’re just enabling a different configuration for plugins.
  • Another new feature in 1.15 is to enable Telegraf users to script processor plugins in Starlark.
  • Flux is a data scripting and query language. The ability to integrate with other systems is a core design feature of Flux. You can pull data from disparate sources, including databases, third-party APIs or filesystems: anywhere data lives. More data sources will be added to the language over time, so feel free to contribute your own.
  • The InfluxDB line protocol is a text-based format for writing points to the database.