Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux Beginners | InfluxDays Virtual Experience NA 2020

  1. Top 10 Hurdles for Flux Beginners. Faith Chikwekwe, InfluxData; Anais Dotis-Georgiou, InfluxData
  2. Introduction and Overview: Flux
  3. Introduction to Flux
     ● Functional query and scripting language
     ● JavaScript-esque
     ● Open source
     ● Written in Go and Flux itself

     from(bucket: "example-bucket")
       |> range(start: -1h)
       |> filter(fn: (r) => r._measurement == "cpu" and r.cpu == "cpu-total")
  4. Introduction to Flux
     ● Math across Measurements
     ● Custom Functions
     ● SPEC and syntax examples available in the repo: github.com/influxdata/flux/SPEC.md
  5. Hurdle 1: Overlooking UI tools which facilitate writing Flux
  6. Using the UI – Solution
     ● Using the Flux Script Editor
     ● Using the Raw Data View
  7. Using the Flux Builder
     ● Using the Flux Script Editor
     ● Variables
  8. Injecting Functions
     ● Using the Flux Script Editor
     ● Variables
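
As a rough illustration of what the UI injects (assuming the predefined dashboard variables v.timeRangeStart, v.timeRangeStop, and v.windowPeriod; v.bucket is a hypothetical custom variable), a query assembled in the Flux Script Editor might look like:

// Dashboard variables are injected by the UI and referenced under the v record
from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: v.windowPeriod, fn: mean)
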
  9. Table View vs Raw Data View
  10. Using the Raw Data View
  11. Hurdle 2: Folks misunderstand annotated CSVs and sometimes don’t write them correctly.
  12. Annotated CSV Solutions
      ● array.from()
      ● to()
      ● Use github.com/influxdata/flux/stdlib/universe
  14. from(bucket: "cats-and-dogs")
        |> range(start: 2020-05-15T00:00:00Z, stop: 2020-05-16T00:00:00Z)
        |> filter(fn: (r) => r["_measurement"] == "cats")
        |> filter(fn: (r) => r["_field"] == "adult")
        |> filter(fn: (r) => r["shelter"] == "A")
        |> filter(fn: (r) => r["type"] == "calico")
        |> limit(n: 2)

      #group,false,false,true,true,false,false,true,true,true,true
      #datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
      #default,_result,,,,,,,,,
      ,result,table,_start,_stop,_time,_value,_field,_measurement,shelter,type
      ,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:50:33.262484Z,8,adult,cats,A,calico
      ,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:51:48.852085Z,7,adult,cats,A,calico
  15. Explaining Annotated CSV
      ● #group: A boolean that indicates the column is part of the group key. A group key is a list of columns for which every row in the table has the same value.
        ○ true columns: In our example query above, we’ve filtered for a single field, adult, a single “shelter” tag, “A”, and a single “type” tag, “calico”. These values are constant across rows, so those columns are set to true.
        ○ false columns: The _time and _value columns have different values across rows, which is why they receive a false for the #group annotation.
      ● #datatype: Describes the type of data in the column.
      ● #default: The value to use for rows with an empty value. For example, if we had assigned our query to the variable ourQuery, this annotation would look like: #default,ourQuery,,,,,,,,,
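
As a sketch that is not on the slides: an annotated CSV like the one above can be read back into Flux with the csv package (csvData below is a stand-in for your own annotated CSV string):

import "csv"

csvData = "
#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,shelter,type
,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:50:33.262484Z,8,adult,cats,A,calico
"

// Parse the annotated CSV into a stream of Flux tables
csv.from(csv: csvData)
  |> yield()
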
  16. Using array.from()

      import "experimental/array"

      array.from(rows: [
        {_measurement: "m0", mytag: "t0", _field: "f0", _value: "foo", _time: 2020-01-01T00:00:00Z},
        {_measurement: "m0", mytag: "t0", _field: "f0", _value: "bar", _time: 2020-01-02T00:00:00Z}
      ])
        |> to(bucket: "my-bucket")
  17. Hurdle 3: Data layout design leads to runaway cardinality and slow Flux queries
  18. Influx Line Protocol
  19. General Recommendations
      ● Encode metadata in tags. Measurements and tags are indexed while field values are not, so commonly queried metadata should be stored in tags.
      ● Limit the number of series, or try to reduce series cardinality.
      ● Keep bucket and measurement names short and simple.
      ● Avoid encoding data in measurement names.
      ● Separate data into buckets when you need to assign different retention policies to that data or require an authentication token.
  20. Common Mistakes
      Mistake: Making IDs (such as eventid, orderid, or userid) a tag. This is another pattern that can cause unbounded cardinality if the tag values aren’t scoped.
      Solution: Store these values as fields instead.
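
To make the tag-versus-field distinction concrete in line protocol terms (hypothetical measurement, tags, and values, not from the deck):

# Anti-pattern: an unbounded orderid as a tag creates a new series per order
orders,region=us-west,orderid=8472916 amount=129.95 1621000000000000000

# Better: orderid as an (unindexed) integer field keeps series cardinality bounded
orders,region=us-west amount=129.95,orderid=8472916i 1621000000000000000
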
  21. Schema and Flux
      ● Your schema affects your Flux query performance.
      ● Anticipate common queries and let that influence your schema design.
  22. Hurdle 4: Storing the wrong data in InfluxDB and under-utilizing Source Functions
  23. Source Functions – Solution
      ● Use source functions to pull relevant data into InfluxDB as needed
      ● Work with PostgreSQL, MySQL, Snowflake, SQLite, SQL Server, Athena, BigQuery
  24. Source Functions – Solution

      import "sql"

      sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://user:password@localhost",
        query: "SELECT * FROM example_table"
      )
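
A common follow-on (a hedged sketch with hypothetical table, bucket, and tag names) is to enrich time series data with the SQL result, joining on a shared column such as a sensor_id tag:

import "sql"

// Relational metadata, e.g. where each sensor is installed
sensorInfo = sql.from(
  driverName: "postgres",
  dataSourceName: "postgresql://user:password@localhost",
  query: "SELECT sensor_id, location FROM sensors"
)

// Time series data that carries a matching sensor_id tag
sensorData = from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "airSensors")

// Merge the two streams on the shared column
join(tables: {data: sensorData, info: sensorInfo}, on: ["sensor_id"])
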
  25. Hurdle 5: Confused about how to project multiple aggregations
  26. Multiple Aggregation Projections – Solutions
      ● Create multiple queries with the UI and visualize them simultaneously
      ● Get “Fancy” with Flux to project multiple aggregations:
        ○ join()
        ○ pivot() and union()
        ○ reduce()
  27. Multiple Aggregation with the UI
  28. Union

      data = from(bucket: "my-bucket")
        |> range(start: 2019-12-31T00:00:00Z, stop: 2020-01-04T00:00:00Z)
        |> filter(fn: (r) => r._measurement == "my_measurement")
        |> filter(fn: (r) => r._field == "temp")

      temp_mean = data
        |> mean()
        |> set(key: "agg_type", value: "mean")

      temp_count = data
        |> count()
        |> toFloat()
        |> set(key: "agg_type", value: "count")

      union(tables: [temp_mean, temp_count])
        |> group(columns: ["agg_type"], mode: "by")
        |> yield()
  29. Join and Reduce
      ● The join() function merges two or more input streams, whose values are equal on a set of common columns, into a single output stream.
      ● The reduce() function aggregates records in each table according to the reducer, fn, providing a way to create custom aggregations (see the sketch below).
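
A minimal reduce() sketch (reusing the hypothetical my-bucket/temp series from the union example) that folds each table into a count, a sum, and a derived mean in one pass:

from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "my_measurement" and r._field == "temp")
  // Fold every row into a running count and sum
  |> reduce(
    identity: {count: 0.0, sum: 0.0},
    fn: (r, accumulator) => ({
      count: accumulator.count + 1.0,
      sum: accumulator.sum + r._value
    })
  )
  // Derive the mean from the accumulated values
  |> map(fn: (r) => ({r with mean: r.sum / r.count}))
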
  30. Hurdle 6: Common Flux Packages and Utility Functions. Not knowing beginner tips to make writing Flux easier
  31. Common Flux Packages and Utility Functions
      ● aggregateWindow(): Create your own utility function to aggregate data for your use case and pass it in as the `fn` param.
      ● monitor package: Write your own check and notification scripts for greater control over alerts (`flux/docs/Alerts.md`).
      ● map(): Replace the value in a column (for example, offsetting the `_value` column) using the with keyword.
      ● I need more functionality for my use case: Consider contributing a custom package to Flux (flux/stdlib/contrib/README.md).
  32. Custom Aggregate

      custom_agg = …
      from(bucket: "items")
        |> range(start: -1h)
        |> aggregateWindow(every: 1m, fn: custom_agg)

      Map

      from(bucket: "items")
        |> filter(fn: (r) => r._measurement == "m")
        |> map(fn: (r) => ({r with _value: r._value * 2.0}))
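
As a hedged sketch of what the elided custom_agg could be (a hypothetical 95th-percentile aggregator, not the one from the talk), aggregateWindow() accepts any function with a (column, tables=<-) signature:

// Hypothetical custom aggregate: 95th percentile of each window
p95 = (column, tables=<-) =>
  tables
    |> quantile(q: 0.95, column: column)

from(bucket: "items")
  |> range(start: -1h)
  |> aggregateWindow(every: 1m, fn: p95)
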
  33. Median Absolute Deviation with Flux
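
The slide itself is a visual; as a rough hand-rolled sketch of the idea (the median of the absolute deviations from the median, on a hypothetical my-bucket/temp series rather than the talk's actual code):

import "math"

data = from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "my_measurement" and r._field == "temp")

// Extract the series median as a scalar record
med = data
  |> median()
  |> findRecord(fn: (key) => true, idx: 0)

// MAD: the median of the absolute deviations from that median
data
  |> map(fn: (r) => ({r with _value: math.abs(x: r._value - med._value)}))
  |> median()
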
  34. Hurdle 7: Performance Gains. Being unaware of awesome performance optimizations to make your queries faster in InfluxDB Cloud 2.0
  35. Performance Gains
      - Memory optimizations
      - More pushdowns
        - What’s a pushdown? “Pushing down” the work of transforming the data so that it is done by the storage side.
        - How does it help optimize my query?
  36. Query: You have some data stored in InfluxDB Cloud 2.0. You write a query for that data (e.g. from |> range |> filter |> group |> max).
      Plan: The Flux planner will check if your query matches any existing pushdown patterns.
      Rewrite: If it matches a pushdown pattern, then the planner will write the plan for how to perform your query.
      Execute: The Flux executor will execute your query by invoking a hidden operation (e.g. ReadGroupMax). This will initiate a storage read API call.
      Result: Storage will transform the data and stream the result back to Flux via gRPC. Flux converts the gRPC data back to Flux tables.
  37. Performance Gains
      Assuming your query starts with from() |> range() |> filter() (these get pushed down), we’re supporting the following in InfluxDB Cloud 2.0:
      - Bare aggregates, e.g. |> first()
      - Windowed aggregates, e.g. |> window() |> first() or |> aggregateWindow(fn: first)
      - Grouped aggregates, e.g. |> group() |> first() or |> group() |> window() |> first()
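
For instance (a hypothetical cpu series, not from the slides), a query shaped like the grouped-aggregate pattern above can be pushed down to storage; inserting a non-pushdown transformation such as map() before the aggregate would force Flux to do that work itself:

// Matches the grouped-aggregate pushdown pattern: from |> range |> filter |> group |> max
from(bucket: "my-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_user")
  |> group(columns: ["host"])
  |> max()
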
  38. Hurdle 8: Schema Mutations. Not using schema mutators at the right time
  39. Schema Mutations
      keep(), drop(), rename(), duplicate()

      from(bucket: "itemBucket")
        |> range(start: -1mo)
        |> filter(fn: (r) => (r._measurement == "rate"))
        |> filter(fn: (r) => (r._field == "item1"))
        |> mean()
        ...
        |> drop(columns: ["col1", "col2"])
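
A small complementary sketch (hypothetical column names) for the other mutators named above, applied after the aggregation:

from(bucket: "itemBucket")
  |> range(start: -1mo)
  |> filter(fn: (r) => r._measurement == "rate" and r._field == "item1")
  |> mean()
  // Keep only the columns the dashboard needs
  |> keep(columns: ["_start", "_stop", "_value", "itemGroupName"])
  // Give _value a more descriptive name
  |> rename(columns: {_value: "mean_rate"})
  // Copy a column under a second name
  |> duplicate(column: "itemGroupName", as: "group")
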
  40. Hurdle 9: Tasks. When to write your own tasks
  41. How to write a Task
  42. How to write a Task

      option task = {name: "rateTask", every: 1h}

      rate = from(bucket: "items")
        |> range(start: -task.every)
        |> filter(fn: (r) => (r._measurement == "totalItems"))
        |> filter(fn: (r) => (r._field == "items"))
        |> group(columns: ["itemGroupName"])
        |> aggregateWindow(fn: sum, every: task.every)
        |> map(fn: (r) => ({
          _time: r._time,
          _stop: r._stop,
          _start: r._start,
          _measurement: "rateTask",
          _field: "fruits",
          _value: r._value,
          itemGroupName: r.itemGroupName
        }))

      rate
        |> to(bucket: "newItems")
  43. Output Task Data to a Dashboard

      from(bucket: "newItems")
        |> range(start: -1mo)
        |> filter(fn: (r) => (r._measurement == "rateTask"))
        |> filter(fn: (r) => (r._field == "fruits"))

      You can even write your own alerts. Examples of how to write a check and notification script are in flux/docs/Alerts.md.
  44. Hurdle 10: Downsampling Tasks. Not knowing when/why to downsample data
  45. Why Downsample?

      option task = {name: "foo", every: 1m}

      from(bucket: "items")
        |> range(start: -task.every)
        |> filter(fn: (r) => r._measurement == "m" and r._field == "f")
        |> group(columns: ["_measurement", "_field", "_start", "_stop"])
        |> aggregateWindow(fn: last, every: task.every)
        |> to(bucket: "downsample_bucket")
  46. Why Downsample?

      option task = {name: "Downsampling CPU", every: 1m}

      data = from(bucket: "my-bucket")
        |> range(start: -task.every)
        |> filter(fn: (r) => r._measurement == "my_measurement")

      data
        |> mean()
        |> set(key: "agg_type", value: "mean_temp")
        |> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])

      data
        |> count()
        |> set(key: "agg_type", value: "count_temp")
        |> to(bucket: "downsampled", org: "my-org", tagColumns: ["agg_type"])
  47. Thanks!
