Influx/Days 2017 San Francisco | Paul Dix

IFQL: INFLUX FUNCTIONAL QUERY LANGUAGE – BRINGING DATA SCIENCE AND ANALYTICS INTO THE DATABASE
An introduction to the new query language that we’re building into InfluxDB and the InfluxData platform. Its design is functional and heavily inspired by projects like Pandas in Python and the Tidyverse in R. In addition to providing more complex query functionality, the language will facilitate more analytics and data science workloads within the database. Clustering on time series matrices, similarity metrics and k-nearest neighbors, forecasting models, and other data science tasks may become simple operators within the query language. This talk will introduce the data model and some of the functions, and walk through a working prototype implementation that showcases functionality unavailable in the existing query language.

  1. 1. IFQL - Influx Query Language Paul Dix @pauldix paul@influxdb.com
  2. 2. Photo by Emily Morter on Unsplash
  3. 3. A new query language
  4. 4. IFQL
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  5. 5. IFQL
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
        InfluxQL (1.x)
        SELECT max(usage_system)
        FROM "foo".."cpu"
        WHERE "service" = 'app-server' AND time > now() - 12h
        GROUP BY time(10m)
  6. 6. IFQL: Functions
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  7. 7. IFQL: Function chaining, like jQuery or D3
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  8. 8. IFQL: Named arguments
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  9. 9. IFQL: Expression language for predicates
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  10. 10. IFQL: Comments
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  11. 11. IFQL: Variables
        // get the min, max, mean cpu utilization for all
        // application servers
        var s = select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
        s.max()
        s.min()
        s.mean()
  12. 12. A new execution engine
  13. 13. Processing
  14. 14. Storage
  15. 15. Shared nothing
  16. 16. Biggest API Advance Since 0.9
  17. 17. It works with InfluxDB 1.4 released today!
  18. 18. Photo by Ken Treloar on Unsplash
  19. 19. Features, Functionality, Flexibility
  20. 20. Need to dramatically increase feature velocity
  21. 21. Unify InfluxDB & Kapacitor
  22. 22. InfluxDB’s InfluxQL
        SELECT max(usage_system)
        FROM "cpu"
        WHERE "service" = 'app-server' AND time > now() - 12h
        GROUP BY time(10m)
  23. 23. Kapacitor’s TICKscript
        stream
          |from()
            .database('telegraf')
            .measurement('cpu')
            .groupBy(*)
          |window()
            .period(5m)
            .every(5m)
            .align()
          |mean('usage_idle')
            .as('usage_idle')
          |influxDBOut()
            .database('telegraf')
            .retentionPolicy('autogen')
            .measurement('mean_cpu_idle')
            .precision('s')
  24. 24. IFQL: learn one language, use both
  25. 25. Queries: interactive, background batch, streaming. Far easier to develop & debug Kapacitor tasks!
  26. 26. One engine, regardless of context
  27. 27. Project Goals Photo by Glen Carrie on Unsplash
  28. 28. Familiar
  29. 29. JavaScript, it u?
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  30. 30. Easy to read & understand
  31. 31. IFQL: Named arguments
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  32. 32. IFQL: Expression language for predicates
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  33. 33. Flexible & Extensible
  34. 34. IFQL: Introduction of new arguments in future versions won’t break previous users
        // get the cpu max utilization for all
        // application servers
        select(db:"foo")
          .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system" AND "service"=="app-server"})
          .range(start:-12h)
          .window(every:10m)
          .max()
  35. 35. Make it easy to add functions like plugins in Telegraf
  36. 36.
        package functions

        import (
            "fmt"

            "github.com/influxdata/ifql/ifql"
            "github.com/influxdata/ifql/query"
            "github.com/influxdata/ifql/query/execute"
            "github.com/influxdata/ifql/query/plan"
        )

        const CountKind = "count"

        type CountOpSpec struct {
        }

        // init registers count with the ifql, query, plan, and execute packages.
        func init() {
            ifql.RegisterFunction(CountKind, createCountOpSpec)
            query.RegisterOpSpec(CountKind, newCountOp)
            plan.RegisterProcedureSpec(CountKind, newCountProcedure, CountKind)
            execute.RegisterTransformation(CountKind, createCountTransformation)
        }

        func createCountOpSpec(args map[string]ifql.Value, ctx ifql.Context) (query.OperationSpec, error) {
            if len(args) != 0 {
                return nil, fmt.Errorf(`count function requires no arguments`)
            }
            return new(CountOpSpec), nil
        }

        func newCountOp() query.OperationSpec {
            return new(CountOpSpec)
        }

        func (s *CountOpSpec) Kind() query.OperationKind {
            return CountKind
        }
  37. 37.
        type CountProcedureSpec struct {
        }

        func newCountProcedure(query.OperationSpec) (plan.ProcedureSpec, error) {
            return new(CountProcedureSpec), nil
        }

        func (s *CountProcedureSpec) Kind() plan.ProcedureKind {
            return CountKind
        }

        func (s *CountProcedureSpec) Copy() plan.ProcedureSpec {
            return new(CountProcedureSpec)
        }

        func (s *CountProcedureSpec) PushDownRule() plan.PushDownRule {
            return plan.PushDownRule{
                Root:    SelectKind,
                Through: nil,
            }
        }

        func (s *CountProcedureSpec) PushDown(root *plan.Procedure, dup func() *plan.Procedure) {
            selectSpec := root.Spec.(*SelectProcedureSpec)
            if selectSpec.AggregateSet {
                root = dup()
                selectSpec = root.Spec.(*SelectProcedureSpec)
                selectSpec.AggregateSet = false
                selectSpec.AggregateType = ""
                return
            }
            selectSpec.AggregateSet = true
            selectSpec.AggregateType = CountKind
        }
  38. 38.
        type CountAgg struct {
            count int64
        }

        func createCountTransformation(id execute.DatasetID, mode execute.AccumulationMode, spec plan.ProcedureSpec, ctx execute.Context) (execute.Transformation, execute.Dataset, error) {
            t, d := execute.NewAggregateTransformationAndDataset(id, mode, ctx.Bounds(), new(CountAgg))
            return t, d, nil
        }

        // Each Do* method adds the batch length to the running count,
        // whatever the value type.
        func (a *CountAgg) DoBool(vs []bool) {
            a.count += int64(len(vs))
        }

        func (a *CountAgg) DoUInt(vs []uint64) {
            a.count += int64(len(vs))
        }

        func (a *CountAgg) DoInt(vs []int64) {
            a.count += int64(len(vs))
        }

        func (a *CountAgg) DoFloat(vs []float64) {
            a.count += int64(len(vs))
        }

        func (a *CountAgg) DoString(vs []string) {
            a.count += int64(len(vs))
        }

        func (a *CountAgg) Type() execute.DataType {
            return execute.TInt
        }

        func (a *CountAgg) ValueInt() int64 {
            return a.count
        }
  39. 39. Decouple storage from compute
  40. 40. Iterate & deploy more frequently
  41. 41. Scale independently
  42. 42. Workload Isolation
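
The IFQL query repeated on the slides (select, where, range, window, max) can be read as a pipeline: pick the series, bound the time range, bucket the points into fixed windows, then aggregate each window. The Go sketch below illustrates only the range, window, and max steps on in-memory data. `Point`, `WindowMax`, and `demo` are hypothetical names invented for this illustration and are not part of IFQL or InfluxDB; aligning windows with Go's `time.Truncate` is also an assumption, not the engine's documented behavior.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Point is a hypothetical (timestamp, value) sample; it is not an IFQL type.
type Point struct {
	Time  time.Time
	Value float64
}

// WindowMax keeps the points in [start, stop), groups them into windows of
// length every (aligned with time.Truncate), and returns one point per
// non-empty window holding that window's maximum, ordered by window start.
func WindowMax(points []Point, start, stop time.Time, every time.Duration) []Point {
	maxByWindow := make(map[int64]float64)
	for _, p := range points {
		// range(start:...): drop points outside the half-open interval.
		if p.Time.Before(start) || !p.Time.Before(stop) {
			continue
		}
		// window(every:...): key each point by the start of its window.
		w := p.Time.Truncate(every).UnixNano()
		// max(): keep the largest value seen in this window.
		if cur, ok := maxByWindow[w]; !ok || p.Value > cur {
			maxByWindow[w] = p.Value
		}
	}
	out := make([]Point, 0, len(maxByWindow))
	for w, v := range maxByWindow {
		out = append(out, Point{Time: time.Unix(0, w).UTC(), Value: v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Time.Before(out[j].Time) })
	return out
}

// demo runs WindowMax over made-up sample data (30 minutes, 10-minute windows).
func demo() []Point {
	base := time.Date(2017, 11, 14, 12, 0, 0, 0, time.UTC)
	points := []Point{
		{Time: base.Add(-time.Hour), Value: 99}, // before the range, dropped
		{Time: base.Add(1 * time.Minute), Value: 10},
		{Time: base.Add(9 * time.Minute), Value: 30},
		{Time: base.Add(11 * time.Minute), Value: 20},
		{Time: base.Add(25 * time.Minute), Value: 5},
	}
	return WindowMax(points, base, base.Add(30*time.Minute), 10*time.Minute)
}

func main() {
	for _, p := range demo() {
		fmt.Println(p.Time.Format("15:04"), p.Value)
	}
}
```

With the sample data in `demo`, the windows starting at 12:00, 12:10, and 12:20 yield maxima of 30, 20, and 5. Swapping the comparison inside the loop is all it takes to turn this into a windowed minimum, which mirrors how the slides apply `s.max()`, `s.min()`, and `s.mean()` to the same windowed selection.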

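
Slides 35 to 38 describe adding functions as plugins that register themselves in `init()`. A stripped-down version of that registration pattern might look like the following Go sketch. It is not the `ifql` package API: `Aggregate`, `Register`, and `Run` are hypothetical stand-ins invented here, and only the float path of `CountAgg` is kept to stay short.

```go
package main

import "fmt"

// Aggregate is a hypothetical, simplified stand-in for the per-type
// Do* methods shown on the slides; it is not the ifql interface.
type Aggregate interface {
	DoFloat(vs []float64)
	ValueInt() int64
}

// registry maps a function name to a constructor for its aggregate.
var registry = map[string]func() Aggregate{}

// Register makes an aggregate available under kind.
// Registering the same kind twice panics, so name clashes surface at startup.
func Register(kind string, newAgg func() Aggregate) {
	if _, dup := registry[kind]; dup {
		panic("aggregate already registered: " + kind)
	}
	registry[kind] = newAgg
}

const CountKind = "count"

// CountAgg counts values by adding each batch's length to a running total.
type CountAgg struct {
	count int64
}

func (a *CountAgg) DoFloat(vs []float64) { a.count += int64(len(vs)) }
func (a *CountAgg) ValueInt() int64      { return a.count }

// The plugin registers itself when the package is loaded.
func init() {
	Register(CountKind, func() Aggregate { return new(CountAgg) })
}

// Run looks up an aggregate by name, feeds it every batch, and returns the result.
func Run(kind string, batches [][]float64) (int64, error) {
	newAgg, ok := registry[kind]
	if !ok {
		return 0, fmt.Errorf("unknown function %q", kind)
	}
	agg := newAgg()
	for _, b := range batches {
		agg.DoFloat(b)
	}
	return agg.ValueInt(), nil
}

func main() {
	n, err := Run(CountKind, [][]float64{{1, 2, 3}, {4, 5}})
	fmt.Println(n, err) // prints "5 <nil>"
}
```

The point of the pattern is that the caller (`Run` here) never names a concrete aggregate: adding a new function means adding one file with its own `init()`, which is the property the slides compare to Telegraf plugins.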