2021-10-13 IOx Query Processing

InfluxData Tech Talk: Query Processing in InfluxDB IOx

https://www.youtube.com/watch?v=9DYkWuM8xco


  1. Query Processing in InfluxDB IOx: SQL, Storage gRPC, Reorganization. 2021-10-13. CC BY-SA. Andrew Lamb
  2. © 2021 InfluxData. All rights reserved. Today: IOx Team at InfluxData. Past life 1: Query Optimizer @ Vertica, also on Oracle DB server. Past life 2: Chief Architect + VP Engineering roles at some ML startups.
  3. Talk Outline
     ‒ Data Model and Storage Review
     ‒ Query Processing Overview
     ‒ Frontends
     ‒ Execution Plans
  4. Data Layout and Storage
  5. Data Organization: Partitions
     Partitions define how data is kept in separate Chunks in storage. Each chunk logically stores part of a partition for one or more Relational Tables.
     Partitioning is used for:
     1. Data Lifecycle Management (drop whole partitions by deleting files)
     2. Query Performance (partition pruning)
     Each row is mapped to a single Partition based on Partition Rules.
     [Diagram: cpu table, disk table, requests table; partitions Jan 1, Jan 2, Jan 3, Jan 1, Jan 3, Jan 1, Jan 2]
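
The partitioning idea on this slide can be sketched in a few lines. This is only an illustration, not IOx code: the per-day rule, the `partition_key` and `prune_partitions` helpers, and the row layout (dicts with a `time` in Unix seconds) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(row):
    """Hypothetical partition rule: one partition per UTC day of the row's time."""
    ts = datetime.fromtimestamp(row["time"], tz=timezone.utc)
    return ts.strftime("%Y-%m-%d")


def partition_rows(table, rows):
    """Map each row to exactly one (table, partition key) bucket."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(table, partition_key(row))].append(row)
    return dict(partitions)


def prune_partitions(partitions, wanted_days):
    """Partition pruning: skip whole partitions whose key cannot match."""
    return {k: v for k, v in partitions.items() if k[1] in wanted_days}


rows = [
    {"time": 1609459200, "usage": 0.5},  # 2021-01-01T00:00:00Z
    {"time": 1609545600, "usage": 0.7},  # 2021-01-02T00:00:00Z
    {"time": 1609549200, "usage": 0.9},  # 2021-01-02T01:00:00Z
]
cpu_partitions = partition_rows("cpu", rows)
jan2 = prune_partitions(cpu_partitions, {"2021-01-02"})
```

A query restricted to Jan 2 only touches the rows in `jan2`, and dropping Jan 1 amounts to discarding one bucket.
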
  6. Data Organization: Chunks
     Within each partition within a table, data is divided into physical chunks, identified by a chunk id and a chunk order. Chunks with lower order have older (by insert time) data.
     There is at most one open chunk for each partition. All new data (including deletes + updates) is written into the open chunk.
     Once a chunk is closed, it becomes immutable: rows are never added or removed. The data is compacted / persisted over time into new chunks, and the old chunks are dropped.
     [Diagram: Chunk0 (closed), Chunk1 (closed), Chunk2 (closed), Chunk3 (closed), Chunk4 (open). New data is written into the open chunk. Closed chunks are ordered by age of data and never modified.]
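
A toy model of the open/closed chunk rules may help. The `Chunk` and `TablePartition` classes below are hypothetical names for this sketch and do not mirror IOx's actual types; the chunk order is simply assumed to equal the chunk id.

```python
class Chunk:
    def __init__(self, chunk_id, order):
        self.chunk_id = chunk_id
        self.order = order      # lower order = older data (by insert time)
        self.rows = []
        self.closed = False

    def write(self, row):
        if self.closed:
            raise RuntimeError("closed chunks are immutable")
        self.rows.append(row)


class TablePartition:
    """One partition of one table: closed chunks plus at most one open chunk."""

    def __init__(self):
        self.chunks = []
        self.next_id = 0

    def open_chunk(self):
        # Reuse the open chunk if there is one, otherwise start a new one.
        if self.chunks and not self.chunks[-1].closed:
            return self.chunks[-1]
        chunk = Chunk(self.next_id, order=self.next_id)
        self.next_id += 1
        self.chunks.append(chunk)
        return chunk

    def write(self, row):
        self.open_chunk().write(row)

    def close_open_chunk(self):
        if self.chunks:
            self.chunks[-1].closed = True
```
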
  7. Data Model
     Line protocol input:
       weather,location=us-east temperature=82,humidity=67 1465839830100400200
       weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
       weather,location=us-west temperature=70,humidity=54 1465839830100400200
       weather,location=us-east temperature=83,humidity=69 1465839830200400200
       weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
       weather,location=us-west temperature=72,humidity=56 1465839830200400200
       weather,location=us-east temperature=84,humidity=67 1465839830300400200
       weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
       weather,location=us-west temperature=71,humidity=57 1465839830400400200
     Resulting columns:
       location      temperature  humidity  timestamp
       "us-east"     82           67        2016-06-13T17:43:50.1004002Z
       "us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
       "us-west"     70           54        2016-06-13T17:43:50.1004002Z
       "us-east"     83           69        2016-06-13T17:43:50.2004002Z
       "us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
       "us-west"     72           56        2016-06-13T17:43:50.2004002Z
       "us-east"     84           67        2016-06-13T17:43:50.3004002Z
       "us-midwest"  90           82        2016-06-13T17:43:50.3004002Z
       "us-west"     71           57        2016-06-13T17:43:50.3004002Z
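
The mapping from line protocol to columns can be sketched as follows. This is a deliberately simplified parser, assuming no escaping or quoting, unsuffixed numeric fields only, and that every line of a measurement carries the same tag and field keys; `parse_line` and `to_columns` are names invented for the example.

```python
def parse_line(line):
    """Parse one simplified line protocol line: 'measurement,tags fields timestamp'."""
    head, fields, timestamp = line.split(" ")
    measurement, *tags = head.split(",")
    row = {}
    for pair in tags:
        key, value = pair.split("=")
        row[key] = value                 # tags are strings
    for pair in fields.split(","):
        key, value = pair.split("=")
        row[key] = float(value)          # unsuffixed numeric fields are floats
    row["time"] = int(timestamp)         # nanoseconds since the epoch
    return measurement, row


def to_columns(lines):
    """Group rows by measurement (table) and store each table column by column."""
    tables = {}
    for line in lines:
        measurement, row = parse_line(line)
        columns = tables.setdefault(measurement, {})
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return tables


tables = to_columns([
    "weather,location=us-east temperature=82,humidity=67 1465839830100400200",
    "weather,location=us-midwest temperature=82,humidity=65 1465839830100400200",
])
```
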
  8. Query Processing
  9. Design: One Execution Engine
     1. Query and Data Reorganization*: two sides of the same coin, “moving data around”
     2. Reuse as much existing execution machinery as possible (e.g. streaming, segregated worker pool, etc.)
     3. Amplify investment by leveraging Open Source (and contribute back)
     ⇒ All queries run through a unified planning system based on DataFusion + Arrow. Plans execute as Rust async streams (`RecordBatchStream`) using the tokio executor.
     * Putting data in physical structures (Chunks)
  10. Query Processing
      [Diagram: three stages]
      ‒ Query input (client / language specific frontends): IOx Storage gRPC Frontend (read_group(..)), SQL Frontend from DataFusion (SELECT … FROM ...), and Reorg Frontend (compact_plan(..)). Each frontend produces a DataFusion LogicalPlan.
      ‒ Shared planning and execution phases, based on DataFusion: Optimization (storage pruning, pushdown, etc.), Physical Planning (DataFusion Physical Plan), Execution (Arrow Record Batches).
      ‒ Client specific output formats: gRPC output (SeriesFrame ...), Arrow Flight IPC (FlightData), and ReadBuffer / ParquetWriter (write to ReadBuffer or Parquet files).
  11. IOx Query Optimization Features
      ‒ “Classic”: Projection/Filter/Limit pushdown, partial eval, ...
      ‒ Predicate Evaluation During Scan
      ‒ Chunk Pruning on predicates
      ‒ Parquet Row Group Pruning
      ‒ Grouping/Aggregate Pushdown
      [Annotations on the slide: “Filters pushed down on some metadata queries, scans in ReadBuffer”; “DataFusion”; “IOx”; “N/A”; “ReadBuffer has support, but no query engine support”]
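
Chunk pruning on predicates can be illustrated with per-chunk min/max statistics. The slide does not say how IOx decides which chunks to skip, so the statistics layout and the `may_contain` and `prune_chunks` helpers below are assumptions for the sake of a small example.

```python
def may_contain(stats, column, lo, hi):
    """Return False only when the chunk's [min, max] for `column` cannot overlap [lo, hi)."""
    if column not in stats:
        return True  # no statistics: the chunk cannot be ruled out
    col_min, col_max = stats[column]
    return col_max >= lo and col_min < hi


def prune_chunks(chunks, column, lo, hi):
    """Keep the ids of chunks that might hold rows with lo <= column < hi."""
    return [c["id"] for c in chunks if may_contain(c["stats"], column, lo, hi)]


chunks = [
    {"id": 0, "stats": {"time": (0, 999)}},
    {"id": 1, "stats": {"time": (500, 1500)}},
    {"id": 2, "stats": {"time": (2000, 3000)}},
]
survivors = prune_chunks(chunks, "time", 1000, 2000)
```

Only chunk 1 overlaps the range, so the other two never need to be scanned.
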
  12. Front Ends (Logical Plans)
  13. SQL Frontend
      [Diagram: Arrow Flight client, IOx (port 8082), Object Store]
      1. Flight request sent
      2. IOx answers queries by combining data from parquet files + in-memory cache
      3. Response streamed back via Flight RPC
      See “DataFusion: An Embeddable Query Engine Written in Rust” for more details.
  14. SQL: Logical Plan
      SELECT cpu, usage_user, time FROM cpu WHERE cpu = 'cpu1';
      Plan, from results down to the scan:
        Projection: #cpu, #usage_user, #time
          Filter: #cpu Eq Utf8(“cpu1”)
            TableScan: cpu projection=Some([0,1,2,12])
      TableScan is accomplished via IOxReadFilterNode. Chunks are presented to DataFusion as “partitions” (different from IOx partitions). The IOx query engine handles resolving upserts and deletes.
  15. Reorg / Life Cycle Planner
      Compact: the Compact Plan reads the chunks (Chunk1, Chunk2, Chunk3), resolves upserts, and applies deletes, producing one RecordBatch stream. The Compact lifecycle operation writes the stream to a new Read Buffer (RUB) or Parquet chunk.
      Split (split_time): the Split Plan reads the chunks (Chunk1, Chunk2, Chunk3), resolves upserts, and applies deletes, producing two RecordBatch streams: one with time <= split_time and one with time > split_time. The Persist lifecycle operation writes the streams to a Parquet chunk and the RUB respectively.
  16. Reorg / Life Cycle Planner
      Compact Plan:
        TableScan: cpu Chunks = ...
      Split Plan:
        StreamSplit split_time=1004 (DataFusion extension node; 2 DataFusion partitions)
          TableScan: cpu Chunks = ...
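
The compact and split operations from the two Reorg slides can be sketched on plain lists of rows. This is an assumption-laden simplification: an upsert replaces the whole row (the slides do not specify field-level behaviour), deletes are left out, and `resolve_upserts`, `compact`, and `split` are names chosen for the example.

```python
def resolve_upserts(chunks, key_columns):
    """Keep the last written row per primary key; `chunks` is ordered oldest first."""
    latest = {}
    for chunk in chunks:                 # lower chunk order = older data
        for row in chunk:
            key = tuple(row[c] for c in key_columns)
            latest[key] = row            # later writes overwrite earlier ones
    return sorted(latest.values(), key=lambda r: tuple(r[c] for c in key_columns))


def compact(chunks, key_columns):
    """Compact: many chunks in, one deduplicated stream of rows out."""
    return resolve_upserts(chunks, key_columns)


def split(chunks, key_columns, split_time):
    """Split: like compact, but emit two streams divided at split_time."""
    rows = resolve_upserts(chunks, key_columns)
    older = [r for r in rows if r["time"] <= split_time]
    newer = [r for r in rows if r["time"] > split_time]
    return older, newer


chunk0 = [{"host": "a", "time": 1000, "v": 1}, {"host": "a", "time": 1010, "v": 2}]
chunk1 = [{"host": "a", "time": 1000, "v": 9}]   # upsert of (a, 1000)
older, newer = split([chunk0, chunk1], ["host", "time"], split_time=1004)
```
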
  17. Storage gRPC Frontend: Flux and InfluxQL
      [Diagram: Flux Runtime and InfluxQL, IOx (port 8082), Object Store]
      1. Flux/InfluxQL send requests via gRPC
      2. IOx answers queries by combining data from parquet files + in-memory cache
      3. Response streamed back via gRPC
  18. Storage gRPC Operations
      Data queries (return time series):
      ‒ ReadFilter: Scan data out of IOx matching predicates
      ‒ ReadGroup: Groups/aggregates in IOx, returning grouped data
      ‒ ReadWindowAggregate: Similar to ReadGroup, but windowed by time
      Metadata queries (return strings / times):
      ‒ TagKeys: tag keys (column names) with data matching predicates
      ‒ TagValues: distinct tag values (column values) with data matching predicates for a set of tag keys (columns)
      ‒ MeasurementNames: table names that satisfy some provided predicate
      ‒ MeasurementTagKeys: Same as TagKeys but limited to a single table
      ‒ MeasurementTagValues: Same as TagValues but limited to a single table
      ‒ MeasurementFields: field names (column names) with rows matching predicate
      (thanks @e-dard)
  19. Metadata Queries
      Metadata queries are incredibly common and often done on more recent data:
      ‒ measurement_names(range, predicate)
      ‒ tag_keys(range, predicate)
      ‒ tag_values(tag_key, range, predicate)
      [Flowchart: Metadata Query, then “Fast path for predicates?”]
      ‒ YES: Answer with optimized* implementation in chunk.
      ‒ NO: Run general purpose (DataFusion) plan.
      * The Read Buffer (RUB) in particular is heavily optimized for metadata queries and rarely needs general purpose plans.
  20. tag_keys (general)
      tag_keys pred: cpu =~ ‘.*total’ ts_range: [1000, 2000]
      Plan, from results down to the scan:
        SchemaPivot (DataFusion extension node)
          Filter: #cpu =~ ‘.*total’ AND 1000 < #time AND #time < 2000
            TableScan: cpu projection=Some([0,1,2,12])
      SchemaPivot produces a single output String column with the name of any input column that had a non-null value.
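
What SchemaPivot computes can be shown on rows held as dicts. This is a sketch of the described behaviour, not the DataFusion extension node itself; `schema_pivot` and `tag_keys` are illustrative names, the time range is assumed to be half-open, and the regex is matched with an unanchored search.

```python
import re


def schema_pivot(rows, columns):
    """Return the names of columns that have at least one non-null value in `rows`."""
    return [
        name for name in columns
        if any(row.get(name) is not None for row in rows)
    ]


def tag_keys(rows, tag_columns, start, stop, predicate=lambda row: True):
    """Filter by time range and predicate, then pivot the surviving rows."""
    matching = [r for r in rows if start <= r["time"] < stop and predicate(r)]
    return schema_pivot(matching, tag_columns)


rows = [
    {"cpu": "cpu-total", "host": "a", "region": None, "time": 1500},
    {"cpu": "cpu1", "host": None, "region": "west", "time": 1500},
    {"cpu": "cpu-total", "host": None, "region": "east", "time": 3000},
]
keys = tag_keys(
    rows, ["cpu", "host", "region"], 1000, 2000,
    predicate=lambda r: re.search(".*total", r["cpu"]) is not None,
)
```

Only the first row survives the filter, so `region` is not reported as a tag key even though it has values elsewhere in the table.
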
  21. Handling multiple tables
      tag_keys pred: host = ‘foo’ ts_range: [1000, 2000]
      SeriesSetPlan for cpu:
        SchemaPivot
          Filter: 1000 < #time AND #time < 2000 AND host = ‘foo’
            TableScan: cpu
      SeriesSetPlan for mem:
        SchemaPivot
          Filter: 1000 < #time AND #time < 2000 AND host = ‘foo’
            TableScan: mem
      SeriesSetPlan for host: {} (no data between 1000 and 2000)
      Each plan is a SeriesSetPlan(LogicalPlan). Results from multiple plans and sets are combined at a higher level.
  22. read_filter: Logical Plan
      read_filter pred: tag(_m)=”system” AND tag(_f)=”usage_user” AND tag(cpu)=”cpu1” ts_range: [1000, 2000)
      IOx code creates DataFusion LogicalPlan nodes:
      ‒ TableScan: system projection=Some([0,1,2,12]). TableScan is accomplished via the same IOxReadFilterNode.
      ‒ Filter: 1000 < #time AND #time < 2000 AND #cpu Eq Utf8(“cpu1”). Predicates are applied using a Filter.
      ‒ Projection: #cpu, #host, #usage_user, #time
      ‒ Sort: (#cpu, #host, #time). Data is sorted in tag key order, as expected by Flux / InfluxQL.
  23. read_filter: Physical Plan
      Plan nodes:
      ‒ Sort: (#cpu, #host, #time)
      ‒ ProjectionExec: #cpu, #host, #usage_user, #time
      ‒ CoalescePartitionsExec: added by DataFusion physical planning due to requirements from Sort. CoalescePartitionsExec does not preserve sort order.
      ‒ FilterExec: 1000 < #time AND #time < 2000 AND #cpu Eq Utf8(“cpu1”), repeated for each chunk
      ‒ IOxReadFilterNode, repeated for each chunk: PartitionChunk (mutable_buffer), PartitionChunk (read_buffer), PartitionChunk (read_buffer). Calls PartitionChunk::read_filter during execution to get results.
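
The shape of the read_filter plan (filter each chunk, coalesce, sort in tag key order, project) can be mimicked on lists of dicts. This is a minimal sketch under assumptions: the function name and signature are invented, the time range is treated as half-open, and rows are plain dicts rather than Arrow record batches.

```python
from itertools import chain


def read_filter(chunks, start, stop, predicate, tag_columns, field):
    # One filter per chunk, as in the per-chunk FilterExec nodes.
    filtered = [
        [r for r in chunk if start <= r["time"] < stop and predicate(r)]
        for chunk in chunks
    ]
    # Concatenate the per-chunk results; like CoalescePartitionsExec,
    # this step makes no promise about ordering.
    merged = list(chain.from_iterable(filtered))
    # Sort in tag key order, then by time.
    merged.sort(key=lambda r: tuple(r[t] for t in tag_columns) + (r["time"],))
    # Project only the tags, the requested field, and time.
    wanted = list(tag_columns) + [field, "time"]
    return [{c: r[c] for c in wanted} for r in merged]


chunks = [
    [{"cpu": "cpu1", "host": "b", "usage_user": 1.0, "usage_system": 5.0, "time": 1500}],
    [
        {"cpu": "cpu1", "host": "a", "usage_user": 2.0, "usage_system": 5.0, "time": 1200},
        {"cpu": "cpu0", "host": "a", "usage_user": 3.0, "usage_system": 5.0, "time": 1200},
        {"cpu": "cpu1", "host": "a", "usage_user": 4.0, "usage_system": 5.0, "time": 2000},
    ],
]
result = read_filter(
    chunks, 1000, 2000, lambda r: r["cpu"] == "cpu1", ["cpu", "host"], "usage_user"
)
```
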
  24. read_group
      read_group pred: tag(_m)=”cpu” agg: first group_keys: “env” ts_range: [1000, 2000)
      Assumes env and cpu are tags.
      Plan nodes:
      ‒ TableScan: cpu projection=Some([0,1,2,12])
      ‒ Filter: 1000 < #time AND #time < 2000
      ‒ GroupBy: gby: env, cpu agg: first.time(usage_user, time) as _time, first.value(usage_user, time) as _value
      ‒ Projection: env, cpu, _time, _value
      ‒ Sort: env, cpu, _time, _value
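
The `first` aggregate in this plan pairs the earliest timestamp in each group with the value recorded at that timestamp. A small sketch, assuming rows are dicts, the range is half-open, and null field values are skipped; `read_group_first` is a name made up for the example.

```python
def read_group_first(rows, start, stop, group_columns, field):
    """For each group, return the earliest time and the field value at that time."""
    first = {}
    for r in rows:
        if not (start <= r["time"] < stop) or r.get(field) is None:
            continue
        key = tuple(r[c] for c in group_columns)
        if key not in first or r["time"] < first[key]["_time"]:
            first[key] = {"_time": r["time"], "_value": r[field]}
    # Emit groups sorted by their group key columns.
    return [dict(zip(group_columns, key), **agg) for key, agg in sorted(first.items())]


rows = [
    {"env": "prod", "cpu": "cpu0", "usage_user": 5.0, "time": 1500},
    {"env": "prod", "cpu": "cpu0", "usage_user": 7.0, "time": 1100},
    {"env": "dev", "cpu": "cpu1", "usage_user": 1.0, "time": 1900},
    {"env": "dev", "cpu": "cpu1", "usage_user": 2.0, "time": 900},   # outside the range
]
grouped = read_group_first(rows, 1000, 2000, ["env", "cpu"], "usage_user")
```
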
  25. Execution Plans
  26. Table Scan: Reading data from a Chunk
      For a single chunk*:
      ‒ LogicalPlan: TableScan: cpu
      ‒ ExecutionPlan: IOxReadFilterNode chunk_id = 1
      ‒ SendableRecordBatchStream: SchemaAdapterStream over a {MUB,RUB,Parquet}Stream
      * Chunk that has no deletes or possible updates
  27. Schema Adapter Stream
      SchemaAdapterStream output_schema: {A, B, C}
      Input RecordBatch:
        A  C
        1  10
        2  20
        3  30
        4  40
      Output RecordBatch:
        A  B     C
        1  NULL  10
        2  NULL  20
        3  NULL  30
        4  NULL  40
      Missing columns are padded with nulls.
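
The padding behaviour on this slide is easy to reproduce with a generator. Batches are modelled here as dicts from column name to a list of values, which is an assumption for the sketch (the real stream works on Arrow record batches); `adapt_batches` is a hypothetical name.

```python
def adapt_batches(batches, output_schema):
    """Yield each batch reshaped to output_schema, padding missing columns with None."""
    for batch in batches:
        num_rows = len(next(iter(batch.values()))) if batch else 0
        yield {name: batch.get(name, [None] * num_rows) for name in output_schema}


input_batch = {"A": [1, 2, 3, 4], "C": [10, 20, 30, 40]}
output_batch = next(adapt_batches([input_batch], ["A", "B", "C"]))
```

Because the function is a generator, batches are adapted one at a time as the consumer pulls them, which loosely mirrors how a stream adapter wraps an inner stream.
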
  28. Read Time Resolution of Updates/Upserts
      Applies to chunks that potentially have updates (overlaps) to primary keys but different sort orders.
      LogicalPlan: TableScan: cpu
      ExecutionPlan nodes:
      ‒ DeduplicateExec: IOx extension that implements tag key deduplication
      ‒ SortPreservingMerge: classic N-way merge
      ‒ UnionExec: doesn’t combine any DF partitions
      ‒ SortExec (optional), sort_key: tags, one per chunk: may have to re-sort on primary key columns
      ‒ IOxReadFilterNode chunk_id = 1 and IOxReadFilterNode chunk_id = 7
      Simplified: real plans handle partial overlap scenarios. See source documentation for more details.
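
The sort, N-way merge, and deduplicate pipeline can be approximated with `heapq.merge` from the standard library. This sketch assumes the primary key is the tag columns plus time, that the newest chunk wins, and that an upsert replaces the whole row; none of these details are confirmed by the slide, and `merge_and_deduplicate` is an invented name.

```python
import heapq


def sort_key(row, tag_columns):
    return tuple(row[t] for t in tag_columns) + (row["time"],)


def merge_and_deduplicate(chunks, tag_columns):
    """`chunks` is a list of row lists, oldest chunk first."""
    tagged = []
    for order, chunk in enumerate(chunks):
        # Per-chunk sort on the primary key (the optional SortExec step).
        rows = sorted(chunk, key=lambda r: sort_key(r, tag_columns))
        tagged.append([(sort_key(r, tag_columns), order, r) for r in rows])
    # N-way merge of the sorted inputs; ties are broken by chunk order
    # so rows from newer chunks come later.
    merged = heapq.merge(*tagged, key=lambda t: (t[0], t[1]))
    out = []
    last_key = None
    for key, _order, row in merged:
        if out and key == last_key:
            out[-1] = row        # same primary key: the newer row replaces the older
        else:
            out.append(row)
        last_key = key
    return out


chunk1 = [{"host": "b", "time": 1, "v": 1}, {"host": "a", "time": 1, "v": 1}]
chunk7 = [{"host": "a", "time": 1, "v": 2}]
deduplicated = merge_and_deduplicate([chunk1, chunk7], ["host"])
```
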
  29. Read Time Resolution of Deletes
      Example: Delete where time < 2021-10-01
      ExecutionPlan nodes:
      ‒ DeduplicateExec
      ‒ SortPreservingMerge
      ‒ UnionExec
      ‒ SortExec (optional), sort_key: tags, one per chunk
      ‒ FilterExec time < 2021-10-01
      ‒ IOxReadFilterNode chunk_id = 1 and IOxReadFilterNode chunk_id = 7
      Deletes can vary across chunks. Any delete predicates are also applied in these scans (and thus pushed down to MUB, RUB, etc.) as normal.
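
Applying delete predicates while scanning can be sketched as a per-chunk filter. The data layout is an assumption: each chunk carries its own list of delete predicates (plain Python callables), and times are ISO date strings so that string comparison orders them correctly. `scan_chunk` is a hypothetical helper.

```python
def scan_chunk(rows, delete_predicates):
    """Yield the rows of one chunk that match none of its delete predicates."""
    for row in rows:
        if not any(pred(row) for pred in delete_predicates):
            yield row


chunk1 = [{"time": "2021-09-30", "v": 1}, {"time": "2021-10-02", "v": 2}]
chunk7 = [{"time": "2021-09-29", "v": 3}]

# Deletes can vary across chunks: here only chunk 1 has a delete recorded against it.
deletes = {
    1: [lambda r: r["time"] < "2021-10-01"],
    7: [],
}
visible1 = list(scan_chunk(chunk1, deletes[1]))
visible7 = list(scan_chunk(chunk7, deletes[7]))
```
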
  30. Thank You
