  1. Presto Overview, Shixiong Zhu
  2. Overview
     • Nodes register with the Discovery Server
     • The coordinator asks the Discovery Server for the active nodes
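The register / ask-active-nodes exchange can be pictured with a small in-memory registry. This is a sketch only. The class, its methods, and especially the rule that a node stays "active" only while its announcement is younger than a TTL are assumptions for illustration. The slide does not say how liveness is decided.

```python
import time
from typing import Dict, Optional, Tuple


class DiscoveryServer:
    """Hypothetical in-memory stand-in for the Discovery Server."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        # Assumption: an announcement counts as "active" only if it was
        # refreshed within ttl_seconds; the slide does not give this rule.
        self.ttl_seconds = ttl_seconds
        self._announcements: Dict[str, Tuple[str, float]] = {}

    def register(self, node_id: str, uri: str, now: Optional[float] = None) -> None:
        # A node calls this (repeatedly) to announce itself.
        now = time.monotonic() if now is None else now
        self._announcements[node_id] = (uri, now)

    def active_nodes(self, now: Optional[float] = None) -> Dict[str, str]:
        # The coordinator calls this to get node_id -> uri for live nodes.
        now = time.monotonic() if now is None else now
        return {
            node_id: uri
            for node_id, (uri, last_seen) in self._announcements.items()
            if now - last_seen <= self.ttl_seconds
        }
```

In the example below, a node that registered at time 0 and never refreshed drops out of `active_nodes(now=40)` when the TTL is 30.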
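  3. Coordinator
     • The CLI sends the SQL text to the coordinator's StatementResource
     • StatementResource, SQLQueryManager, SQLQueryExecution and QueryStarter (…) turn the SQL into a running query, whose state is tracked as QueryInfo
     • HttpRemoteTask sends tasks to the workers; the workers produce partial data, which the coordinator fetches (OutputReceiver)
     • The CLI receives the data as QueryResults; each QueryResults carries a NextUri, which the CLI requests to fetch the next batch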
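On the client side, this is a loop. The client reads the rows in each QueryResults, then requests its NextUri, and stops when a response has no NextUri. The sketch below assumes decoded JSON documents with `data` and `nextUri` fields. Instead of doing real HTTP, it takes a `get` function, so the transport stays out of the picture. The URIs in the demo are made up.

```python
from typing import Any, Callable, Dict, Iterator, Optional

QueryResults = Dict[str, Any]  # hypothetical: a decoded JSON QueryResults document


def fetch_all_rows(first: QueryResults,
                   get: Callable[[str], QueryResults]) -> Iterator[list]:
    """Yield every row by following nextUri until a response omits it."""
    results: Optional[QueryResults] = first
    while results is not None:
        # A response may carry no rows yet (e.g. the query is still running).
        for row in results.get("data", []):
            yield row
        next_uri = results.get("nextUri")
        results = get(next_uri) if next_uri else None


if __name__ == "__main__":
    # Fake responses standing in for HTTP GETs; the URIs are made up.
    responses = {
        "/v1/statement/q1/1": {"data": [[1, "a"]], "nextUri": "/v1/statement/q1/2"},
        "/v1/statement/q1/2": {"data": [[2, "b"]]},
    }
    first = {"nextUri": "/v1/statement/q1/1"}  # reply to the initial POST of the SQL
    print(list(fetch_all_rows(first, responses.__getitem__)))  # [[1, 'a'], [2, 'b']]
```

Because the function is a generator, rows can be consumed while later batches are still being produced. This matches the streaming of results to the client.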
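  4. SubPlan
     • Plan: OutputNode -> AggregationNode -> FilterNode -> TableScanNode
     • The Plan is split into two SubPlans:
       – OutputNode -> AggregationNode(FINAL) -> ExchangeNode
       – SinkNode -> AggregationNode(PARTIAL) -> FilterNode -> TableScanNode
     • The ExchangeNode reads what the SinkNode of the child SubPlan produces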
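The PARTIAL/FINAL split only works if the partial step emits a mergeable intermediate state. The sketch below shows this with AVG. The aggregate function, the filter `v > 1`, and the `(sum, count)` state are all hypothetical choices, because the slide names no function. Averaging the per-split averages would be wrong, which is why the state is kept as a sum and a count.

```python
from typing import Iterable, List, Tuple

State = Tuple[float, int]  # (sum, count): intermediate state for AVG


def partial_avg(values: Iterable[float]) -> State:
    # AggregationNode(PARTIAL): runs in the child SubPlan, right after the
    # TableScan/Filter, and sends a small state instead of raw rows.
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def final_avg(states: Iterable[State]) -> float:
    # AggregationNode(FINAL): merges the states arriving via the ExchangeNode.
    total, count = 0.0, 0
    for s, c in states:
        total += s
        count += c
    return total / count if count else float("nan")


if __name__ == "__main__":
    splits: List[List[float]] = [[1, 2, 3], [4, 5], [6]]
    # FilterNode (v > 1) and partial aggregation run per split.
    states = [partial_avg(v for v in split if v > 1) for split in splits]
    print(states)             # [(5.0, 2), (9.0, 2), (6.0, 1)]
    print(final_avg(states))  # 4.0
```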
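  5. SubPlan with a join (T: TableScanNode, A: AggregationNode, E: ExchangeNode)
     • Plan: an OutputNode and AggregationNodes (A) above a JoinNode over two TableScanNodes (T)
     • SubPlans: the OutputNode and A(FINAL) nodes read from ExchangeNodes (E); the child SubPlans contain the JoinNode over the two T, A(PARTIAL) nodes and SinkNodes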
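  6. Stage and Task
     • Each SubPlan is executed as a Stage, and a Stage runs as Tasks on several Workers
     • Child Stages send their results to the parent Stage, and the final Results go to the Coordinator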
  7. Worker
  8. LocalExecutionPlan
     • A worker translates each node of its SubPlan into an operator: Node1 -> Op1, Node2 -> Op2, Node3 -> Op3, …, Opn
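  9. LocalExecutionPlan for a join
     • Node1 -> Op1, Node2 -> Op2, and the JoinNode becomes a HashJoinOperator
     • The build side (Node3 -> Op3) gets its own LocalExecutionPlan ending in a HashBuilderOperator, which builds the SourceHash that the HashJoinOperator probes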
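The build/probe split can be sketched as follows. One operator fills a shared hash table, and the other probes it row by row. The class names follow the slide, but the methods, the tuple row format, and the inner-join semantics are assumptions. This is not Presto's actual operator interface.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Row = Tuple  # hypothetical: a row is a plain tuple of column values


class SourceHash:
    """Build-side hash table: join key -> rows with that key."""

    def __init__(self) -> None:
        self.table: DefaultDict[object, List[Row]] = defaultdict(list)


class HashBuilderOperator:
    """Consumes the build side (Node3 -> Op3 in the slide) into a SourceHash."""

    def __init__(self, source_hash: SourceHash, key_index: int) -> None:
        self.source_hash = source_hash
        self.key_index = key_index

    def add_input(self, page: Iterable[Row]) -> None:
        for row in page:
            self.source_hash.table[row[self.key_index]].append(row)


class HashJoinOperator:
    """Probes the finished SourceHash with each probe-side row (inner join)."""

    def __init__(self, source_hash: SourceHash, key_index: int) -> None:
        self.source_hash = source_hash
        self.key_index = key_index

    def process(self, page: Iterable[Row]) -> List[Row]:
        out: List[Row] = []
        for row in page:
            # .get() does not insert missing keys into the defaultdict.
            for match in self.source_hash.table.get(row[self.key_index], []):
                out.append(row + match)
        return out
```

For example, after building from `[(1, "x"), (2, "y"), (1, "z")]`, probing with `[(1, "a"), (3, "b")]` yields `[(1, 'a', 1, 'x'), (1, 'a', 1, 'z')]`. The probe side can only start once the build side is finished, which is why the build side is its own LocalExecutionPlan.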
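  10. Page (max page size: 1 MB, max rows: 16 * 1024)
     • A Page holds several Blocks, one per column; a Row is the values at the same position in every Block
     • A Block stores its data in a Slice, which wraps a byte array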
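The columnar layout and the two page limits can be modeled as below. A Python list stands in for a Slice-backed Block. `PageBuilder` is a hypothetical helper that closes a Page once either limit from the slide is reached, and the caller-supplied per-row byte estimate is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List

MAX_PAGE_BYTES = 1024 * 1024  # max page size from the slide (1 MB)
MAX_PAGE_ROWS = 16 * 1024     # max rows from the slide


@dataclass
class Block:
    # One column of values. Per the slide, the real Block keeps its data in a
    # Slice (a byte array); a Python list stands in for that here.
    values: List[Any]


@dataclass
class Page:
    blocks: List[Block]  # one Block per column

    def position_count(self) -> int:
        return len(self.blocks[0].values) if self.blocks else 0

    def row(self, position: int) -> tuple:
        # A row is the value at the same position in every Block.
        return tuple(block.values[position] for block in self.blocks)


class PageBuilder:
    """Hypothetical helper: accumulates rows column by column until a limit is hit."""

    def __init__(self, channels: int) -> None:
        self.columns: List[List[Any]] = [[] for _ in range(channels)]
        self.size_bytes = 0

    def append(self, row: tuple, row_bytes: int) -> None:
        # row_bytes is an estimate supplied by the caller (an assumption).
        for column, value in zip(self.columns, row):
            column.append(value)
        self.size_bytes += row_bytes

    def is_full(self) -> bool:
        return (len(self.columns[0]) >= MAX_PAGE_ROWS
                or self.size_bytes >= MAX_PAGE_BYTES)

    def build(self) -> Page:
        page = Page([Block(column) for column in self.columns])
        self.columns = [[] for _ in self.columns]
        self.size_bytes = 0
        return page
```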
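  11. Split scheduling in TaskExecutor (thread number = core number * 4)
     • Take a Split from the queue. Is its data ready?
       – No: register a callback; when the data of this Split is ready, the callback puts the Split back in the queue
       – Yes: fetch one Page and execute the Operators on it, one after another, until there is no next Operator
     • Is the Split done? If yes, it is finished
     • If not, is its time up? If yes, put the Split back in the queue; if no, check again whether data is ready and continue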
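The flowchart above describes cooperative time-slicing, and a single-threaded simulation can sketch it. There are a few simplifications, all assumptions. A fixed number of pages per quantum replaces the real clock-based "Time's up?". Data arrival for blocked splits is simulated. Readiness is checked only at the start of each quantum. A real executor would run this loop on `(os.cpu_count() or 1) * 4` threads, per the slide's thread count. The `Split` class here is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class Split:
    """Hypothetical split with `pages` Pages of work; `ready` says whether input is available."""

    def __init__(self, name: str, pages: int, ready: bool = True) -> None:
        self.name = name
        self.remaining = pages
        self.ready = ready
        self._callbacks: List[Callable[[], None]] = []

    def register_callback(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def data_arrived(self) -> None:
        # Simulated I/O event: the data is now ready, so fire the callbacks.
        self.ready = True
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()

    def process_one_page(self) -> None:
        # Stands in for "fetch one Page, then run it through every Operator".
        self.remaining -= 1

    def is_done(self) -> bool:
        return self.remaining <= 0


def run_task_executor(splits: List[Split], pages_per_quantum: int) -> Tuple[List[str], List[str]]:
    """Single-threaded simulation; pages_per_quantum replaces the real time slice."""
    queue: Deque[Split] = deque(splits)
    blocked: List[Split] = []
    finished: List[str] = []
    trace: List[str] = []
    while queue or blocked:
        if not queue:
            # Nothing runnable: simulate data arriving for the oldest blocked split.
            blocked.pop(0).data_arrived()
            continue
        split = queue.popleft()
        if not split.ready:
            # Data not ready: register a callback that puts the split back.
            split.register_callback(lambda s=split: queue.append(s))
            blocked.append(split)
            continue
        for _ in range(pages_per_quantum):  # "Time's up?" after this many pages
            split.process_one_page()
            trace.append(split.name)
            if split.is_done():
                break
        if split.is_done():
            finished.append(split.name)
        else:
            queue.append(split)  # quantum used up: yield to other splits
    return finished, trace
```

Because a split gives up its thread when its quantum expires or its input is missing, a few threads can interleave many splits without blocking on I/O.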
  12. Executing Operators
     • Pages flow down the operator chain: page = op1.getOutput(); op2.addInput(page); page = op2.getOutput(); op3.addInput(page); … until Opn
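The getOutput/addInput hand-off can be sketched as a driver loop that moves pages between adjacent operators until nothing moves. The three operators are hypothetical minimal examples. The sketch also leaves out the needs-input and finish signalling that a real operator chain would need, for example for aggregations.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Page = List[int]  # hypothetical: a page is just a list of values here


class ValuesOperator:
    """Source operator that emits prepared pages."""

    def __init__(self, pages: List[Page]) -> None:
        self._pages: Deque[Page] = deque(pages)

    def add_input(self, page: Page) -> None:
        raise RuntimeError("source operator takes no input")

    def get_output(self) -> Optional[Page]:
        return self._pages.popleft() if self._pages else None


class FilterOperator:
    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self._predicate = predicate
        self._buffer: Deque[Page] = deque()

    def add_input(self, page: Page) -> None:
        kept = [v for v in page if self._predicate(v)]
        if kept:  # drop pages that become empty
            self._buffer.append(kept)

    def get_output(self) -> Optional[Page]:
        return self._buffer.popleft() if self._buffer else None


class CollectOperator:
    """Sink operator that keeps every page it receives."""

    def __init__(self) -> None:
        self.pages: List[Page] = []

    def add_input(self, page: Page) -> None:
        self.pages.append(page)

    def get_output(self) -> Optional[Page]:
        return None


def drive(operators: list) -> None:
    """Repeat `page = op[i].get_output(); op[i+1].add_input(page)` until nothing moves."""
    moved = True
    while moved:
        moved = False
        for upstream, downstream in zip(operators, operators[1:]):
            page = upstream.get_output()
            if page is not None:
                downstream.add_input(page)
                moved = True
```

For instance, driving `[ValuesOperator([[1, 2, 3], [4, 5]]), FilterOperator(lambda v: v % 2 == 0), CollectOperator()]` leaves `[[2], [4]]` in the sink.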
  13. Input
     • For a HiveSplit, TableScanOperator gets its data through DataStreamManager, which delegates to a ConnectorDataStreamProvider
     • For Hive, this is a RecordSetDataStreamProvider: HiveClient supplies a HiveRecordSet, which is read through a RecordProjectOperator
  14. Reading a HiveSplit
     • HiveSplit -> InputFormat -> RecordReader -> HiveRecordSet (lines)
     • RecordProjectOperator turns the lines into a Page, which TableScanOperator passes to the next Operator
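The step from text lines to Pages is essentially "split each record, keep the projected columns, batch rows into column-oriented pages". The sketch below assumes a delimited text table using Hive's default `\x01` (Ctrl-A) field delimiter. The function is hypothetical, and a plain iterable of strings stands in for the RecordReader.

```python
from typing import Iterable, Iterator, List, Sequence


def record_project(lines: Iterable[str], columns: Sequence[int],
                   max_rows: int = 16 * 1024,
                   delimiter: str = "\x01") -> Iterator[List[List[str]]]:
    """Turn text records into pages: one list per projected column, at most max_rows rows."""
    page: List[List[str]] = [[] for _ in columns]
    rows = 0
    for line in lines:
        # Assumption: one record per line, fields separated by `delimiter`.
        fields = line.rstrip("\n").split(delimiter)
        for out, index in zip(page, columns):
            out.append(fields[index])
        rows += 1
        if rows == max_rows:
            yield page
            page = [[] for _ in columns]
            rows = 0
    if rows:  # emit the last, partially filled page
        yield page
```

Only the requested columns are materialized, and in the order requested. For example, projecting columns `[2, 0]` puts the third field first.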
  15. Load Balance
     • NodeScheduler uses a NodeSelector to choose a Node for each Split
     • NodeMap indexes the nodes three ways: Host:Port -> Nodes, Host -> Nodes, Rack -> Nodes
  16. NodeSelector.selectNode
     • Select acceptable nodes (at least 10 nodes by default)
       – Nodes with the same address as the Split
       – If not enough, add nodes in the same rack
       – If not enough, randomly select nodes in other racks
     • Select the node with the smallest number of assignments (pending tasks)
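The selection rule above can be written out as a short function over a simplified NodeMap. In this sketch the `Node` and `NodeMap` shapes, the `assignments` dictionary keyed by node id, and the handling of ties (the first candidate wins) are assumptions. The slide's Host:Port index is omitted.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    host: str
    rack: str


class NodeMap:
    """Indexes nodes by host and by rack (the slide's Host:Port map is omitted)."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self.all_nodes: List[Node] = list(nodes)
        self.by_host: DefaultDict[str, List[Node]] = defaultdict(list)
        self.by_rack: DefaultDict[str, List[Node]] = defaultdict(list)
        for node in self.all_nodes:
            self.by_host[node.host].append(node)
            self.by_rack[node.rack].append(node)


def select_node(node_map: NodeMap, split_host: str, split_rack: str,
                assignments: Dict[str, int], min_candidates: int = 10,
                rng: Optional[random.Random] = None) -> Node:
    rng = rng if rng is not None else random.Random()
    # 1. Nodes with the same address as the split.
    candidates = list(node_map.by_host.get(split_host, []))
    # 2. If not enough, add nodes in the same rack.
    if len(candidates) < min_candidates:
        candidates += [n for n in node_map.by_rack.get(split_rack, []) if n not in candidates]
    # 3. If still not enough, add randomly chosen nodes from other racks.
    if len(candidates) < min_candidates:
        others = [n for n in node_map.all_nodes if n not in candidates]
        rng.shuffle(others)
        candidates += others[: min_candidates - len(candidates)]
    if not candidates:
        raise ValueError("no nodes available")
    # 4. Prefer the candidate with the fewest pending assignments.
    return min(candidates, key=lambda n: assignments.get(n.node_id, 0))
```

The candidate set trades locality against balance. With a small `min_candidates`, a busier local node can win over an idle remote one. With the default of 10, a small cluster pulls in every node and the least-loaded one is chosen.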
  17. Output
     • Only SELECT statements are supported
     • Query results are currently streamed to the client
  18. Communication
     • Protocol: HTTP
     • Data format: JSON
     • Every instance has one server and one client
  19. Q&A