Presto overview



Speaker notes:
  • LocalExecutionPlanner converts a SubPlan into a LocalExecutionPlan, which consists of a sequence of operators.
  • HashJoinOperator and HashBuilderOperator are connected by a SourceHash, which holds the output of HashBuilderOperator.
  • You can think of a Slice as a byte array; its size is the length of the array. A Block's size is the size of its Slice, and a Page's size is the sum of the sizes of all its Blocks.
  • By default, a Split may execute for at most 1 s at a time. When the time is up, the Split is put back into the queue.
  • RecordSetDataStreamProvider is a subclass of ConnectorDataStreamProvider.
  • When DiscoveryNodeManager receives a query for node information, it checks whether its cache has expired (after 5 seconds). If so, it asks the ServiceSelector for the active nodes and drops the failed ones. By default, the ServiceSelector fetches a new node list from the Discovery Server every 10 s. A thread in HeartbeatFailureDetector sends a heartbeat to every active node every 500 ms by default.
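The note on HashJoinOperator, HashBuilderOperator and SourceHash describes the classic build/probe split of a hash join: one operator hashes the build side into a shared table, the other probes it. The following is a minimal Python sketch of that pattern, assuming rows are tuples and the join is an inner equi-join on a single key column. The class names mirror the note, but their methods (`add_input`, `finish`, `get_output`) and the `ready` flag are hypothetical and are not Presto's actual API.

```python
from collections import defaultdict


class SourceHash:
    """Hypothetical stand-in for SourceHash: the hash table built from the build side."""

    def __init__(self):
        self.table = defaultdict(list)  # join key -> list of build-side rows
        self.ready = False              # set once the whole build side is consumed


class HashBuilderOperator:
    """Consumes build-side rows and fills the shared SourceHash."""

    def __init__(self, source_hash, key_index):
        self.source_hash = source_hash
        self.key_index = key_index

    def add_input(self, rows):
        for row in rows:
            self.source_hash.table[row[self.key_index]].append(row)

    def finish(self):
        # The probe side may only start once the build side is fully hashed.
        self.source_hash.ready = True


class HashJoinOperator:
    """Probes the SourceHash with each probe-side row (inner join)."""

    def __init__(self, source_hash, key_index):
        self.source_hash = source_hash
        self.key_index = key_index
        self.output = []

    def add_input(self, rows):
        if not self.source_hash.ready:
            raise RuntimeError("build side not finished")
        for row in rows:
            # .get() avoids inserting empty entries into the defaultdict.
            for match in self.source_hash.table.get(row[self.key_index], []):
                self.output.append(row + match)

    def get_output(self):
        out, self.output = self.output, []
        return out


if __name__ == "__main__":
    shared = SourceHash()
    builder = HashBuilderOperator(shared, key_index=0)
    builder.add_input([(1, "a"), (2, "b")])
    builder.finish()
    joiner = HashJoinOperator(shared, key_index=0)
    joiner.add_input([(1, "x"), (3, "y")])
    print(joiner.get_output())  # [(1, 'x', 1, 'a')]
```

Sharing one SourceHash object is what lets the two operators live in separate operator sequences: the builder only writes to it, and the joiner only reads from it after `finish()`.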
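The size rules in the Slice/Block/Page note, combined with the limits given on slide 10 (max page size 1 MB, max 16 * 1024 rows), can be sketched as below. This is an illustration, not Presto's classes: it assumes one Slice per Block as the note describes, reads "1 MB" as 1024 * 1024 bytes, and the helper `page_is_full` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MAX_PAGE_SIZE_BYTES = 1024 * 1024  # "1MB" on slide 10, assumed to mean 1024 * 1024 bytes
MAX_PAGE_ROWS = 16 * 1024          # max rows per page, from slide 10


@dataclass
class Slice:
    data: bytes  # a Slice is essentially a byte array

    def size(self) -> int:
        return len(self.data)


@dataclass
class Block:
    backing: Slice  # assumption: one Slice backs each Block

    def size(self) -> int:
        return self.backing.size()  # Block size == Slice size


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

    def size(self) -> int:
        return sum(block.size() for block in self.blocks)  # sum of Block sizes


def page_is_full(page: Page, row_count: int) -> bool:
    """Hypothetical check: stop adding rows once either slide-10 limit is reached."""
    return page.size() >= MAX_PAGE_SIZE_BYTES or row_count >= MAX_PAGE_ROWS


if __name__ == "__main__":
    page = Page([Block(Slice(b"abc")), Block(Slice(b"12345"))])
    print(page.size())  # 8
```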
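The note on the 1 s time limit, together with the TaskExecutor flowchart on slide 11, describes time-sliced scheduling: a Split runs until it is done or its quantum expires, and an unfinished Split goes back into the queue. The sketch below models this in a single thread (Presto uses a pool of cores * 4 threads); `Split`, `run_splits`, the injectable `clock`, and the way "not ready" Splits are parked are all hypothetical simplifications.

```python
import time
from collections import deque


class Split:
    """Hypothetical split: a queue of pages to process; `ready` models data availability."""

    def __init__(self, name, pages, ready=True):
        self.name = name
        self.pages = deque(pages)
        self.ready = ready

    def is_done(self):
        return not self.pages


def run_splits(splits, process_page, quantum_s=1.0, clock=time.monotonic):
    """Single-threaded sketch of time-sliced split scheduling; returns names in finish order."""
    queue = deque(splits)
    waiting = []   # splits whose data is not ready yet
    finished = []
    while queue or waiting:
        if not queue:
            # Stand-in for the callback: pretend the data arrived and re-queue the splits.
            for split in waiting:
                split.ready = True
            queue.extend(waiting)
            waiting.clear()
        split = queue.popleft()
        if not split.ready:
            waiting.append(split)
            continue
        start = clock()
        while not split.is_done():
            process_page(split.name, split.pages.popleft())  # fetch one page, run operators
            if clock() - start >= quantum_s:
                break  # time's up
        if split.is_done():
            finished.append(split.name)
        else:
            queue.append(split)  # put the unfinished split back into the queue
    return finished
```

Processing at least one page per turn guarantees progress even when the quantum is very short, and passing a fake `clock` makes the interleaving deterministic for testing.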
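The DiscoveryNodeManager note describes a node list cached for 5 seconds: a lookup after expiry refetches the active nodes and drops the failed ones. A minimal sketch of that expiry logic follows. `CachedNodeList` is hypothetical; the two callables stand in for the ServiceSelector and the HeartbeatFailureDetector, and the 10 s refresh and 500 ms heartbeat threads are not modeled.

```python
import time
from typing import Callable, Iterable, List, Optional, Set


class CachedNodeList:
    """Hypothetical sketch: node list cached for ttl_s seconds (5 s per the note)."""

    def __init__(self,
                 fetch_active_nodes: Callable[[], Iterable[str]],  # stands in for ServiceSelector
                 failed_nodes: Callable[[], Set[str]],              # stands in for the failure detector
                 ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_active_nodes
        self._failed = failed_nodes
        self._ttl_s = ttl_s
        self._clock = clock
        self._nodes: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get_nodes(self) -> List[str]:
        now = self._clock()
        if self._nodes is None or now - self._fetched_at >= self._ttl_s:
            # Cache expired: refetch active nodes and drop the failed ones.
            failed = self._failed()
            self._nodes = [n for n in self._fetch() if n not in failed]
            self._fetched_at = now
        return list(self._nodes)  # return a copy so callers cannot mutate the cache
```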

    1. Presto Overview, Shixiong Zhu
    2. Overview: nodes register with the Discovery Server, and the coordinator asks it for the active nodes.
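    3. Coordinator: the CLI submits SQL to StatementResource and receives QueryResults, each with a NextUri for fetching more data. Components involved: SQLQueryManager (QueryInfo), SQLQueryExecution, QueryStarter, …, and HttpRemoteTask, which sends work to the Workers and fetches their data; partial data comes back from the Workers through an OutputReceiver.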
    4. SubPlan
        • Plan: OutputNode → AggregationNode → FilterNode → TableScanNode
        • Root SubPlan: OutputNode → AggregationNode(FINAL) → ExchangeNode
        • Leaf SubPlan: SinkNode → AggregationNode(PARTIAL) → FilterNode → TableScanNode
    5. SubPlan for a join (T: TableScanNode, A: AggregationNode, E: ExchangeNode)
        • Plan: OutputNode → JoinNode → (A → T) and (A → T)
        • Root SubPlan: OutputNode → JoinNode → (A(FINAL) → E) and (A(FINAL) → E)
        • Two leaf SubPlans, each: SinkNode → A(PARTIAL) → T
    6. Stage and Task: the Coordinator runs each Stage as Tasks on several Workers; results flow between Stages and back to the Coordinator.
    7. Worker
    8. LocalExecutionPlan: each node of the SubPlan (Node1, Node2, Node3, …) is translated into an operator (Op1, Op2, Op3, …, Opn).
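    9. LocalExecutionPlan for a join: Node1 and Node2 become Op1 and Op2, and the JoinNode becomes a HashJoinOperator; the build input Node3 becomes a separate LocalExecutionPlan, Op3 → HashBuilderOperator. The two plans are connected by a SourceHash.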
    10. Page (max page size: 1 MB, max rows: 16 * 1024): a Page consists of several Blocks, and a row spans all Blocks at the same position. Each Block is backed by a Slice, which is a byte array.
    11. TaskExecutor (thread count = number of cores * 4)
        • Take a Split from the queue. Is its data ready?
          – No: register a callback that puts the Split back into the queue when its data is ready.
          – Yes: fetch one Page and execute the operators in order while there is a next operator.
        • Is the Split done? If yes, it finishes. If not and its time is up, put it back into the queue; otherwise fetch the next Page.
    12. Executing operators: pages flow down the chain Op1 → Op2 → Op3 → … → Opn.
        page = op1.getOutput(); op2.addInput(page)
        page = op2.getOutput(); op3.addInput(page)
        …
    13. Input: TableScanOperator hands a HiveSplit to DataStreamManager, which delegates to the connector's ConnectorDataStreamProvider (here RecordSetDataStreamProvider). It obtains a HiveRecordSet from HiveClient and wraps it in a RecordProjectOperator.
    14. Reading data: HiveSplit → InputFormat → RecordReader → HiveRecordSet (lines) → TableScanOperator / RecordProjectOperator → Page → next operator
    15. Load balance: NodeScheduler uses a NodeSelector to assign each Split to a Node. The NodeSelector works on a NodeMap containing three maps: Rack -> Nodes, Host -> Nodes, and Host:Port -> Nodes.
    16. NodeSelector.selectNode
        • Select acceptable nodes (at least 10 by default):
          – nodes with the same address as the Split;
          – if there are not enough, add nodes in the same rack;
          – if there are still not enough, randomly select nodes in other racks.
        • Select the node with the smallest number of assignments (pending tasks).
    17. Output
        • Only SELECT statements are supported.
        • Query results are currently streamed to the client.
    18. Communication
        • Protocol: HTTP
        • Data format: JSON
        • Every instance has one server and one client
    19. Q&A
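Slide 4 splits an aggregation into a PARTIAL step in the leaf SubPlan (next to the TableScanNode and FilterNode) and a FINAL step in the root SubPlan (after the ExchangeNode). The sketch below shows why that works for a grouped SUM, assuming rows are (key, value) pairs. The slide does not say which aggregate is used; SUM, the filter, and the function names are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (group key, value)


def partial_sum(rows: Iterable[Row], predicate: Callable[[str, int], bool]) -> Dict[str, int]:
    """Leaf SubPlan: TableScan -> Filter -> AggregationNode(PARTIAL) -> Sink."""
    acc: Counter = Counter()
    for key, value in rows:
        if predicate(key, value):
            acc[key] += value
    return dict(acc)


def final_sum(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Root SubPlan: Exchange -> AggregationNode(FINAL) -> Output."""
    acc: Counter = Counter()
    for partial in partials:
        acc.update(partial)  # Counter.update adds values key by key
    return dict(acc)
```

Each worker ships only one small partial result per group instead of every row, and the FINAL step merges them; this relies on SUM being decomposable into partial sums.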
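Slides 8 and 12 describe a LocalExecutionPlan as a chain of operators that is driven by repeatedly moving a page from one operator's getOutput to the next operator's addInput. A minimal sketch of that loop follows. The snake_case methods stand in for getOutput/addInput, and `ValuesOperator`, `MapOperator`, `CollectOperator` and `run_pipeline` are hypothetical operators made up for the example.

```python
from typing import List, Optional


class Operator:
    """Minimal operator interface modeled on slide 12 (getOutput / addInput)."""

    def add_input(self, page: list) -> None:
        raise NotImplementedError

    def get_output(self) -> Optional[list]:
        raise NotImplementedError


class ValuesOperator(Operator):
    """Source: emits pre-built pages (a stand-in for TableScanOperator)."""

    def __init__(self, pages):
        self.pages = list(pages)

    def add_input(self, page):
        raise RuntimeError("source operator takes no input")

    def get_output(self):
        return self.pages.pop(0) if self.pages else None


class MapOperator(Operator):
    """Applies a function to every row of a page."""

    def __init__(self, fn):
        self.fn = fn
        self.buffer = None

    def add_input(self, page):
        self.buffer = [self.fn(row) for row in page]

    def get_output(self):
        out, self.buffer = self.buffer, None
        return out


class CollectOperator(Operator):
    """Sink: collects all rows it receives."""

    def __init__(self):
        self.rows = []

    def add_input(self, page):
        self.rows.extend(page)

    def get_output(self):
        return None


def run_pipeline(operators: List[Operator]) -> None:
    """page = op[i].get_output(); op[i + 1].add_input(page), until no page moves."""
    while True:
        moved = False
        for upstream, downstream in zip(operators, operators[1:]):
            page = upstream.get_output()
            if page is not None:
                downstream.add_input(page)
                moved = True
        if not moved:
            break
```

For example, running `[ValuesOperator([[1, 2], [3]]), MapOperator(lambda x: x * 10), sink]` leaves `[10, 20, 30]` in `sink.rows`. Because each operator only exposes these two methods, the driver loop does not need to know what any operator does.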
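The selection steps on slide 16 can be sketched as follows, assuming each Split carries a list of (host, rack) locations. Presto's NodeSelector looks nodes up in the NodeMap from slide 15; this sketch scans a plain list instead, and `Node`, `select_node`, `split_locations`, and `assignments` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    host: str
    rack: str


def select_node(split_locations: List[Tuple[str, str]], nodes: List[Node],
                assignments: Dict[Node, int], min_candidates: int = 10,
                rng: Optional[random.Random] = None) -> Node:
    """Hypothetical sketch of slide 16; split_locations are (host, rack) pairs."""
    if rng is None:
        rng = random.Random()
    hosts = {host for host, _ in split_locations}
    racks = {rack for _, rack in split_locations}
    candidates: List[Node] = []

    def add(pool):
        for node in pool:
            if node not in candidates:
                candidates.append(node)

    # 1. Nodes with the same address as the split.
    add(n for n in nodes if n.host in hosts)
    # 2. Not enough: add nodes in the same rack.
    if len(candidates) < min_candidates:
        add(n for n in nodes if n.rack in racks)
    # 3. Still not enough: randomly pick nodes from other racks.
    if len(candidates) < min_candidates:
        others = [n for n in nodes if n not in candidates]
        rng.shuffle(others)
        add(others[: min_candidates - len(candidates)])
    if not candidates:
        raise ValueError("no nodes available")
    # 4. Choose the candidate with the fewest assignments (pending tasks).
    return min(candidates, key=lambda n: assignments.get(n, 0))
```

Widening the candidate set step by step keeps data locality as the first preference, while the final least-loaded choice spreads Splits across the candidates.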