It takes two to tango! Is SQL-on-Hadoop the next big step?
Published in: Technology
  1. It takes two to tango! Is SQL-on-Hadoop the next big step?
  2. Big Data Crunching: A Retrospective
  3. Three Phases
  4. What was it like before Hadoop? The Phylogenetic Tree of Elephants
  5. Tech before Hadoop: Partitioned or Sharded RDBMSs; Data Warehouses; Massively Parallel Databases
  6. Massively Parallel Databases: Shared Nothing Architecture
  7. Hadoop: Early days
  8. Acceptance Life Cycle: Acceptance; Exploration; Resistance
  9. Complementary over Competitive
  10. Split by Structure
  11. What's the best way to answer questions that span these two worlds? Can we interface SQL atop Hadoop? Can we combine the strengths of parallel databases with those of Hadoop?
  12. SQL-on-Hadoop: Technology
  13. SQL-on-Hadoop: Technical Approaches. Faster Hive (Hortonworks' Stinger initiative, Qubole's Hive-on-the-Cloud); Distributed Query Processing (Cloudera's Impala, MapR-supported Apache Drill, and more); Split Query Processing (Microsoft Polybase, Hadapt)
  14. Distributed Query Processing
  15. Cloudera Impala: Architecture. Diagram labels: Clients (Impala Shell, JDBC/ODBC Client, SQL Tools); three Data Nodes, each with an Impala Daemon (Query Planning, Query Coordination, Query Execution); Unified Metadata Store (State Store, Metadata Catalog, HDFS Name Node)
  16. Life Cycle of an Impala Query. Diagram labels: Clients (Impala Shell, JDBC/ODBC Client, SQL Tools); Impala Daemons on Data Nodes; State Store; Metadata Catalog; HDFS Name Node. Steps: Parse Query; Plan and Optimize; Coordinate Execution
  17. Split Query Processing
  18. Polybase + PDW: Architecture. Diagram labels: Clients (ADO.NET, JDBC/ODBC Client, OLEDB); PDW Cluster with a Control Node (PDW Engine Service, DMS Controller, Loader Manager, SQL Server) and Compute Nodes (SQL Server, Data Move Service, HDFS Bridge); Hadoop Cluster (Name Node, Job Tracker, Data Nodes with Task Trackers)
  19. Register the Hadoop Cluster with PDW:
        CREATE HADOOP_CLUSTER GSL_CLUSTER WITH (
          namenode='hadoop-head', namenode_port=9000,
          jobtracker='hadoop-head', jobtracker_port=9010);
  20. Map HDFS File to External Tables in PDW:
        CREATE EXTERNAL TABLE hdfsCustomer (
          c_custkey    bigint not null,
          c_name       varchar(25) not null,
          c_address    varchar(40) not null,
          c_nationkey  integer not null,
          c_phone      char(15) not null,
          c_acctbal    decimal(15,2) not null,
          c_mktsegment char(10) not null,
          c_comment    varchar(117) not null)
        WITH (LOCATION=/tpch1gb/customer.tbl,
          FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
          EXTERNAL_FILEFORMAT = TEXT_FORMAT));
  21. Life Cycle of a Split Query. Diagram labels: Clients (ADO.NET, JDBC/ODBC Client, OLEDB); PDW Cluster with a Control Node (Engine Service, DMS Controller, Loader Manager, SQL Server) and Compute Nodes (SQL Server, Data Move Service, HDFS Bridge); Hadoop Cluster (Name Node, Job Tracker, Data Nodes with Task Trackers); Plan
  22. SQL-on-Hadoop: The Technology. Faster Hive; Distributed Query Processors; Split Query Processors
  23. SQL-on-Hadoop or Map Reduce?
  24. </presentation> More on www.systemswemake.com. Follow: @systems_we_make
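
The shared-nothing and distributed query processing slides (6 and 14 to 16) name an idea that a small example may make more concrete: rows are partitioned across nodes, each node aggregates only its own shard, and a coordinator merges the partial results. The Python sketch below illustrates that general scatter/gather pattern only. It is not Impala's or PDW's implementation; the function names (`partition`, `local_aggregate`, `coordinate`) and the sample rows are assumptions made for illustration, and only the column names are borrowed from the `hdfsCustomer` table on slide 20.

```python
from collections import defaultdict


def partition(rows, num_nodes, key):
    """Hash-partition rows so that each node owns a disjoint shard."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[hash(row[key]) % num_nodes].append(row)
    return shards


def local_aggregate(shard, group_key, value_key):
    """Runs on one node: partial SUM and COUNT per group, local rows only."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in shard:
        acc = partial[row[group_key]]
        acc[0] += row[value_key]
        acc[1] += 1
    return dict(partial)


def coordinate(shards, group_key, value_key):
    """Coordinator: run the fragment on every shard, then merge the partials."""
    merged = defaultdict(lambda: [0.0, 0])
    for shard in shards:  # a real system would run these in parallel
        for group, (s, c) in local_aggregate(shard, group_key, value_key).items():
            merged[group][0] += s
            merged[group][1] += c
    # AVG is derived from the merged SUM and COUNT, never from per-node averages.
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in merged.items()}


# Hypothetical sample rows; values are made up for the example.
customers = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING", "c_acctbal": 100.0},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE", "c_acctbal": 50.0},
    {"c_custkey": 3, "c_mktsegment": "BUILDING", "c_acctbal": 300.0},
    {"c_custkey": 4, "c_mktsegment": "AUTOMOBILE", "c_acctbal": 150.0},
    {"c_custkey": 5, "c_mktsegment": "BUILDING", "c_acctbal": 200.0},
]

shards = partition(customers, 3, "c_custkey")
result = coordinate(shards, "c_mktsegment", "c_acctbal")
```

Each node ships SUM and COUNT instead of a local average because partial sums and counts can be merged exactly, whereas per-node averages cannot be combined without knowing how many rows each one covers.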
