It takes two to tango! Is SQL-on-Hadoop the next big step?


Published in: Technology

  1. It takes two to tango! Is SQL-on-Hadoop the next big step?
  2. Big Data Crunching: A Retrospective
  3. Three Phases
  4. What was it like before Hadoop? The Phylogenetic Tree of Elephants
  5. Tech before Hadoop: Partitioned or Sharded RDBMSs; Data Warehouses; Massively Parallel Databases
  6. Massively Parallel Databases: Shared Nothing Architecture
  7. Hadoop - Early days
  8. Acceptance Life Cycle: Acceptance, Exploration, Resistance
  9. Complementary over Competitive
  10. Split by Structure
  11. What’s the best way to answer questions that span these two worlds? Can we interface SQL atop Hadoop? Can we combine the strengths of parallel databases with those of Hadoop?
  12. SQL-on-Hadoop: Technology
  13. SQL-on-Hadoop: Technical Approaches. Faster Hive: Hortonworks’ Stinger initiative, Qubole’s Hive-on-the-Cloud. Distributed Query Processing: Cloudera’s Impala, MapR-supported Apache Drill, and more. Split Query Processing: Microsoft Polybase, Hadapt.
  14. Distributed Query Processing
  15. Cloudera Impala: Architecture. [Diagram: clients (Impala Shell, JDBC/ODBC client, SQL tools); Impala daemons alongside data nodes, each with query planning, query coordination, and query execution; state store; metadata catalog; HDFS name node; unified metadata store.]
  16. Life Cycle of an Impala Query. [Diagram: clients (Impala Shell, JDBC/ODBC client, SQL tools); Impala daemons alongside data nodes; state store; metadata catalog; HDFS name node; steps labeled Parse Query, Plan and Optimize, Coordinate Execution.]
  17. Split Query Processing
  18. Polybase + PDW: Architecture. [Diagram: clients (ADO.NET, JDBC/ODBC client, OLEDB); PDW cluster with a control node (PDW Engine Service, DMS Controller, Loader Manager, SQL Server) and compute nodes (SQL Server, Data Move Service, HDFS Bridge); Hadoop cluster with Name Node, Job Tracker, and data nodes each running a Task Tracker.]
  19. Register the Hadoop Cluster with PDW
          CREATE HADOOP_CLUSTER GSL_CLUSTER
          WITH (namenode='hadoop-head', namenode_port=9000,
                jobtracker='hadoop-head', jobtracker_port=9010);
  20. Map HDFS File to External Tables in PDW
          CREATE EXTERNAL TABLE hdfsCustomer (
              c_custkey     bigint        not null,
              c_name        varchar(25)   not null,
              c_address     varchar(40)   not null,
              c_nationkey   integer       not null,
              c_phone       char(15)      not null,
              c_acctbal     decimal(15,2) not null,
              c_mktsegment  char(10)      not null,
              c_comment     varchar(117)  not null)
          WITH (LOCATION='/tpch1gb/customer.tbl',
                FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
                                EXTERNAL_FILEFORMAT = TEXT_FORMAT));
  21. Life Cycle of a Split Query. [Diagram: clients (ADO.NET, JDBC/ODBC client, OLEDB); PDW cluster with a control node (Engine Service, DMS Controller, Loader Manager, SQL Server) and compute nodes (SQL Server, Data Move Service, HDFS Bridge); Hadoop cluster with Name Node, Job Tracker, and data nodes each running a Task Tracker; a step labeled Plan.]
  22. SQL-on-Hadoop: The Technology. Faster Hive; Distributed Query Processors; Split Query Processors
  23. SQL-on-Hadoop or MapReduce?
  24. </presentation> More on www.systemswemake.com. Follow: @systems_we_make
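
Slide 20 maps an HDFS file to the external table hdfsCustomer, but the deck stops short of showing a query against it. A minimal sketch of what such a query might look like follows, assuming the external table accepts ordinary SELECT statements. The column names come from the slide's DDL; the filter, grouping, and output aliases are illustrative assumptions, not taken from the slides.

```sql
-- Hypothetical query against the external table defined on slide 20.
-- Column names come from that slide's DDL; the filter, grouping,
-- and aliases (customer_count, avg_balance) are illustrative only.
SELECT c_mktsegment,
       COUNT(*)       AS customer_count,
       AVG(c_acctbal) AS avg_balance
FROM hdfsCustomer
WHERE c_acctbal > 0
GROUP BY c_mktsegment
ORDER BY c_mktsegment;
```

The diagrams on slides 18 and 21 show two routes for such a query: work handed to the Hadoop cluster through the Job Tracker, and rows pulled into the PDW compute nodes through the HDFS Bridge. The slides do not say how the planner chooses between them, so this sketch makes no assumption about where the filter and aggregation run.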