It takes two to tango! : Is SQL-on-Hadoop the next big step?

  • 1. It takes two to tango! Is SQL-on-Hadoop the next big step?
  • 2. Big Data Crunching: A Retrospective
  • 3. Three Phases
  • 4. What was it like before Hadoop? The Phylogenetic Tree of Elephants
  • 5. Tech before Hadoop: Partitioned or Sharded RDBMSs; Data Warehouses; Massively Parallel Databases
  • 6. Massively Parallel Databases: Shared Nothing Architecture
  • 7. Hadoop - Early days
  • 8. Acceptance Life Cycle: Acceptance, Exploration, Resistance
  • 9. Complementary over Competitive
  • 10. Split by Structure
  • 11. What’s the best way to answer questions that span these two worlds? Can we interface SQL atop Hadoop? Can we combine the strengths of parallel databases with those of Hadoop?
  • 12. SQL-on-Hadoop : Technology
  • 13. SQL-on-Hadoop : Technical Approaches
    Distributed Query Processing: Cloudera’s Impala, MapR-supported Apache Drill, and more
    Split Query Processing: Microsoft Polybase, Hadapt
    Faster Hive: Hortonworks’ Stinger initiative, Qubole’s Hive-on-the-Cloud
  • 14. Distributed Query Processing
  • 15. Cloudera Impala : Architecture
    [Diagram. Clients: Impala Shell, JDBC/ODBC Client, SQL Tools. Three Data Nodes, each with an Impala Daemon that handles Query Planning, Query Coordination, and Query Execution. Unified Metadata Store: State Store, Metadata Catalog, HDFS Name Node.]
  • 16. Life Cycle of an Impala Query
    [Diagram. Steps: Parse Query, Plan and Optimize, Coordinate Execution. Components shown: Clients (Impala Shell, JDBC/ODBC Client, SQL Tools), four Impala Daemons on Data Nodes, State Store, Metadata Catalog, HDFS Name Node.]
  • 17. Split Query Processing
  • 18. Polybase + PDW : Architecture (SQL Server PDW : Architecture)
    [Diagram. Clients: ADO.NET, JDBC/ODBC Client, OLEDB. PDW Cluster: a Control Node (PDW Engine Service, DMS Controller, Loader Manager, SQL Server) and Compute Nodes (SQL Server, Data Move Service, HDFS Bridge). Hadoop Cluster: Name Node, Job Tracker, and Data Nodes, each with a Task Tracker.]
  • 19. Register the Hadoop Cluster with PDW
    CREATE HADOOP_CLUSTER GSL_CLUSTER
    WITH (
        namenode = 'hadoop-head',
        namenode_port = 9000,
        jobtracker = 'hadoop-head',
        jobtracker_port = 9010
    );
  • 20. Map HDFS File to External Tables in PDW
    CREATE EXTERNAL TABLE hdfsCustomer (
        c_custkey     bigint        not null,
        c_name        varchar(25)   not null,
        c_address     varchar(40)   not null,
        c_nationkey   integer       not null,
        c_phone       char(15)      not null,
        c_acctbal     decimal(15,2) not null,
        c_mktsegment  char(10)      not null,
        c_comment     varchar(117)  not null
    )
    WITH (
        LOCATION = '/tpch1gb/customer.tbl',
        FORMAT_OPTIONS (
            EXTERNAL_CLUSTER = GSL_CLUSTER,
            EXTERNAL_FILEFORMAT = TEXT_FORMAT
        )
    );
  • 21. Life Cycle of a Split Query
    [Diagram. Clients: ADO.NET, JDBC/ODBC Client, OLEDB. PDW Cluster: Control Node (Engine Service, DMS Controller, Loader Manager, SQL Server) producing the Plan, and Compute Nodes (SQL Server, Data Move Service, HDFS Bridge). Hadoop Cluster: Job Tracker, Name Node, and Data Nodes, each with a Task Tracker.]
  • 22. SQL-on-Hadoop : The Technology: Faster Hive; Distributed Query Processors; Split Query Processors
  • 23. SQL-on-Hadoop or Map Reduce?
  • 24. </presentation> More on www.systemswemake.com. Follow : @systems_we_make
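
The shared-nothing architecture named in slide 6, and the plan/coordinate/execute split shown for distributed query processing in slides 14 to 16, both rest on the same idea: each node aggregates only the rows it stores, and a coordinator merges the partial results. The following is a minimal Python sketch of that idea, not Impala's or PDW's actual implementation. The function names (`partition`, `local_count`, `scatter_gather_count`) and the sample rows are hypothetical; only the column names are borrowed from the customer table in slide 20.

```python
from collections import Counter
from zlib import crc32


def partition(rows, num_nodes, key):
    """Hash-partition rows so that each node owns a disjoint shard."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        node = crc32(str(row[key]).encode("utf-8")) % num_nodes
        shards[node].append(row)
    return shards


def local_count(shard, group_by):
    """Work done on one node: count only the rows stored locally."""
    return Counter(row[group_by] for row in shard)


def scatter_gather_count(shards, group_by):
    """Coordinator: run the same fragment on every shard, then merge."""
    total = Counter()
    for shard in shards:  # a real system would run these in parallel
        total.update(local_count(shard, group_by))
    return dict(total)


# Hypothetical sample rows; column names follow the table in slide 20.
customers = [
    {"c_custkey": 1, "c_mktsegment": "BUILDING"},
    {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"},
    {"c_custkey": 3, "c_mktsegment": "BUILDING"},
    {"c_custkey": 4, "c_mktsegment": "MACHINERY"},
    {"c_custkey": 5, "c_mktsegment": "BUILDING"},
]

shards = partition(customers, num_nodes=3, key="c_custkey")
result = scatter_gather_count(shards, group_by="c_mktsegment")
print(result)
```

The merged counts are the same however the rows happen to be distributed across shards, which is why this kind of aggregate can be pushed down to the nodes instead of moving the raw rows to a single machine.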