Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)
Presentation at Spark Summit 2015

Published in: Data & Analytics

Solving Low Latency Query Over Big Data with Spark SQL-(Julien Pierre, Microsoft)

  1. Client Data Fluency; Office; Skype; Bing; Modern Data Capability; Instrumentation & Ingestion; Processing & Storage; Reporting & Analytics; Information Management; Mobile-First Analytics Experience; Experimentation
  2. Data Size; Query Latency
  3. Get results inline in Zeppelin; Need to open the results in Excel
  4. Chart (axis 0 to 200): Cosmos; SparkSQL; SparkSQL with Cache. Legend: Write and Compile Query; Submit and Wait in Job Queue; Job Run Time
  5. Mesos Cluster/HDFS; Job Manager; Zookeeper; Job Frontend Web API; Spark Driver Host Pool; Spark Hive Thrift Server; Zeppelin Server; Avocado (Hive Query + Schedule Task); Rover (Drag & Drop BI tool with Hive Code Gen); Zeppelin Web UI; MetastoreDB; Hive Loader; Cosmos Storage
  6. Partition 1, Partition 2, ..., Partition n; Export Cosmos Partition; Partition 1, Partition 2, ..., Partition n; Task 2, HDFS.copyFromLocalFile, ..., Task n; Partition 1, Partition 2, ..., Partition n; saveAsParquetFile; Task 2, ..., Task n
  7. <Database2>; <Table1>; <Database1>; <Partition1>; <Table2>; <Partition2>; MetastoreDB; Hive Thrift Server; Hive Loader; Zeppelin Server; User Query; Query
  8. Data Ingest; Services; Clients; Transform; Compute; Transform; Compute; Data Streams; Data Sets; Store; Event Processing; HDFS; Data Transportation; Spark Streaming Receiver; Analyst; Zeppelin Notebooks; Avocado; Simple query; Query language; “Analyze”; “Debug”; “Mine”; “Glance”; Data; Unified platform; Intelligence; Interactive analytics; Data Products; Better Digital Experiences; Dual users; “Bing”; “Office”
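
Slides 6 and 7 show partitions being exported from Cosmos, copied into HDFS, saved as Parquet, and then made visible through a Hive Loader and MetastoreDB. The transcript does not say how the Hive Loader registers a partition. The following is a minimal sketch of one plausible approach, assuming the loader issues a standard HiveQL `ALTER TABLE ... ADD PARTITION` statement per exported partition. `PartitionSpec`, `add_partition_sql`, the partition column `ds`, and the HDFS path are all hypothetical names chosen for illustration, not taken from the talk.

```python
import re
from dataclasses import dataclass

# Conservative identifier check: letters, digits, underscore only.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class PartitionSpec:
    """One exported partition to register (all fields are illustrative)."""
    database: str
    table: str
    key: str        # partition column name, e.g. "ds" (assumption)
    value: str      # partition value, e.g. a date string (assumption)
    location: str   # HDFS directory that holds the Parquet files


def check_ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe Hive identifier: {name!r}")
    return name


def quote_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def add_partition_sql(spec: PartitionSpec) -> str:
    """Build the HiveQL statement that registers one partition."""
    table = f"{check_ident(spec.database)}.{check_ident(spec.table)}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({check_ident(spec.key)}={quote_literal(spec.value)}) "
        f"LOCATION {quote_literal(spec.location)}"
    )


if __name__ == "__main__":
    spec = PartitionSpec(
        database="Database1",
        table="Table1",
        key="ds",
        value="2015-06-15",
        location="hdfs:///warehouse/Table1/ds=2015-06-15",
    )
    print(add_partition_sql(spec))
```

The sketch only builds the statement text; sending it to a Hive Thrift Server would need a client library, which is left out here. `IF NOT EXISTS` keeps the registration idempotent, so rerunning a loader over already-registered partitions would not fail.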
