Modern Data Warehouse (Modern Veri Ambarı), Cem Kubilay


MSHOWTO

Published in: Technology


  1. Cem Kubilay, Microsoft MEA
  2. Data sources
  3. Data sources, Non-Relational Data
  4. Data sources, Non-Relational Data
  5. Scale out technologies in Microsoft APS
  6. [Diagram: Compute Node 1 and Compute Node 2, each with eight PUs and their LUNs. Tables shown: Sales, Product. Distribution options: 1. Hash Based, 2. Replicated]
  7. Fully Parallel Query Execution
      [Diagram: a Control Node above Compute Node 1 ... Compute Node 10, each running a SQL Server Instance and holding distributions Dist. 1 ... Dist. 8 ... Dist. 73 ... Dist. 80]
      Query: SELECT cust_id, SUM (units) FROM [sales] GROUP BY [cust_id]
      Dist. 1: SELECT cust_id, SUM (units) FROM sales_1 GROUP BY [cust_id] (125 rows)
      Dist. 8: SELECT cust_id, SUM (units) FROM sales_8 GROUP BY [cust_id] (125 rows)
      Dist. 73: SELECT cust_id, SUM (units) FROM sales_73 GROUP BY [cust_id] (125 rows)
      Dist. 80: SELECT cust_id, SUM (units) FROM sales_80 GROUP BY [cust_id] (125 rows)
      DIRECT RESULTS
  8. Scale out non-relational data in HDInsight (for Azure or APS)
  9. SQL, Result set, PolyBase, Aggregate on HDFS table
  10. (1) Export “COLD DATA” to Hadoop [Diagram: Hadoop, SQL Server PDW]
  11. (2) Analyze unstructured files in Hadoop [Diagram: Hadoop, SQL Server PDW]
  12. (3) Move semi-structured files to Hadoop & Analyze [Diagram: Hadoop, SQL Server PDW]
  13. (4) Integrate with Hadoop in Azure [Diagram: SQL Server PDW, Hadoop]
  14. (5) Combine data from different sources [Diagram: Hadoop, SQL Server PDW, Hadoop]
      Query: Join between HDFS table and PDW table
      select c.*, o.* from pdwCustomer c, hdfsOrders o, Cloud_Twitter ct where c.c_custkey = o.o_custkey and ct.name = c.name and o_orderdate < '9/1/2010'
      Execution plan:
      1. Run Map Job on Hadoop: apply filter to hdfsOrders, Cloud_Twitter, put data to Temp tables
      2. CREATE oTemp, ctTemp distrib. on o_custkey (on PDW compute nodes)
      3. DMS SHUFFLE FROM HDFS on o_custkey: read hdfsTemp into oTemp, partitioned on o_custkey
      4. RETURN OPERATION: Select c.*, o.* from Customer c, oTemp o, ctTemp ct where c.c_custkey = o.o_custkey and ct.name = c.name
  15. & Microsoft Parallel Data Warehouse
  16. Big Data, Small Data, All Data
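
The hash-based distribution and fully parallel query execution slides can be illustrated with a small sketch. This is not APS/PDW code: it uses Python's built-in sqlite3 as a stand-in for the per-node SQL Server instances, a CRC32 hash as a hypothetical distribution function, and 8 distributions instead of the 80 on the slide. It also assumes that [sales] is hash-distributed on cust_id, which the slides do not state explicitly; under that assumption every customer lives in exactly one distribution, so the per-distribution results can simply be concatenated. The loop runs sequentially here, whereas the appliance would run the distributions in parallel.

```python
import sqlite3
import zlib

NUM_DISTRIBUTIONS = 8  # assumption for the demo; the slide shows 80


def distribution_for(cust_id: int) -> int:
    """Map a customer id to a 1-based distribution number (hypothetical hash)."""
    return zlib.crc32(str(cust_id).encode()) % NUM_DISTRIBUTIONS + 1


def build_distributions(conn: sqlite3.Connection, rows) -> None:
    """Create sales_1 ... sales_N and route each row by the hash of cust_id."""
    for d in range(1, NUM_DISTRIBUTIONS + 1):
        conn.execute(f"CREATE TABLE sales_{d} (cust_id INTEGER, units INTEGER)")
    for cust_id, units in rows:
        d = distribution_for(cust_id)
        conn.execute(f"INSERT INTO sales_{d} VALUES (?, ?)", (cust_id, units))


def parallel_group_by(conn: sqlite3.Connection):
    """Run the slide's per-distribution query and concatenate the results."""
    results = []
    for d in range(1, NUM_DISTRIBUTIONS + 1):
        query = f"SELECT cust_id, SUM(units) FROM sales_{d} GROUP BY cust_id"
        results.extend(conn.execute(query).fetchall())
    return sorted(results)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample_rows = [(i % 20, i) for i in range(200)]  # made-up sales data
    build_distributions(conn, sample_rows)
    for cust_id, total_units in parallel_group_by(conn):
        print(cust_id, total_units)
```

If the table were distributed on a different column than the GROUP BY key, the partial sums for one customer could come from several distributions and would need a second aggregation step before being returned.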

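The four-step execution plan for the HDFS/PDW join can be traced with a similar sketch. Again this is only an illustration under stated assumptions: plain Python lists stand in for the HDFS files, a list comprehension stands in for the Hadoop map job, a single in-memory sqlite3 database stands in for the PDW compute nodes (so the distribution on o_custkey is not modelled), and all column layouts beyond c_custkey, o_custkey, o_orderdate, and name are hypothetical.

```python
import sqlite3
from datetime import date

CUTOFF = date(2010, 9, 1)  # o_orderdate < '9/1/2010' from the slide's query


def map_filter_orders(hdfs_orders):
    """Step 1: stand-in for the map job that filters hdfsOrders on Hadoop."""
    return [o for o in hdfs_orders if o["o_orderdate"] < CUTOFF]


def run_plan(customers, hdfs_orders, cloud_twitter):
    """Trace the plan; customers are (c_custkey, name) tuples in the PDW table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (c_custkey INTEGER, name TEXT)")
    conn.executemany("INSERT INTO Customer VALUES (?, ?)", customers)

    # Step 1: filter on the Hadoop side so less data has to move.
    filtered_orders = map_filter_orders(hdfs_orders)

    # Step 2: create the temp tables on the PDW side.
    conn.execute("CREATE TABLE oTemp (o_custkey INTEGER, o_orderdate TEXT)")
    conn.execute("CREATE TABLE ctTemp (name TEXT)")

    # Step 3: load the filtered data into the temp tables
    # (the real plan shuffles it across nodes on o_custkey).
    conn.executemany(
        "INSERT INTO oTemp VALUES (?, ?)",
        [(o["o_custkey"], o["o_orderdate"].isoformat()) for o in filtered_orders],
    )
    conn.executemany("INSERT INTO ctTemp VALUES (?)", [(n,) for n in cloud_twitter])

    # Step 4: the return operation is now an ordinary local join.
    return conn.execute(
        "SELECT c.*, o.* FROM Customer c, oTemp o, ctTemp ct "
        "WHERE c.c_custkey = o.o_custkey AND ct.name = c.name"
    ).fetchall()


if __name__ == "__main__":
    customers = [(1, "ann"), (2, "bob")]
    orders = [
        {"o_custkey": 1, "o_orderdate": date(2010, 8, 1)},
        {"o_custkey": 1, "o_orderdate": date(2010, 10, 1)},
        {"o_custkey": 2, "o_orderdate": date(2010, 7, 1)},
    ]
    print(run_plan(customers, orders, ["ann"]))
```

The point of the ordering is that the date filter runs where the data already lives, and only the surviving rows are moved into the warehouse for the join.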