Big Data with Microsoft?

How HDInsight (Hadoop on Windows Azure), SQL Server 2014, and Excel work together

Big data is one of the IT world's big buzzwords, yet it is still uncharted territory for many. In this talk we discuss what big data actually means and describe the role of Microsoft technologies in a field that is far more than open source and Hadoop. Using concrete scenarios, we show how HDInsight, SQL Server 2014, and Excel work together to solve typical big data tasks.

Published in: Technology

  1. 1. Big Data with Microsoft? How HDInsight, SQL Server 2014, and Excel work together. Olivia Klose, Technical Evangelist; Georg Urban, Sr. Technology Solution Professional; Microsoft Deutschland GmbH
  2. 2. The Large Hadron Collider produces 15 PB/year* http://public.web.cern.ch/public/en/lhc/Computing-en.html
  3. 3. But what if I don't own a Large Hadron Collider …
  4. 4. • Large-scale plants • Vehicle fleets • Smart grids • Green energy • Stock exchanges • Host protocols • Computer centers • Web farms • Twitter • Facebook • Google Analytics • …
  5. 5. XML – but… • polystructured • varying • no explicit schema • lots of hex BLOBs • 40,000 attributes & growing. "Here is my data": </meldungText><antwort>False</antwort><wert>na</wert></meldung><steuergeraet sgbdVariante="SMG_60"><steuergeraeteFunktion zeitstempel="2013-04-30T09:00:37.9926171-04:00 endDate="2013-04-30T09:00:38.1158609-04:00" jobName="STATUS_FAHRZEUGTESTER"><datensatz satzNr="1"><result name="JOB_STATUS">OKAY</result><result name="_TEL_ANTWORT">80 F1 18 70 02 01 00 00 …</result><result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result><result name="STAT_KL15_ROH">0</result><result name="STAT_KLR_EIN_ROH">0</result><result name="STAT_WAKE_UP_ROH">1</result><result name="STAT_ISTGANG_TEXT">Neutral</result> <sgFunktion zeitstempel="2013-04-30T10:33:37.0834084+02:00" endDate="2013-0430T10:33:37.9310504+02:00" jobName="_FLM_LESEN_BOSCH"><datensatz satzNr="1"><result name="FLM_DATEN_1">00 00 00 03 02 08 C6 56 …</result><result name="FLM_DATEN_2">08 00 0 00 00 00 00 00 …
  6. 6. • Small data subsets are stored • Most data stays in the file system (original XML files) • Only about 3 years of history is stored at the moment • Heavily denormalized data (e.g. Entity-Attribute-Value tables) • TCO & performance limits (queries are slow, pivoting is expensive) • Cover the whole life cycle of 15 years (incl. production data) • More data sources: social media (motortalk) • Lower TCO for storage & flexible analysis • …impossible with a "classical" RDBMS
  7. 7. "Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
  8. 8. Modular Hardware Architecture • ColumnStore v2 storage • Hadoop Regions • Tight integration of "nonstructured" data • FDR InfiniBand • Ultra-high compression • Direct-attached SAS • Scale Unit
  9. 9. Parallel Data Warehouse Screenshots
  10. 10. PDW in SQL Server Data Tools: a familiar development environment
  11. 11. Scanning 10 billion rows… …just counting rows… …does not take… …that long!
  12. 12. And even complex queries… …a reporting query… …won't take… …much longer!
  13. 13. Data Distribution Data is distributed evenly over all data nodes…
  14. 14. Azure UX Azure SDK HDInsight * Hive Templeton RDP * Pig HCatalog Ambari Map Reduce * Azure Blobs * = good to know! HDFS Sqoop Oozie
  15. 15. Analyze Demo Environment Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  16. 16. Solution Components HDInsight Virtual Machine Twitter Excel
  17. 17. Big Data Twitter Demo Azure Management Portal
  18. 18. Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  19. 19. Big Data Twitter Demo – Dashboard
  20. 20. Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  21. 21. Big Data Twitter Demo – SQL Azure
  22. 22. Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  23. 23. Big Data Twitter Demo Azure Blob Storage
  24. 24. Analyse Analyse Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  25. 25. Big Data Twitter Demo – Hive
  26. 26. Analyse Insight Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  27. 27. Big Data Twitter Demo Mash Up in Excel
  28. 28. Polybase • T-SQL query engine for RDBMS & Hadoop • Cost-based optimizer decides between: rendering operators as Map/Reduce jobs, or moving HDFS data into RDBMS storage • HDFS Bridge for parallelized data transport (diagram labels: Regular T-SQL, Results, PDW, HDFS Data Nodes)
  29. 29. T-SQL for Polybase A distributed query. Definition of an external table.
  30. 30. Modern Data Warehousing: Parallel Data Warehouse, HDInsight, Polybase
  31. 31. Big Data Enterprise Architecture
  32. 32. What's next… Twitter Big Data source code: http://twitterbigdata.codeplex.com/ Twitter Big Data setup: http://aka.ms/bigdatatwitter Azure Trial: http://aka.ms/azurenow HDInsight: www.windowsazure.com/en-us/documentation/services/hdinsight/ Hortonworks for Windows: http://hortonworks.com/products/hdp-windows/ PDW and Polybase: http://microsoft.com/pdw Microsoft Big Data: http://microsoft.com/bigdata Deutsche SQL Server Konferenz 2014: http://www.sqlkonferenz.de
  33. 33. “Big data is like teen sex. Everybody is talking about it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” Dan Ariely, professor and director of Center for Advanced Hindsight at Duke University
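
Slide 5 shows polystructured XML in which a result is sometimes plain text and sometimes a hex BLOB. A minimal sketch of flattening such a document into rows, assuming it is well-formed: the sample is a shortened, hypothetical stand-in that only borrows tag and attribute names from the slide, and `is_hex_blob` and `flatten` are illustrative helpers, not code from the talk.

```python
import string
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the diagnostic XML on slide 5.
SAMPLE = """
<steuergeraeteFunktion jobName="STATUS_FAHRZEUGTESTER">
  <datensatz satzNr="1">
    <result name="JOB_STATUS">OKAY</result>
    <result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result>
    <result name="STAT_ISTGANG_TEXT">Neutral</result>
  </datensatz>
</steuergeraeteFunktion>
"""


def is_hex_blob(text):
    """Heuristic (an assumption): two or more space-separated hex byte pairs."""
    parts = text.split()
    return len(parts) > 1 and all(
        len(p) == 2 and all(c in string.hexdigits for c in p) for p in parts
    )


def flatten(xml_text):
    """Return (job name, record number, result name, value) tuples."""
    root = ET.fromstring(xml_text.strip())
    job = root.get("jobName")
    rows = []
    for datensatz in root.iter("datensatz"):
        for result in datensatz.iter("result"):
            raw = (result.text or "").strip()
            # Decode hex BLOBs to bytes; keep everything else as text.
            value = bytes.fromhex("".join(raw.split())) if is_hex_blob(raw) else raw
            rows.append((job, datensatz.get("satzNr"), result.get("name"), value))
    return rows


if __name__ == "__main__":
    for row in flatten(SAMPLE):
        print(row)
```

Because there is no explicit schema, the sketch keeps the attribute name in each row instead of mapping it to a fixed column, which is the Entity-Attribute-Value shape mentioned on slide 6.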
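
Slide 6 notes that the data sits in Entity-Attribute-Value tables and that pivoting is expensive. A small sketch of what that pivot involves, under the assumption that each row is an (entity, attribute, value) triple; the vehicle identifiers and the `pivot` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical EAV rows: one row per (entity, attribute, value).
eav_rows = [
    ("vehicle-1", "STAT_KL15_ROH", "0"),
    ("vehicle-1", "STAT_ISTGANG_TEXT", "Neutral"),
    ("vehicle-2", "STAT_KL15_ROH", "1"),
]


def pivot(rows):
    """Turn EAV triples into one wide record per entity."""
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
    return dict(wide)


if __name__ == "__main__":
    for entity, record in pivot(eav_rows).items():
        print(entity, record)
```

Every attribute of every entity has to be touched once to build the wide record, and with tens of thousands of possible attributes most of the resulting columns stay empty for any given entity.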
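
Slide 13 states that data is distributed evenly over all data nodes. One common way to get such a spread is to hash a distribution key and take the result modulo the node count. The sketch below illustrates that idea only: the slides do not say which hash function PDW uses, so SHA-256 and the helper names `node_for` and `distribution` are assumptions.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Map a distribution key to a node index via a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribution(keys, node_count):
    """Count how many keys land on each node."""
    return Counter(node_for(key, node_count) for key in keys)


if __name__ == "__main__":
    counts = distribution(range(10_000), 8)
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} rows")
```

With many distinct key values each node receives roughly the same number of rows. A key with few distinct values or one dominant value would skew the distribution, so the choice of distribution column matters.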
